Reinforcement Learning
MuJoCo Embodied Intelligence in Practice: From Zero Background to Reinforcement Learning and Sim2Real
具身智能之心· 2025-06-24 14:29
Core Insights - The article discusses the unprecedented turning point in AI development, highlighting the rise of embodied intelligence, which allows machines to understand language, navigate complex environments, and make intelligent decisions [1][2]. Group 1: Embodied Intelligence - Embodied intelligence is defined as AI systems that not only possess a "brain" but also have a "body" capable of perceiving and interacting with the physical world [1]. - Major tech companies like Tesla, Boston Dynamics, OpenAI, and Google are competing in this transformative field, which is expected to revolutionize various industries including manufacturing, healthcare, and space exploration [1]. Group 2: Technical Challenges - Achieving true embodied intelligence faces significant technical challenges, requiring advanced algorithms and a deep understanding of physical simulation, robot control, and perception fusion [2][4]. - MuJoCo (Multi-Joint dynamics with Contact) is identified as a key technology in this domain, serving as a high-fidelity training environment for robot learning [4][8]. Group 3: MuJoCo's Role - MuJoCo allows researchers to create realistic virtual robots and environments, enabling millions of trials and learning experiences without the risk of damaging expensive hardware [6][4]. - The simulation speed can be hundreds of times faster than real-time, significantly accelerating the learning process [6]. - MuJoCo has become a standard tool in both academia and industry, with major companies utilizing it for robot research [8]. Group 4: Practical Training - A comprehensive MuJoCo development course has been designed, focusing on practical applications and theoretical foundations, covering topics from physical simulation to deep reinforcement learning [9][10]. - The course is structured into six modules, each with specific learning objectives and practical projects, ensuring a solid grasp of the technology stack [13][16]. 
Group 5: Project-Based Learning - The course includes six progressively challenging projects, such as building a robotic arm control system and implementing vision-guided grasping [19][21]. - Each project is designed to reinforce theoretical concepts through hands-on experience, ensuring participants understand both the "how" and "why" of the technology [29][33]. Group 6: Target Audience and Outcomes - The course is suitable for individuals with programming or algorithm backgrounds looking to enter the field of embodied robotics, as well as students and professionals interested in enhancing their practical skills [30][32]. - Upon completion, participants will have a complete technology stack in embodied intelligence, gaining advantages in technical, engineering, and innovation capabilities [32][33].
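The claim that simulation can run hundreds of times faster than real time can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names and every number in it (step count, timestep, episode length, trial count) are hypothetical and are not taken from the article or from MuJoCo itself.

```python
def real_time_factor(sim_steps: int, timestep_s: float, wall_seconds: float) -> float:
    """Simulated seconds advanced per wall-clock second."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return sim_steps * timestep_s / wall_seconds


def wall_hours_for_trials(trials: int, episode_sim_seconds: float, rtf: float) -> float:
    """Wall-clock hours needed to run `trials` episodes at real-time factor `rtf`."""
    return trials * episode_sim_seconds / rtf / 3600.0


# Hypothetical measurement: 500,000 physics steps of 2 ms each finished in 5 s.
rtf = real_time_factor(sim_steps=500_000, timestep_s=0.002, wall_seconds=5.0)

# Hypothetical workload: one million 10-second episodes.
hours_in_sim = wall_hours_for_trials(1_000_000, 10.0, rtf)
hours_on_hardware = wall_hours_for_trials(1_000_000, 10.0, 1.0)

print(f"real-time factor: {rtf:.0f}x")
print(f"simulated: {hours_in_sim:.1f} h, real hardware: {hours_on_hardware:.1f} h")
```

Under these assumed numbers the same workload shrinks by exactly the real-time factor, which is why millions of trials become practical in simulation.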
MiniMax-M1: Surpassing DeepSeek, with Support for Million-Token Context
自动驾驶之心· 2025-06-21 13:15
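The following article is from AIGC面面观, by 欠阿贝尔两块钱.
Main contributions
1. Efficient hybrid architecture design: MiniMax-M1, a model that combines an MoE architecture with Lightning Attention, supports a million-token context window (1M tokens); at a generation length of 80K tokens its FLOPs are only 25% of those of a conventional attention model.
2. CISPO, an algorithm that surpasses DAPO: it improves RL efficiency by clipping the importance-sampling weights, achieves a 2x speedup over DAPO, and handles low-probability tokens better than traditional methods such as PPO/GRPO.
3. Scalable context: supports extending the generation length from 40K to 80K tokens.
1. Hybrid attention architecture
Lightning Attention: uses I/O-aware linear attention computation and, through block-wise computation and memory optimization, takes long ...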
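The CISPO item above says only that the importance-sampling weights are clipped. A minimal sketch of that one idea follows; the threshold values, the function names, and the way the clipped weight is combined with the advantage are assumptions made for illustration, not the MiniMax-M1 implementation.

```python
import math
from typing import List


def clipped_is_weight(logp_new: float, logp_old: float,
                      eps_low: float = 0.2, eps_high: float = 0.2) -> float:
    """Importance-sampling ratio pi_new/pi_old, clipped to [1 - eps_low, 1 + eps_high].

    The thresholds are placeholder values, not ones reported in the article.
    """
    ratio = math.exp(logp_new - logp_old)
    return min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)


def update_coefficients(logp_new: List[float], logp_old: List[float],
                        advantages: List[float]) -> List[float]:
    """Per-token coefficient (clipped weight times advantage).

    Assumption: the clipped weight is treated as a constant that scales each
    token's policy-gradient term, so no token is dropped from the update.
    """
    return [clipped_is_weight(n, o) * a
            for n, o, a in zip(logp_new, logp_old, advantages)]


# Hypothetical three-token example: the first token's probability jumped 5x.
coeffs = update_coefficients(
    logp_new=[math.log(0.5), math.log(0.3), math.log(0.05)],
    logp_old=[math.log(0.1), math.log(0.3), math.log(0.5)],
    advantages=[1.0, 1.0, 1.0],
)
print(coeffs)
```

In this sketch a token whose ratio falls outside the range still contributes, with a bounded weight, which is one way to read the article's remark about low-probability tokens.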
AI, Human, a Box and a Cat | Nick Broumas | TEDxUniversityofMacedonia
TEDx Talks· 2025-06-16 15:44
AI Marketing Evolution - AI is integrated into marketing to help partners grow faster and achieve goals [1] - The industry is moving towards hyper-personalization, using AI to understand consumer habits and tailor experiences [7][8][9] - AI-driven campaigns are evolving from fragmented applications to unified, end-to-end management [11][12][13] - Dynamic websites will use AI to recognize user behavior and actively close sales [17] - Smart conversation AIs will become comprehensive sales assistants, offering personalized product presentations and follow-ups [18] Ethical Considerations - The industry faces consumer distrust regarding personal data, emphasizing the need for ethical targeting and platform transparency [20][21] - Platforms should explain why a user is targeted for a specific message, avoiding biased outcomes [21][22] - Internal bias detectors and a comprehensive regulatory framework are needed to prevent discriminatory practices [23] Future Challenges and Solutions - Current AI systems lack general logic and common sense, hindering their ability to understand complex business dynamics [25][26] - Achieving general intelligence requires vast amounts of data and energy, potentially necessitating new energy sources [28][29] - AI is not an original creator and relies on original content for data, driving research into AI models mimicking the human brain [30][31] - Neural augmentation or brain-computer interfaces may be necessary to incorporate human values and address AI's limitations in understanding nuance [33][34][35]
NVIDIA (NVDA) Conference Transcript
2025-06-11 12:45
Summary of NVIDIA Conference Call - June 11, 2025 Company Overview - **Company**: NVIDIA (NVDA) - **Event**: Conference Call - **Date**: June 11, 2025 Key Industry Insights - **AI Growth**: AI is recognized as the fastest-growing technology in history, with global reach and significant potential for expansion in Europe, particularly in France and the EU [1][18] - **Quantum Computing**: The industry is shifting towards a hybrid model of quantum and classical computing, emphasizing the importance of GPU supercomputers for error correction and data generation [9][12][15] - **Sovereign AI**: The development of AI infrastructure in Europe is seen as crucial, with an estimated $1.5 trillion build-out expected over the coming years [17][18] Core Company Strategies - **Local Infrastructure**: NVIDIA is focusing on building AI factories and supercomputing centers for local consumption in Europe, which will support the region's heavy industry and robotics capabilities [16][17] - **Physical AI Models**: The company is developing multimodal physical AI models that can reason and execute tasks based on prompts, differentiating them from traditional LLMs [19][20] - **Gigawatt Projects**: NVIDIA is involved in multiple gigawatt projects across Europe, with a focus on regional cloud service providers and AI factories supported by government initiatives [24][26] Financial and Operational Insights - **Supply Chain Management**: NVIDIA's supply chain is robust, with the ability to forecast demand and place large orders with suppliers like TSMC and Micron, ensuring timely production of supercomputers [30][32] - **Market Demand**: The company is not limited by critical components but must forecast production accurately to meet the growing demand for AI technologies [30][33] - **Post-Training Opportunities**: NVIDIA sees significant potential in post-training processes, which involve reinforcement learning and human feedback to improve AI models [49][52] Challenges and Risks - 
**Geopolitical Concerns**: The company acknowledges the importance of local infrastructure due to data privacy and geopolitical issues, particularly in Europe [27][28] - **Dependency on Taiwan**: NVIDIA is actively working to reduce its dependency on Taiwan for chip manufacturing, with plans to build substantial AI supercomputer infrastructure in the United States [64][66] - **China Market**: The company has removed China from its forecasts due to export controls, resulting in a significant revenue drop, but remains optimistic about growth in other markets [70][71] Future Outlook - **AI Integration in Enterprises**: NVIDIA is focused on integrating AI into traditional enterprise IT systems, which presents a substantial market opportunity estimated in the hundreds of billions [96][98] - **Continuous Improvement**: The company emphasizes ongoing software improvements that enhance the performance of its hardware, ensuring long-term value for customers [114][115] - **Ecosystem Development**: NVIDIA is building an ecosystem around its NVLink technology, which is expected to facilitate partnerships and enhance its competitive position in the market [91][92] Conclusion NVIDIA is strategically positioned to capitalize on the rapid growth of AI and quantum computing, with a strong focus on local infrastructure development in Europe, robust supply chain management, and continuous innovation in AI technologies. The company faces challenges related to geopolitical risks and market dependencies but remains optimistic about its growth trajectory and market opportunities.
New "SOTA" Reasoning Model Dodges Comparison with Qwen and R1? "Europe's OpenAI" Gets Roasted
量子位· 2025-06-11 05:13
Core Viewpoint - Mistral AI has launched its first reasoning model, Magistral, which claims to compete with other leading models but faces skepticism due to lack of direct comparisons with the latest versions of competitors like Qwen and DeepSeek R1 0528 [1][22]. Model Performance - Magistral shows a 50% accuracy improvement on the AIME-24 benchmark compared to its earlier model, Mistral Medium 3 [3]. - In the AIME-24 benchmark, the accuracy for English is 73.6%, while other languages like French and Spanish show lower accuracy rates of 68.5% and 69.3% respectively [8]. Model Versions - Two versions of Magistral have been released: - Magistral Small, which has 24 billion parameters and is open-source under the Apache 2.0 license [4]. - Magistral Medium, a more powerful version aimed at enterprises, available on Amazon SageMaker [5]. Multilingual Support - Magistral is designed for transparent reasoning and supports multilingual reasoning, addressing the issue where mainstream models perform poorly in European languages compared to local languages [7]. Enhanced Features - Unlike general models, Magistral has been fine-tuned for multi-step logic, improving interpretability and providing a traceable thought process in the user's language [10]. - The token throughput of Magistral Medium is reported to be 10 times faster than most competitors, enabling large-scale real-time inference and user feedback [14][15]. Training Methodology - Magistral is the first large model trained purely through reinforcement learning (RL) using an improved Group Relative Policy Optimization (GRPO) algorithm [16]. - The model achieves a significant accuracy leap from 26.8% to 73.6% on the AIME-24 benchmark by eliminating KL divergence penalties and dynamically adjusting exploration thresholds [18]. Training Architecture - The model employs an asynchronous distributed training architecture, allowing for efficient large-scale RL training without relying on pre-trained distilled data [20]. 
- The performance of the 24 billion parameter Magistral Small model reached an accuracy of 70.7% on the AIME-24 benchmark [21]. Competitive Landscape - Comparisons made by users indicate that Qwen 4B is similar in performance to Magistral, while a smaller 30B MoE model outperforms it, and the latest R1 model shows even better results [24].
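For readers unfamiliar with GRPO, the core of the baseline algorithm is a group-relative advantage: each sampled answer is scored against the other answers to the same prompt. The sketch below shows that standard step only. It is not Mistral's improved variant, and the choice of population standard deviation and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per sampled completion of the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std would also be plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four completions, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Because the baseline comes from the group itself, no separate value network is needed; when every completion in a group gets the same reward, all advantages are zero and the group contributes no gradient.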
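AI Learns to Reason from "Confidence" Alone: Zhejiang University Alumnus Reproduces DeepSeek's Emergent Long Chain-of-Thought, with Reinforcement Learning That Needs No External Reward Signal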
量子位· 2025-05-29 04:42
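By 梦晨 and 鹭羽, from Aofeisi. 量子位 | WeChat official account QbitAI
RLIF, a new reinforcement learning paradigm for large models that reproduces DeepSeek-R1's long chain-of-thought reasoning, has become a hot topic. Xuandong Zhao, co-first author on the UC Berkeley team, describes the result this way: a large model can learn complex reasoning without ever seeing ground-truth answers, simply by optimizing its own confidence. Specifically, the new method needs no external reward signal or labeled data at all; it uses only the model's own confidence as an intrinsic reward signal. Compared with GRPO, which uses an external reward signal, the new method improves base-model performance on math tasks without reference answers and performs better on code tasks. At almost the same time, another paper, "RENT: Reinforcement Learning via Entropy Minimization", confirmed a similar conclusion. The authors say the main difference between the two is whether confidence is measured with KL divergence or with entropy minimization. Dropbox's VP of Engineering commented after reading it: "Confidence is all you need."
Confidence-driven reinforcement learning
For a long time, training large models has relied mainly on two approaches: one needs large amounts of human annotation (as in ChatGPT's RLHF), the other needs verifiable reference answers (as in DeepSeek's RLVR). The former is costly and may introduce ...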
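The two confidence measures mentioned above, KL divergence and entropy, can be compared on a single next-token distribution. The sketch below is a hedged illustration: the direction of the KL term (from a uniform distribution to the model's distribution) and the use of one distribution rather than an average over tokens are assumptions, and the function names are hypothetical.

```python
import math
from typing import List


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats; lower means a more peaked, more confident distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_uniform_to_p(p: List[float]) -> float:
    """KL(U || p) for a uniform U over the same support; requires every p_i > 0.

    It is zero when p is uniform and grows as p becomes more peaked.
    """
    u = 1.0 / len(p)
    return sum(u * math.log(u / x) for x in p)


# Hypothetical next-token distributions over a four-token vocabulary.
peaked = [0.97, 0.01, 0.01, 0.01]
flat = [0.25, 0.25, 0.25, 0.25]

for name, dist in (("peaked", peaked), ("flat", flat)):
    # Both candidate intrinsic rewards increase with confidence.
    print(name, "KL reward:", round(kl_uniform_to_p(dist), 3),
          "negative-entropy reward:", round(-entropy(dist), 3))
```

Either quantity can serve as an intrinsic reward because both rise as the model concentrates probability mass, with no reference answer involved.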
312 Trajectories Unlock a 241% Performance Gain! SJTU and SII Open-Source a Computer Agent That Surpasses Claude 3.7
机器之心· 2025-05-25 03:51
Core Insights - The article discusses the advancements in computer agents, particularly highlighting the performance improvements achieved by using a minimal amount of human-annotated data, specifically 312 human operation trajectories, to train the PC Agent-E model, which surpassed previous models in performance [1][3][10]. Group 1: Model Development - The research indicates that current large models possess the foundational capabilities to complete tasks using computers, with performance bottlenecks primarily related to long-horizon planning, which can be significantly enhanced with a small number of high-quality trajectories [3][13]. - The team utilized a tool called PC Tracker to collect 312 human operation trajectories, which included task descriptions, screenshots, and keyboard/mouse operations, ensuring data accuracy [4][10]. - The PC Agent-E model was trained on the open-source model Qwen2.5-VL-72B, achieving a performance increase of 241% compared to its initial state, demonstrating high sample efficiency [10][11]. Group 2: Methodology Innovations - A key innovation in the research is the "Thought Completion" process, which adds reasoning behind each action taken by humans, thereby enhancing the quality of the training data [7][8]. - The "Trajectory Boost" method was introduced to synthesize additional action decisions for each step in the trajectory, capturing the inherent diversity of possible actions for computer tasks, which significantly enriched the training data [8][11]. - The results showed that as the number of synthesized actions increased, model performance improved significantly, validating the effectiveness of the trajectory enhancement method [11][12]. Group 3: Performance Evaluation - PC Agent-E was evaluated on the WindowsAgentArena-V2, outperforming the Claude 3.7 Sonnet's extended thinking mode, marking it as the new state-of-the-art (SOTA) for open-source computer agents on Windows systems [10][11]. 
- The research concluded that a small number of high-quality trajectories can effectively stimulate a powerful long-horizon planning capability in agents, reducing the need for vast amounts of human-annotated data [13].
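The way "Thought Completion" and "Trajectory Boost" enlarge a small set of human trajectories can be pictured as a data structure. Everything in the sketch below is hypothetical: the class, the field names, and the sample format are illustrative assumptions, not the PC Agent-E data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Step:
    observation: str                 # e.g. a reference to the screenshot at this step
    human_action: str                # the recorded keyboard/mouse operation
    thought: str = ""                # reasoning added afterwards ("Thought Completion")
    # Extra (thought, action) decisions synthesized for this step ("Trajectory Boost").
    synthesized: List[Tuple[str, str]] = field(default_factory=list)


def training_examples(task: str, steps: List[Step]) -> List[dict]:
    """Expand one trajectory into one sample per decision, human or synthesized."""
    samples = []
    for i, step in enumerate(steps):
        decisions = [(step.thought, step.human_action)] + step.synthesized
        for thought, action in decisions:
            samples.append({"task": task, "step": i,
                            "observation": step.observation,
                            "thought": thought, "action": action})
    return samples


# Hypothetical two-step trajectory with two synthesized alternatives on step 0.
steps = [
    Step("shot_0.png", "click(Start)", "Open the Start menu to find the app.",
         [("Use the keyboard instead.", "press(Win)"),
          ("Search directly.", "press(Win+S)")]),
    Step("shot_1.png", "type(notepad)", "Type the app name."),
]
samples = training_examples("Open Notepad", steps)
print(len(samples))
```

Here two recorded steps yield four training samples, which illustrates how a few hundred trajectories can produce a much larger training set.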
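HKU's Yi Ma on the History of Intelligence: DNA Was the Earliest Large Model, and the Essence of Intelligence Is Reducing Entropy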
晚点LatePost· 2025-05-23 07:41
Core Viewpoint - The essence of intelligence is "learning," which is a process of finding and utilizing patterns in the external world to make predictions and counteract the increase of entropy in the universe [3][15][21]. Group 1: Understanding Intelligence - Intelligence should not be understood superficially; it requires a historical perspective on its development from biological origins to machine intelligence [2][3]. - The historical evolution of intelligence includes four stages: genetic evolution through natural selection, the emergence of neural systems and memory, the development of language and writing for knowledge transmission, and the abstraction and generalization seen in mathematics and science [20][21]. Group 2: Machine Intelligence and Learning Mechanisms - Current AI models, such as o1 and R1, primarily rely on memorization rather than true reasoning, lacking the ability to independently generate abstract concepts [7][22]. - The training of models like DeepSeek demonstrates that open-source approaches can surpass closed-source methods, as the core of AI development lies in data and algorithms rather than proprietary technology [14][12]. Group 3: Educational Initiatives - The introduction of AI literacy courses at universities aims to equip students with an understanding of AI's history, current technologies, and their societal implications, fostering independent critical thinking [37][38]. - The curriculum emphasizes the importance of understanding the basic concepts of AI and its ethical considerations, preparing students for future interactions with intelligent systems [42][39]. Group 4: Future Directions in AI Research - The pursuit of closed-loop feedback mechanisms in AI systems is seen as essential for achieving true intelligence, as it allows for self-correction and adaptation in open environments [43][46]. 
- The current state of AI is compared to early biological evolution, where significant advancements are still needed to move beyond basic capabilities [30][31].
Pony Ai(PONY) - 2025 Q1 - Earnings Call Transcript
2025-05-20 13:00
Financial Data and Key Metrics Changes - Revenue totaled $14 million, up 11.6% year over year, driven by rapid growth in Robotaxi services [28] - Robotaxi service revenue was $1.7 million, growing 200.3% year over year, with fare charging revenues increasing approximately 800% year over year [28][29] - Gross profit reached $2.3 million, resulting in a gross margin of 16.6%, down from 21% in the same period last year [30] - Net loss was $37.9 million compared to $20.8 million in the first quarter of 2024 [31] Business Line Data and Key Metrics Changes - Robotruck services revenue grew by 4.2% year over year to $7.8 million, primarily driven by contributions from new clients [29] - Licensing and application revenues remained flat year over year at $4.5 million [29] Market Data and Key Metrics Changes - The number of registered users on the Pony Pilot app increased by more than 20% quarter over quarter [13] - The company secured China's first fully driverless commercial robotaxi license in Shenzhen, unlocking operations in key economic and transportation hubs [12] Company Strategy and Development Direction - The company aims to scale up its Gen 7 robotaxi fleet to 1,000 vehicles by year-end 2025, focusing on mass production and deployment [6][17] - Strategic partnerships with Tencent and Uber are expected to enhance the ecosystem and expand market reach [7][15] - The company is committed to disciplined investment in mass production while maintaining financial resilience [27] Management's Comments on Operating Environment and Future Outlook - Management emphasized the importance of technological advancements and operational efficiency in achieving long-term profitability [17][32] - The company is optimistic about its growth trajectory, expecting to reduce financial volatility and build a more predictable path to growth [32] Other Important Information - The Gen 7 autonomous driving system was launched at the Shanghai Auto Show, achieving a 70% reduction in bill of 
materials costs compared to the previous generation [6][8] - The company has made significant improvements in operational cost optimization, including a remote assistant to driver ratio of 20:1 [11][24] Q&A Session Summary Question: Progress and pipeline for 2026 - Management confirmed a clear pipeline for Gen 7 robotaxi mass production, expecting to ramp up deployment in the second half of 2025 and accelerate growth in 2026 [37][39] Question: Evolving global strategy and focus on China - The company prioritizes the China market but is expanding into international markets with strong mobility demand and favorable regulations [41][44] Question: Factors driving revenue growth in Robotaxi segment - Revenue growth was driven by fare charging and project-based engineering services, with expectations of natural volatility in revenues [46][49] Question: Impact of regulatory requirements on operations - Management believes recent regulatory requirements will benefit the company by clarifying distinctions between L2 and L4 autonomous driving [58][60] Question: Potential impact of U.S.-China tariff issues - The company expects minimal impact from tariff issues due to a predominantly domestic supply chain and enhanced supply chain resilience [66]
Customers Not Converting, Content Not Compliant? How AI and Agents Solve Five Major Challenges in Financial Marketing
36Kr· 2025-05-12 08:15
Core Insights - The article emphasizes that AI and Agents are no longer optional tools but are essential drivers for transforming customer insights, decision-making efficiency, and service experience in financial marketing [1][2][3] Evolution of Financial Marketing - Financial marketing has evolved from a traditional model reliant on physical branches and customer managers (1.0) to a digital model utilizing CRM and online channels (2.0), but issues like data silos and fragmented experiences persist [2] - The industry is now entering the intelligent 3.0 era, where AI technologies, particularly large language models and Agents, are becoming the core engines driving marketing transformation [2][3] AI's Value Proposition - AI provides unprecedented customer insights by analyzing both structured and unstructured data, enabling the identification of deep, often unrecognized customer needs [2] - AI facilitates real-time and precise decision-making by integrating various data points to generate optimal marketing strategies tailored to individual customers [3] - AI-driven Agents enhance service execution by automating repetitive tasks, improving efficiency, and allowing human staff to focus on more complex, value-added services [4] Current Challenges in Financial Marketing - High customer acquisition costs and low conversion rates are significant challenges, with customer acquisition costs (CAC) often exceeding thousands of dollars [5][6] - Personalization remains a challenge, as many financial institutions struggle to provide truly individualized experiences due to data fragmentation [7] - Complex products lead to customer confusion, making it difficult for them to make informed purchasing decisions [8] - Regulatory compliance poses challenges to innovation, requiring a balance between compliance and efficiency [8] - Measuring marketing effectiveness is complicated, with traditional attribution models failing to provide clear insights into ROI [9] AI and Agent Solutions - 
A robust "intelligent marketing platform" is proposed as a solution, consisting of a data foundation that integrates internal and external data to create a comprehensive customer view [10] - The platform includes an "intelligent engine" for AI algorithms that support customer understanding, predictive analytics, and decision-making [11] - Successful case studies demonstrate the tangible benefits of AI and Agents in enhancing customer insights, improving conversion rates, and increasing marketing efficiency [12] Future Outlook - The future of financial marketing will focus on "intelligent density," where the effective use of smart technologies will create competitive advantages in understanding customers and optimizing experiences [14]