Reinforcement Learning
GPT-5 Panned for Showing No Progress? Epoch's Year-End Report Says Otherwise: AI Is Racing Ahead, and ASI Is Closer
36 Ke· 2025-12-24 11:17
Core Insights
- AI development has accelerated rather than stagnated, with significant capability gains observed in recent months [7][10].

Group 1: AI Model Performance
- Epoch AI tested several open-source Chinese models on FrontierMath and found that they lag the top global AI models by approximately seven months [1].
- The only one of these models to score was DeepSeek-V3.2, at about 2% [4].
- Top models such as GPT and Gemini perform well on traditional math tests, but their accuracy on FrontierMath is still low, indicating that all AI models struggle with complex mathematical problems [5][6].

Group 2: AI Capability Growth
- The Epoch Capabilities Index (ECI) indicates that AI capability growth has accelerated since April 2024, to nearly double the previous rate [10].
- Contrary to the perception that AI progress has slowed since the release of GPT-4, the data show continued advances, particularly in reasoning ability rather than in model size alone [12].

Group 3: Cost and Accessibility of AI
- The cost of AI inference has fallen dramatically: token prices dropped more than tenfold between April 2023 and March 2025, making AI accessible to a broader audience [19].
- High-performance AI models can now run on consumer-grade hardware, suggesting that advanced AI capabilities will soon be widely available [22].

Group 4: Research and Development Trends
- A significant portion of OpenAI's computational resources in 2024 was allocated to experiments rather than to direct training or inference, highlighting the experimental nature of current AI development [25][28].
- NVIDIA's AI computing power has doubled approximately every ten months since 2020, indicating rapid growth in the hardware that AI advances depend on [29].

Group 5: Insights on AI's Future Impact
- Epoch AI suggests that most of AI's value may come from automating routine tasks across the economy rather than solely from accelerating research and development [49].
- AI's transformation of industries may unfold gradually over years or decades rather than through sudden breakthroughs [52].
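The growth figures quoted above use different time bases, so it can help to put them on a common footing. The sketch below is not taken from the Epoch report: it assumes smooth exponential trends, treats "over tenfold" as exactly tenfold (so the result is an upper bound on the halving time), and counts April 2023 to March 2025 as 23 months. The function names are illustrative.

```python
import math


def annual_growth_factor(doubling_months: float) -> float:
    """Growth over 12 months for a quantity that doubles every `doubling_months`."""
    return 2.0 ** (12.0 / doubling_months)


def halving_time_months(total_drop: float, span_months: float) -> float:
    """Months per halving if a price falls by a factor `total_drop` over `span_months`."""
    return span_months * math.log(2.0) / math.log(total_drop)


# Compute doubling every ten months, as stated in the summary.
compute_per_year = annual_growth_factor(10)
# Token prices: at least a tenfold drop over 23 months (assumed span).
price_halving = halving_time_months(10, 23)

print(f"compute grows about {compute_per_year:.2f}x per year")
print(f"token prices halve at least every {price_halving:.1f} months")
```

Under these assumptions, a ten-month doubling time corresponds to roughly 2.3x growth per year, and a tenfold price drop over 23 months corresponds to a halving about every 6.9 months.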
Gathering in Hong Kong! Robotics Industry Leaders Speak Out!
Zhong Guo Ji Jin Bao· 2025-12-24 10:41
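[Introduction] Gathering in Hong Kong! Leading figures from the robotics and AI industries explore new directions for future development.

On the afternoon of December 20, 2025, during the first Hong Kong International AI Art Festival, the 2025 Robotics Industry and AI Investment Forum, titled "Breaking Boundaries, Merging the New, Investing in the Future," was successfully held.

At the forum, UBTECH Chief Brand Officer Tan Min, AMD Greater China Vice President of Sales Chao Yaxin, 迅兔科技 founder and CEO Li Luodan, 上海灵境智源 CEO Sun Bo, 天数智芯 Deputy General Manager of the Edge Products Business Unit Song Yuanying, and Kaiyuan Securities chief machinery analyst Meng Pengfei discussed AI's disruptive impact and the outlook for robotics technology under the theme "Building an Ecosystem Together: Merging Tradition and Innovation."

01 Is the humanoid form a necessary carrier for artificial general intelligence?

Meng Pengfei: China has become the world's largest country for industrial robots and service robots. In 2022, the launch of Musk's first-generation Optimus showed us the potential of the humanoid robot as an epoch-making product. Humanoid robots are not only the key to closing the AI loop but can also push human civilization and technology forward. 2026 is an important milestone, with humanoid robots about to enter mass production.

My question to everyone: what stage of technological and industrial development have humanoid robots reached? Must robots take humanoid form to achieve artificial general intelligence?

Tan Min: The humanoid form is not the only form, but it is the best form for embodied intelligence. Artificial general intelligence needs a general data foundation, and non-humanoid robots can hardly achieve full generality. Humanoid robots can collect real-world data and build world models, for Phy ...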
The Industry's First RL+VLA Roundup: How Does Reinforcement Learning Push VLA Toward the Real World?
自动驾驶之心· 2025-12-24 09:22
Core Insights
- The article surveys advances in Vision-Language-Action (VLA) models for autonomous driving, highlighting a shift from traditional supervised learning to reinforcement learning (RL) to improve model generalization and reasoning [2].

Summary by Sections

VLA + RL Research Overview
- The article summarizes recent work in the VLA + RL domain, indicating a trend toward using RL to address the limitations of earlier models, particularly hallucination and inefficient exploration of continuous action spaces [2].

Key Papers and Contributions
- **MindDrive**: Introduces a framework that transforms the action space into a discrete language decision space, achieving a driving score of 78.04 and a success rate of 55.09% on the Bench2Drive benchmark with a lightweight model [6].
- **WAM-Diff**: Proposes an end-to-end VLA framework that uses masked diffusion for trajectory optimization, achieving superior performance on the NAVSIM benchmark [7].
- **LCDrive**: Addresses the temporal-expression and latency problems of text-based chain-of-thought reasoning with a latent chain-of-thought mechanism, demonstrating improved reasoning efficiency and trajectory quality [12].
- **Reasoning-VLA**: Develops a framework that enhances parallel trajectory generation through learnable action queries, achieving high performance across multiple datasets [13].
- **Alpamayo-R1**: Bridges reasoning and action prediction through a modular architecture and multi-stage training, improving generalization in long-tail scenarios [18].
- **AdaThinkDrive**: Introduces a dual-mode mechanism to balance decision accuracy and reasoning efficiency, achieving a PDMS score of 90.3 on the NAVSIM benchmark [20].
- **AutoDrive-R²**: Combines supervised fine-tuning and RL to improve trajectory planning accuracy, achieving state-of-the-art performance with a significant reduction in error rates [25].
- **IRL-VLA**: Proposes a framework that avoids reliance on simulators by using a reward world model, achieving state-of-the-art performance on the NAVSIM v2 benchmark [31].
- **DriveAgent-R1**: Integrates active perception with hybrid thinking, achieving significant improvements in decision reliability and efficiency [32].
- **Drive-R1**: Connects reasoning and planning in VLMs, providing effective methods for integrating reasoning with motion planning [37].
- **ReCogDrive**: Merges cognitive reasoning with diffusion planners, achieving state-of-the-art performance while addressing the limitations of imitation learning [38].
How Is SD Navigation Information Put Into Practice in Autonomous Driving?
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses how navigation information is applied in autonomous driving, emphasizing its role in providing lane guidance, waypoint information, and reference lines to improve vehicle path planning and control [2][4][31].

Group 1: Navigation Information Application
- SD/SD Pro navigation information is already used in many production solutions, giving the driver a rough global and local view [2].
- The core responsibility of the navigation module is to provide reference lines, which significantly reduce planning pressure by offering a predefined driving path [4].
- Additional functions include providing planning constraints and priorities, as well as path monitoring and replanning [5].

Group 2: Path Planning and Behavior Guidance
- Lane-level global path planning searches for the optimal lane sequence to reach the target lane [6].
- Behavior planning benefits from clear semantic guidance, which lets the vehicle prepare for lane changes, deceleration, and yielding in advance [6].

Group 3: Course Overview
- The course, titled "End-to-End Practical Class for Mass Production," focuses on practical applications in autonomous driving, covering topics from one-stage and two-stage frameworks to trajectory optimization and production experience sharing [23].
- The curriculum includes chapters on an end-to-end task overview, two-stage and one-stage algorithms, navigation information applications, reinforcement learning in autonomous driving, trajectory output optimization, fallback solutions, and mass production experience [28][30][31][32][33][34][35].

Group 4: Target Audience and Course Details
- The course is aimed at advanced learners with a background in autonomous driving algorithms, reinforcement learning, and programming [36][38].
- The course starts on November 30 and runs for three months, with offline video teaching and online Q&A sessions [36][39].
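The lane-level global path planning described above, searching for the best lane sequence to reach a target lane, can be sketched as a shortest-path search over a lane graph. This is a minimal illustration, not the method from the article or any production stack: the lane names, the graph layout, and the cost values (including the higher penalty for a late lane change) are all hypothetical.

```python
import heapq


def plan_lane_sequence(graph, start, goal):
    """Dijkstra over a lane graph; returns (cost, [lane ids]) or (inf, [])."""
    frontier = [(0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, lane, path = heapq.heappop(frontier)
        if lane == goal:
            return cost, path
        if cost > best.get(lane, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, step_cost in graph.get(lane, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


# Toy map (hypothetical): two segments A and B with two lanes each, and an
# exit ramp reachable only from the right lane of B. A late lane change on
# segment B is penalized more than an early one on segment A.
FOLLOW, EARLY_CHANGE, LATE_CHANGE = 1.0, 1.5, 3.0
lane_graph = {
    "A_left": [("B_left", FOLLOW), ("A_right", EARLY_CHANGE)],
    "A_right": [("B_right", FOLLOW), ("A_left", EARLY_CHANGE)],
    "B_left": [("B_right", LATE_CHANGE)],
    "B_right": [("exit_ramp", FOLLOW)],
}

cost, lanes = plan_lane_sequence(lane_graph, "A_left", "exit_ramp")
print(cost, lanes)
```

With these illustrative costs, the search prefers changing lanes early on segment A, which mirrors the idea of preparing a lane change ahead of time rather than at the last moment.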
Some Thoughts on Applying Reinforcement Learning to Autonomous Driving
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses reinforcement learning (RL) fine-tuning for trajectory planning in autonomous driving, emphasizing the move from open-loop to closed-loop training to make model training more effective [3][4].

Group 1: Training Methodology
- Mainstream learning-based planning modules typically use imitation learning, which can struggle with out-of-distribution scenarios in real-world testing [3].
- The article proposes closed-loop training, which simulates real vehicle testing conditions and is therefore more effective than open-loop training [4].
- The network structure builds on Waymo's earlier work, MotionLM, and outputs trajectories autoregressively, which preserves causal relationships [4][6].

Group 2: Input and Output Structure
- The network input is scene-centered: it summarizes static information over a specified time window rather than relying on the current frame alone, which helps keep the vehicle from driving outside the perceived road [6].
- Many imitation learning methods pair single-frame perception with several seconds of ground truth (GT) data, which can lead to causal inconsistencies when the perception range is limited [7].

Group 3: Reward Function and Training Phases
- Training consists of two phases, pretraining and reinforcement learning, with a simple reward function that balances efficiency and safety by considering both GT fitting and collision avoidance [11].
- Rewards are normalized across all samples and time steps, which removes the need for a critic network, similar to the GRPO method [13].

Group 4: Challenges and Future Directions
- Many imitation learning methods add auxiliary losses that can lead to undesirable model outputs, highlighting the limitations of open-loop training [14].
- The core value of reinforcement learning lies in closed-loop learning, which can significantly improve model capability even with smaller datasets [14].
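The critic-free reward normalization mentioned above can be illustrated with a short sketch. It is an assumption-laden illustration rather than the article's implementation: the reward shape in `step_reward`, the weights `w_fit` and `w_col`, and the use of the normalized rewards as advantages in a policy-gradient update are all hypothetical choices made here for clarity.

```python
from statistics import mean, pstdev


def step_reward(dist_to_gt, collided, w_fit=1.0, w_col=5.0):
    """Hypothetical reward: penalize deviation from the GT trajectory and collisions."""
    return -w_fit * dist_to_gt - w_col * float(collided)


def normalized_advantages(rewards, eps=1e-8):
    """rewards[i][t] is the reward of sampled trajectory i at time step t.

    Normalizes over all samples and time steps together, so no learned
    critic (value baseline) is needed.
    """
    flat = [r for traj in rewards for r in traj]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(r - mu) / (sigma + eps) for r in traj] for traj in rewards]


# Two sampled rollouts of two steps each; the last step of the second
# rollout ends in a collision.
rewards = [
    [step_reward(0.1, False), step_reward(0.2, False)],
    [step_reward(0.5, False), step_reward(1.0, True)],
]
advantages = normalized_advantages(rewards)
for traj in advantages:
    print([round(a, 3) for a in traj])
```

In this toy case the colliding step receives the most negative value, while steps that stay close to the GT trajectory receive positive values, so the batch itself serves as the baseline.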
Interview with Horizon Vice President Lü Peng: If You Can't Do End-to-End Well, You Can't Do VLA Well
21 Shi Ji Jing Ji Bao Dao· 2025-12-23 00:45
Core Insights
- In the domestic passenger car market, vehicles priced above 200,000 yuan account for 30% of market share, while those below 130,000 yuan hold 50%, leaving a large opening for companies such as Horizon and Momenta in the autonomous driving sector [1][13].
- Horizon has launched its Horizon SuperDrive (HSD) solution based on the Journey 6 series chip. It has entered mass production and recorded significant activation numbers shortly after new models launched [1][14].
- The company aims to bring urban assisted driving to vehicles priced at 100,000 yuan, targeting a production scale of 10 million units within the next 3-5 years [2][14].

Market Dynamics
- The market for vehicles priced below 130,000 yuan is largely untapped for urban assisted driving features, prompting autonomous driving companies to accelerate their market strategies [1][13].
- Horizon's HSD solution has been adopted quickly, with more than 12,000 activations within two weeks of the launch of two new models, indicating strong demand [1][14].

Technological Development
- Horizon is devoting 90% of its R&D resources to end-to-end technology, which it sees as crucial to the future of autonomous driving [2][14].
- The company believes that a solid end-to-end foundation is essential for integrating new modalities and improving product performance [15][21].

Competitive Landscape
- Companies without chip development capabilities are increasingly collaborating with Horizon, underscoring its strong market position [2][14].
- Horizon's commitment to an end-to-end approach distinguishes it from competitors exploring other models, such as VLA [2][21].

Technical Insights
- Horizon's end-to-end system is one of the few complete systems available, with a focus on seamless information transfer and high-dimensional feature integration [16][17].
- The distinction between one-stage and two-stage end-to-end systems is critical, with one-stage systems providing a more cohesive and intuitive driving experience [18][19].

Future Directions
- Horizon plans to improve product experience and safety, prioritizing market acceptance over new terminology and concepts [11][22].
- The company is open to integrating VLA technology in the future but maintains that a robust end-to-end system is the foundation for success [24].
The State of Robot Learning! A PI Team Insider Shares (From Data Collection to VLA to RL)
具身智能之心· 2025-12-23 00:03
Core Insights
- The article describes the state of robot learning as of December 2025, emphasizing that most systems rely on behavior cloning (BC) and the challenges that come with it [5][40][39].
- It highlights the importance of human demonstrations in training robot learning systems and the need for new approaches to improve performance and robustness [72][73].

Group 1: Behavior Cloning and Its Challenges
- As of December 2025, robot learning systems primarily use behavior cloning, in which models are trained on human demonstrations to mimic the demonstrated actions [5][6].
- Behavior cloning cannot generalize beyond its training data, which causes performance problems in real-world applications [16][21][23].
- Collecting high-quality demonstration data is difficult, and diverse, representative datasets are needed to improve model training [7][12][19].

Group 2: Future Directions and Innovations
- The article predicts that within two years, video models will replace current visual-language architectures in robot learning [72].
- It suggests that within ten years, world models will effectively simulate general open-world interactions, expanding what robot learning systems can do [72].
- A robust human demonstration system that addresses the challenges of data collection and model training is identified as a key area for future development [73][76].
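The behavior cloning setup described above, and its weakness outside the demonstrated data, can be shown with a deliberately tiny example. This is a toy sketch under stated assumptions, not the system discussed in the article: the one-dimensional `expert_action` controller, the linear policy class, and the chosen test states are all hypothetical.

```python
def expert_action(obs: float) -> float:
    """Hypothetical expert: proportional correction, saturated to [-1, 1]."""
    return max(-1.0, min(1.0, -2.0 * obs))


def fit_linear_policy(demos):
    """Behavior cloning as least squares: fit action ~ w * obs + b."""
    n = len(demos)
    mean_o = sum(o for o, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((o - mean_o) * (a - mean_a) for o, a in demos)
    var = sum((o - mean_o) ** 2 for o, _ in demos)
    w = cov / var
    b = mean_a - w * mean_o
    return lambda obs: w * obs + b


# Demonstrations cover only small deviations, where the expert never saturates.
demo_obs = [i / 10 for i in range(-4, 5)]
demos = [(o, expert_action(o)) for o in demo_obs]
policy = fit_linear_policy(demos)

in_dist_err = abs(policy(0.25) - expert_action(0.25))
out_dist_err = abs(policy(2.0) - expert_action(2.0))
print(f"in-distribution error:     {in_dist_err:.3f}")
print(f"out-of-distribution error: {out_dist_err:.3f}")
```

The cloned policy matches the expert inside the demonstrated range but extrapolates badly at a state the demonstrations never visited, which is the generalization gap the summary attributes to behavior cloning.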
Intelligent Driving Industry Special Report: Robo-X Industry Trends, Market Size, and Supply Chain Breakdown
2025-12-22 15:47
Summary of Robo-X Industry Trends and Market Analysis

Industry Overview
- The L4 autonomous driving market has significant potential, with a projected global market size reaching trillions by 2030. The domestic Robotaxi and RoboVan markets are estimated at 236 billion yuan and 164.5 billion yuan, respectively. Other segments, such as unmanned trucks, buses, and sanitation vehicles, also show promise [1][2].

Core Insights and Arguments
- Governments worldwide are easing restrictions on autonomous driving and establishing regulatory frameworks, which is accelerating the development of smart driving technologies. Beijing, Shanghai, Guangzhou, and Shenzhen have initiated ROS services, and Wuhan and Chongqing have also opened related services [1][6].
- Reinforcement learning and world models are foundational technologies for L4 autonomous driving. They address the data scarcity and module dependency of traditional imitation learning, improving the system's generalization and decision-making [1][8].
- Robotaxi has a notable operating cost advantage: costs can be as low as 0.81 yuan per kilometer without a safety driver, lower than traditional fuel and electric ride-hailing services. Profitability is expected once the fleet reaches 1,000 vehicles [1][14].

Market Segmentation and Key Players
- In the Robotaxi sector, key players include WeRide, Pony.ai, and Loong Air. The RoboVan segment features companies such as 90 Smart, New Stone Age, and others, focusing on last-mile delivery efficiency [3][4].
- The Robotruck market is projected to reach 90 billion yuan by 2030, with significant collaboration among manufacturers, autonomous driving companies, and logistics firms [3][22].
- The RoboBus segment is being developed by companies such as WeRide and Qizhou Zhihang, with a potential market size of 15-35 billion yuan based on current bus sales [23].
- The Robot Sweeper market, which addresses labor shortages, is also expanding, with a potential market size of 11.3-22.5 billion yuan [24].

Investment Opportunities
- Recommended companies in vehicle sales and operations include Pony.ai, WeRide, and XPeng Motors. In the components sector, Sutong Juchuang and Hesai Technology are highlighted, along with data processing firms such as Coboda and Horizon Robotics [5][25].

Policy Support and Technological Advancements
- Regions including the Middle East and Southeast Asia are progressively relaxing regulations on autonomous driving, which is crucial for industry growth. L4 development is supported by advances in reinforcement learning and world models, and component costs are falling [2][10].

Economic Viability and Future Projections
- The Robotaxi market is expected to grow significantly, with a projected fleet size of 7,000 vehicles by 2025, capturing a 0.6% market share in shared mobility. Robotaxi has substantial potential to improve urban traffic efficiency and provide a safer driving experience [11][12].
- Robotaxi manufacturing costs are about three times those of traditional ride-hailing vehicles, but operating costs are significantly lower, leading to a favorable economic outlook [13].

Conclusion
- The autonomous driving industry is on the cusp of commercialization, driven by supportive policies, technological advances, and cost reductions. The Robotaxi, RoboVan, Robotruck, and other segments present numerous investment opportunities as companies continue to innovate and expand their services [10][20].
Even Falling Is Getting Competitive for Disney's Robots: Fall Softly, and Fall in Style! New AI Research Turns a Bug Into a Signature Move
机器人大讲堂· 2025-12-22 11:26
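Falling is a major problem for robots, especially "top-heavy" ones, where a single misstep can cause expensive damage. In the past, to prevent falls, engineers either limited the robot's performance, making it overly cautious, or simply let it "crash-land." These approaches treat the symptoms rather than the cause.

But what if we approached the problem differently?

Rather than trying everything to avoid falling, why not turn "falling" itself into an art that can be learned and controlled?

Recently, a new study from Disney Research has upended how we think about robot falls. The researchers propose a new method called "Robot Crash Course: Learning Soft and Stylized Falling."

The core idea of the study is to let the robot not only fall "softly," minimizing impact and damage, but also fall "in style," ending in a user-specified, artistic pose after it hits the ground.

Imagine a robot that makes a mistake on stage. Instead of toppling over stiffly, it rolls with the fall and finishes in a stylish prone pose. Far from embarrassing itself, it gets to show off. That is turning a bug into a signature move!

This research can not only help robots shine in entertainment, film, and television, but also offers a new approach to robot safety and fast recovery. A robot that can control its falling pose ...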
The Era of RL-Powered 3D Generation Is Here! AR3D-R1, the First "R1-Style" Text-to-3D Reasoning Model, Debuts
机器之心· 2025-12-22 08:17
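After its great success in large language models and 2D image generation, reinforcement learning (RL) has for the first time been systematically extended to text-to-3D generation! Facing the combined challenges of 3D objects' higher spatial complexity, global geometric consistency, and local texture refinement, researchers have carried out the first systematic study of RL in 3D autoregressive generation!

Challenges of applying reinforcement learning to 3D generation

Researchers from Shanghai AI Laboratory, Northwestern Polytechnical University, The Chinese University of Hong Kong, Peking University, The Hong Kong University of Science and Technology, and other institutions propose AR3D-R1, the first reinforcement-learning-enhanced text-to-3D autoregressive model. The work systematically studies reward design, RL algorithms, and evaluation benchmarks, and proposes Hi-GRPO, a hierarchical reinforcement learning paradigm that optimizes 3D generation by separating global structural reasoning from local texture refinement. It also introduces a new benchmark, MME-3DR, for evaluating the implicit reasoning ability of 3D generation models.

Experiments show that AR3D-R1 achieves significant improvements on both Kernel Distance and CLIP Score, reaching 0.156 and 29.3 respectively.

Paper title: Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation
Code link: https://github. ...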