Some Thoughts on Applying Reinforcement Learning to Autonomous Driving
自动驾驶之心· 2025-12-23 00:53
Core Viewpoint
- The article discusses the application of reinforcement-learning (RL) fine-tuning to trajectory planning for autonomous driving, emphasizing the transition from open-loop to closed-loop training to improve model effectiveness [3][4]

Group 1: Training Methodology
- Mainstream learning-based planning modules typically rely on imitation learning, which struggles with out-of-distribution scenarios during real-world testing [3]
- A closed-loop training approach is proposed that simulates real-vehicle testing environments, making it more effective than open-loop training [4]
- The network structure builds on Waymo's earlier work, MotionLM, outputting trajectories autoregressively so that causal relationships are preserved [4][6]

Group 2: Input and Output Structure
- The network's input is scene-centered, summarizing static information over a specified time window rather than relying on the current frame alone, which helps keep the vehicle from driving outside the perceived road [6]
- Many imitation-learning methods combine single-frame perception with several seconds of ground-truth (GT) data, which can introduce causal inconsistencies when the perception range is limited [7]

Group 3: Reward Function and Training Phases
- Training consists of two phases, pretraining and reinforcement learning, with a simple reward function that balances efficiency and safety by considering both GT fitting and collision avoidance [11]
- Advantages are computed by normalizing rewards across all samples and time steps, allowing the critic network to be omitted, similar to the GRPO method [13]

Group 4: Challenges and Future Directions
- Many imitation-learning methods introduce auxiliary losses that can lead to undesirable model outputs, highlighting the limitations of open-loop training [14]
- The core value of reinforcement learning lies in closed-loop learning, which can significantly enhance model capabilities even with smaller datasets [14]
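The reward design and critic-free normalization described above can be illustrated with a minimal sketch. This is not the article's actual implementation; the function names, weights, and trajectory shapes are illustrative assumptions. It shows the two stated ingredients: a reward combining GT fitting with a collision penalty, and a GRPO-style advantage obtained by normalizing rewards across a group of sampled rollouts instead of training a critic.

```python
import numpy as np

def trajectory_reward(traj, gt_traj, collided, w_fit=1.0, w_col=5.0):
    """Reward per rollout: negative GT displacement error minus a collision
    penalty. w_fit and w_col are illustrative weights, not from the article."""
    fit_err = np.mean(np.linalg.norm(traj - gt_traj, axis=-1))  # mean distance to GT
    return -w_fit * fit_err - w_col * float(collided)

def group_normalized_advantages(rewards):
    """GRPO-style advantage: (r - group mean) / group std, so no critic
    network is needed to estimate a baseline."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Example: 4 sampled rollouts for one scene (10 steps of x/y positions)
rng = np.random.default_rng(0)
gt = np.zeros((10, 2))  # ground-truth trajectory
rollouts = [gt + rng.normal(scale=s, size=(10, 2)) for s in (0.1, 0.3, 0.5, 1.0)]
rewards = [trajectory_reward(t, gt, collided=(i == 3)) for i, t in enumerate(rollouts)]
adv = group_normalized_advantages(rewards)
# Each rollout's log-probability would then be weighted by its advantage
# in the policy-gradient loss.
```

In the closed-loop setting the rollouts would come from the simulator rather than from perturbed GT, but the normalization step is the same: advantages are zero-mean across the group, so better-than-average rollouts are reinforced and worse ones suppressed.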
X @TechCrunch
TechCrunch· 2025-12-19 19:29
Startup & Valuation
- Yann LeCun confirmed his new "world model" startup [1]
- The company is reportedly seeking a valuation of more than $5 billion [1]
Guancha.cn WAIC Live Transcript: Amid the AI Wave, Is China Following or Running Alongside in Embodied Intelligence and Humanoid Robots?
Guan Cha Zhe Wang· 2025-08-03 05:36
Group 1
- The global focus is on "embodied intelligence" and "humanoid robots," with discussion of whether China is catching up to or surpassing the U.S. in AI advancements [1][3]
- The dialogue at WAIC highlighted the importance of supply chains, reinforcement-learning algorithms, and capital pathways in the development of humanoid robots [1][3]
- Companies like Midea have diversified into humanoid robotics, leveraging their existing technology and product lines to enter this new market [4][5]

Group 2
- Midea's acquisition of KUKA in 2016 marked its entry into the robotics sector, with a focus on industries including automotive and logistics [5]
- Humanoid robots have advanced significantly thanks to breakthroughs in reinforcement learning and embodied intelligence, enabling more complex robotic movements [9][10]
- Current humanoid robots average around 40 joints, with traditional control methods being replaced by reinforcement-learning techniques [9][11]

Group 3
- The discussion contrasted traditional hydraulic-driven robots with modern electric-driven robots, highlighting the latter's advantages in incorporating intelligent algorithms [12][13]
- The potential for humanoid robots to evolve into "super humanoid robots" tailored to specific industrial applications, aiming to exceed human efficiency in tasks, was explored [15][16]
- The conversation also touched on the necessity of dexterous hands for humanoid robots, with a focus on the trade-offs between complexity and reliability in real-world applications [24][27]

Group 4
- Embodied intelligence was defined as the ability of robots to interact effectively with the physical world, moving beyond traditional control methods [31][36]
- The importance of world models and video models in enhancing the capabilities of humanoid robots was discussed, emphasizing their role in understanding complex environments [37][42]
- Reinforcement learning was identified as a crucial component in the development of intelligent robots, with companies like Dyna Robotics focusing on real-world applications [46][47]