Li Auto Shares a Closed-Loop Reinforcement Learning Training Framework for Autonomous Driving
理想TOP2 · 2025-11-27 16:10
Core Viewpoint
- The article discusses advances in autonomous driving through the AD-R1 framework, which uses closed-loop reinforcement learning to improve the safety and robustness of end-to-end autonomous driving systems, addressing a key limitation of existing world models: they fail to predict dangerous outcomes [2][4].

Group 1: Closed-Loop vs. Open-Loop Systems
- Open-loop systems rely on offline data and static playback, while closed-loop systems interact dynamically with the environment, allowing real-time adjustments to the vehicle's trajectory [1].
- The AD-R1 framework represents a significant step toward closed-loop reinforcement learning for autonomous driving [1].

Group 2: Challenges in Imitation Learning
- Imitation learning faces two main challenges: distribution shift caused by long-tail scenarios unseen in training, and the lack of negative feedback, which makes it difficult for the model to learn from mistakes [3].
- Optimistic bias is identified as a systemic flaw in reinforcement learning for autonomous driving: a world model trained only on safe expert data may imagine an unrealistically safe future even when the policy proposes an unsafe action [3].

Group 3: AD-R1 Framework Components
- The AD-R1 framework has two core components: an impartial world model and reinforcement learning driven by imagined future rollouts [4].
- The impartial world model is trained with counterfactual data synthesis, which exposes the model to the consequences of unsafe driving behaviors (a sketch of this idea follows Group 5 below) [4].

Group 4: Model Training and Evaluation
- The training loop samples candidate trajectories, imagines each one's future with the impartial world model, scores the trajectories based on their predicted outcomes, and updates the policy with the GRPO algorithm (see the GRPO sketch below) [8].
- The framework derives fine-grained rewards from 3D/4D voxel outputs, enabling evaluation of collision severity and checks that the vehicle remains stable on the road surface (see the voxel reward sketch below) [8].

Group 5: Additional Features
- Trajectory-aware gating keeps the model focused on features along the driving path, while an ego-trajectory fidelity loss penalizes imagined rollouts that deviate from the input control commands (see the final sketch below) [6].
- The framework also applies volume collision penalties and vertical clearance checks to improve safety in complex environments [8].
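The article does not publish AD-R1's code, so the following is only a minimal sketch of what counterfactual data synthesis could look like: expert trajectories are perturbed off the demonstrated path, and each variant is labeled by a crude geometric collision test so the world model also sees unsafe outcomes. All names (`perturb_trajectory`, `collides`, the data layout) and the circle-vs-point collision check are assumptions, not Li Auto's implementation.

```python
import numpy as np

def perturb_trajectory(traj, max_offset=2.0, rng=None):
    """Apply a growing lateral offset to an expert trajectory of shape (T, 2),
    so the counterfactual drifts off the expert path, e.g. toward a curb."""
    rng = rng or np.random.default_rng()
    offset = rng.uniform(-max_offset, max_offset)
    ramp = np.linspace(0.0, 1.0, len(traj))  # drift grows over the horizon
    out = traj.copy()
    out[:, 1] += offset * ramp               # shift the lateral (y) coordinate
    return out

def collides(traj, obstacles, radius=1.0):
    """Crude collision test: obstacles is an (N, 2) array of point centers."""
    d = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    return bool((d < radius).any())

def synthesize_counterfactuals(expert_trajs, obstacles, per_traj=4):
    """Yield (trajectory, is_unsafe) pairs: each expert path plus perturbed
    variants whose collision labels teach the world model that unsafe actions
    lead to unsafe futures, instead of imagining every outcome as safe."""
    rng = np.random.default_rng(0)
    for traj in expert_trajs:
        yield traj, collides(traj, obstacles)
        for _ in range(per_traj):
            cf = perturb_trajectory(traj, rng=rng)
            yield cf, collides(cf, obstacles)
```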
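The summary names GRPO as the policy-update algorithm but gives no equations. Below is a hedged sketch of the group-relative advantage computation and clipped surrogate loss at GRPO's core, assuming each scene yields a group of candidate trajectories scored by the world-model-based reward; the tensor shapes and hyperparameters are placeholders.

```python
import torch

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: each candidate trajectory is scored against
    the mean/std of its own group, so no learned value critic is needed.
    rewards has shape (num_groups, group_size)."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_policy_loss(logp_new, logp_old, advantages, clip=0.2):
    """PPO-style clipped surrogate applied per candidate trajectory, weighted
    by the group-relative advantages above. logp_* are the summed log-probs
    of each trajectory under the new/old policy, same shape as advantages."""
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * advantages
    clipped = ratio.clamp(1 - clip, 1 + clip) * advantages
    return -torch.min(unclipped, clipped).mean()
```

In the AD-R1 setup described by the article, the rewards fed into this update would come from the impartial world model's imagined rollouts, scored by voxel-based terms like those sketched next.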
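The article says rewards are computed from 3D/4D voxel outputs, covering collision severity, volume-based penalties, and vertical clearance, but shows no formulas. The sketch below illustrates one plausible reading, where an imagined occupancy grid is intersected with the ego vehicle's voxelized volume; the grid layout, ground-plane check, and penalty weights are all assumptions.

```python
import numpy as np

def collision_severity(occupancy, ego_mask):
    """Volume collision penalty: fraction of the ego vehicle's voxels that
    overlap predicted occupied space. Both arguments are boolean (X, Y, Z)
    grids for one imagined future frame."""
    overlap = occupancy & ego_mask
    return overlap.sum() / max(ego_mask.sum(), 1)

def vertical_clearance_ok(occupancy, ego_mask, ground_z=0):
    """Check the ego stays supported by the road: every (x, y) column under
    the ego footprint must have solid ground at the ground_z layer."""
    footprint = ego_mask.any(axis=2)      # project ego onto the ground plane
    ground = occupancy[:, :, ground_z]
    return bool(ground[footprint].all())

def voxel_reward(occupancy, ego_mask, w_collision=10.0, w_offroad=5.0):
    """Combine the penalties into a scalar reward for one imagined frame;
    the weights are illustrative, not values from the article."""
    r = -w_collision * collision_severity(occupancy, ego_mask)
    if not vertical_clearance_ok(occupancy, ego_mask):
        r -= w_offroad                    # unsupported / off-road penalty
    return r
```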
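Finally, the article mentions trajectory-aware gating and an ego-trajectory fidelity loss without defining either. One common realization, sketched here as an assumption rather than AD-R1's actual design, is a Gaussian spatial mask over bird's-eye-view features centered on the planned path, plus an L2 penalty tying the imagined ego motion to the commanded trajectory.

```python
import torch
import torch.nn.functional as F

def trajectory_gate(features, traj_xy, sigma=2.0):
    """Trajectory-aware gating (assumed form): down-weight BEV features far
    from the planned path so the model attends along the driving corridor.
    features: (C, H, W) bird's-eye-view map; traj_xy: (T, 2) float cell coords."""
    H, W = features.shape[-2:]
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float()            # (H, W, 2)
    d = torch.cdist(grid.view(-1, 2), traj_xy).min(dim=1).values
    gate = torch.exp(-(d ** 2) / (2 * sigma ** 2)).view(H, W)
    return features * gate                                   # soft spatial mask

def ego_fidelity_loss(pred_traj, cmd_traj):
    """Ego-trajectory fidelity loss (assumed L2 form): penalize imagined
    rollouts whose ego motion drifts from the commanded trajectory, so the
    world model cannot 'cheat' by imagining a safer path than was requested."""
    return F.mse_loss(pred_traj, cmd_traj)
```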