Closed-Loop Reinforcement Learning
Closed-loop training finally lands! AD-R1: a new framework for end-to-end closed-loop reinforcement learning with world models (University of Macau, Li Auto, et al.)
自动驾驶之心 · 2025-11-27 00:04
Core Insights
- The article discusses advances in autonomous driving enabled by the AD-R1 framework, which uses an Impartial World Model to correct the "optimistic bias" found in traditional world models [2][3][57]
- The framework enables closed-loop reinforcement learning, allowing autonomous vehicles to learn from imagined failures and thereby improve safety and decision-making [9][57]

Group 1: Background and Challenges
- End-to-end autonomous driving has transformed the industry, but long-tail failures caused by distribution shift remain a key challenge [6]
- Traditional reinforcement learning methods rely on external simulators, which suffer from simulation-to-reality gaps and limited interactivity [6][9]
- A paradigm shift toward learning 3D/4D world models that act as high-fidelity generative simulators is emphasized [6]

Group 2: Optimizing World Models
- The AD-R1 framework introduces a new approach to mitigating the optimistic bias of world models, which often fail to predict negative outcomes [2][7]
- The Impartial World Model (IWM) is designed to faithfully reflect the consequences of both safe and unsafe behaviors, improving the reliability of its predictions [3][10]
- A counterfactual synthesis pipeline generates a diverse training dataset that includes plausible collision and lane-departure scenarios [3][10]

Group 3: Experimental Results
- The IWM significantly outperforms traditional models on risk-prediction tasks, accurately foreseeing failures [47][48]
- Applying the AD-R1 framework yields notable safety and performance gains across baseline models, with absolute improvements of 1.7% and 1.1% on the PDMS planning metric [49]
- Ablation studies show that counterfactual synthesis and model-level optimizations are critical for causal fidelity and overall performance [51][52]

Group 4: Future Directions
- Future research may focus on generating counterfactual failure samples from unlabeled data to reduce reliance on high-precision annotations [57]
- Extending the framework to more complex multi-agent interaction scenarios could further improve the robustness of autonomous driving systems in long-tail events [57]
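The closed-loop idea above (a policy improved against a world model that honestly predicts failures, rather than against the real road or an external simulator) can be illustrated with a deliberately tiny sketch. Everything here, the 1-D state, the toy risk model, the reward weights, is an illustrative assumption and not the AD-R1 paper's actual architecture:

```python
import random

def world_model(state, action):
    """Toy stand-in for an 'impartial' world model (hypothetical).

    Returns the imagined next state and a collision probability. An
    impartial model must assign realistic risk to unsafe actions instead
    of optimistically predicting success for every rollout.
    """
    next_state = state + action
    # Pretend the drivable corridor is [-1, 1]: leaving it raises risk.
    collision_prob = min(1.0, max(0.0, abs(next_state) - 1.0))
    return next_state, collision_prob

def reward(next_state, collision_prob, target=0.0, w_risk=5.0):
    """Progress toward the target minus a penalty for imagined failure."""
    return -abs(next_state - target) - w_risk * collision_prob

def train_policy(steps=5000, lr=0.05, sigma=0.2, seed=0):
    """REINFORCE-style update of the mean of a 1-D Gaussian policy.

    The policy never touches the real environment: every rollout is
    imagined by the world model, which is what closes the loop.
    """
    rng = random.Random(seed)
    mu = 1.5        # initial mean action: deliberately unsafe
    baseline = 0.0  # running reward baseline to reduce gradient variance
    for _ in range(steps):
        action = mu + rng.gauss(0.0, sigma)
        next_state, p_col = world_model(0.0, action)
        r = reward(next_state, p_col)
        baseline += 0.05 * (r - baseline)
        # Score-function gradient for a fixed-variance Gaussian policy
        mu += lr * (action - mu) * (r - baseline)
    return mu
```

Starting from an unsafe mean action (mu = 1.5, which the toy model scores as a likely collision), the penalty on imagined failures pulls the policy back into the safe band around 0. Replacing the toy functions with a learned generative world model and a trajectory-level policy recovers the shape of the closed loop the article describes.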