DriveDPO
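Imitation Learning Cannot Be Truly End-to-End! DriveDPO: Safety DPO Breaks the Inherent Limitations of Imitation Learning (Latest from the Chinese Academy of Sciences)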
自动驾驶之心 (Heart of Autonomous Driving) · 2025-10-03 03:32
Core Viewpoint
- The article examines the challenges of end-to-end autonomous driving, focusing on the limitations of imitation learning, and introduces DriveDPO, a safety-oriented policy learning framework that improves driving safety and reliability [1][7][28].

Summary by Sections

Imitation Learning Challenges
- Imitation learning can produce unsafe driving behavior even when the generated trajectories look human-like, because it does not account for the safety implications of a maneuver [5][11].
- The symmetric loss functions commonly used in imitation learning penalize a deviation from the human trajectory by its size alone, so they cannot distinguish a safe deviation from an equally large unsafe one [5][11].

DriveDPO Framework
- DriveDPO combines human imitation signals and rule-based safety scores into a unified policy distribution that is optimized directly, addressing the shortcomings of both imitation learning and score-based methods [8][12].
- The framework applies iterative Direct Preference Optimization (DPO) to prefer trajectories that are both human-like and safe, making the model more responsive to safety preferences [8][19].

Experimental Results
- Extensive experiments on the NAVSIM benchmark show that DriveDPO achieves a PDMS (Predictive Driver Model Score) of 90.0, outperforming previous methods by 1.9 and 2.0 points respectively [8][22].
- Qualitative results indicate clear improvements in safety and rule compliance in complex driving scenarios, showing the potential of DriveDPO for safety-critical applications [12][28].

Contributions
- The article identifies key challenges in current imitation learning and score-based methods and proposes DriveDPO as a solution that combines unified policy distillation with safety-oriented DPO for effective policy optimization [12][28].
- The framework's ability to suppress unsafe behaviors while enhancing overall driving performance highlights its potential for deployment in autonomous driving systems [12][28].
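
To make the two central ideas above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the discrete set of candidate trajectories, the lateral-offset representation, and the helper names (`l2_imitation_loss`, `dpo_loss`) are all assumptions made for illustration, and the loss shown is the standard DPO objective rather than DriveDPO's exact formulation. The first part shows why a symmetric imitation loss cannot separate a safe deviation from an unsafe one; the second shows how a preference pair (safe trajectory chosen, unsafe trajectory rejected) yields a training signal.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def l2_imitation_loss(traj, human):
    """Symmetric imitation loss: mean squared distance to the human trajectory."""
    return sum((a - b) ** 2 for a, b in zip(traj, human)) / len(human)


def dpo_loss(policy_logits, ref_logits, chosen, rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair of candidate indices.

    policy_logits / ref_logits score each candidate trajectory under the
    trainable policy and a frozen reference policy (assumed setup).
    """
    logp = log_softmax(policy_logits)
    logp_ref = log_softmax(ref_logits)
    margin = (logp[chosen] - logp_ref[chosen]) - (logp[rejected] - logp_ref[rejected])
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))


# Hypothetical scene: lateral offsets (metres) at three future timesteps.
human = [0.0, 0.0, 0.0]
candidates = [
    [-0.5, -0.5, -0.5],  # drifts toward the shoulder (assumed safe)
    [0.5, 0.5, 0.5],     # drifts toward oncoming traffic (assumed unsafe)
]

# A symmetric loss gives both deviations the same penalty.
losses = [l2_imitation_loss(c, human) for c in candidates]
print("imitation losses:", losses)

# A preference pair encodes what the symmetric loss cannot.
chosen, rejected = 0, 1
ref_logits = [0.0, 0.0]
print("DPO loss, policy == reference:", dpo_loss([0.0, 0.0], ref_logits, chosen, rejected))
print("DPO loss, policy favors chosen:", dpo_loss([1.0, 0.0], ref_logits, chosen, rejected))
```

The DPO loss falls as the policy shifts probability toward the chosen candidate relative to the reference. In an iterative scheme like the one the article describes, preference pairs would presumably be rebuilt from the updated policy between rounds; that loop is omitted here.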