End-to-End Autonomous Driving
Imitation Learning Can't Be Truly End-to-End! DriveDPO: Safety DPO Overcomes Imitation Learning's Inherent Flaws (Latest from the Chinese Academy of Sciences)
自动驾驶之心· 2025-10-03 03:32
Core Viewpoint - The article discusses the challenges of end-to-end autonomous driving, focusing on the limitations of imitation learning and introducing DriveDPO, a safety-oriented policy learning framework that improves driving safety and reliability [1][7][28].

Summary by Sections

Imitation Learning Challenges
- Imitation learning can produce unsafe driving behaviors even when the generated trajectories appear human-like, because it does not account for the safety implications of individual maneuvers [5][11].
- The symmetric loss functions commonly used in imitation learning penalize safe and unsafe deviations from the human trajectory equally, leaving real risks undetected [5][11].

DriveDPO Framework
- DriveDPO distills human imitation signals and rule-based safety scores into a unified policy distribution for direct policy optimization, addressing the shortcomings of both imitation learning and score-based methods [8][12].
- The framework employs an iterative Direct Preference Optimization (DPO) stage that prioritizes trajectories that are both human-like and safe, sharpening the model's responsiveness to safety preferences [8][19].

Experimental Results
- Extensive experiments on the NAVSIM benchmark showed that DriveDPO achieved a PDMS (Predictive Driver Model Score) of 90.0, outperforming the two strongest previous methods by 1.9 and 2.0 points respectively [8][22].
- Qualitative results show clear improvements in safety and rule compliance in complex driving scenarios, underscoring DriveDPO's suitability for safety-critical applications [12][28].

Contributions
- The article identifies key weaknesses of current imitation learning and score-based methods, and proposes DriveDPO, which combines unified policy distillation with safety-oriented DPO for effective policy optimization [12][28].
- The framework's ability to suppress unsafe behaviors while enhancing overall driving performance highlights its potential for deployment in autonomous driving systems [12][28].
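The iterative DPO stage described above can be illustrated with the standard DPO preference loss applied to trajectory pairs, where the "preferred" trajectory is the one ranked higher by the rule-based safety score. A minimal sketch (not the authors' implementation; the function and variable names are illustrative):

```python
import math

def dpo_loss(logp_safe, logp_unsafe, ref_logp_safe, ref_logp_unsafe, beta=0.1):
    """Standard DPO loss on one preference pair of trajectories.

    logp_*      : log-probabilities under the current policy
    ref_logp_*  : log-probabilities under a frozen reference policy
    The "safe" trajectory is preferred per the rule-based safety score.
    """
    margin = beta * ((logp_safe - ref_logp_safe)
                     - (logp_unsafe - ref_logp_unsafe))
    # negative log-sigmoid of the margin: small when the policy already
    # ranks the safe trajectory above the unsafe one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy separates the pair, the loss drops below log(2):
loss_tie = dpo_loss(-1.5, -1.5, -1.5, -1.5)  # no separation -> log(2)
loss_sep = dpo_loss(-1.0, -2.0, -1.5, -1.5)  # safe ranked higher
```

Minimizing this loss pushes probability mass toward safe, human-like trajectories and away from unsafe ones, which is the suppression effect described above.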
SJTU & KargoBot's FastDrive! Structured Labels Make End-to-End Large Models Faster and Stronger
自动驾驶之心· 2025-06-23 11:34
Core Viewpoint - The integration of human-like reasoning capabilities into end-to-end autonomous driving systems is a cutting-edge research area, with a focus on vision-language models (VLMs) [1].

Group 1: Structured Dataset and Model
- A structured dataset called NuScenes-S has been introduced, which focuses on the key elements most relevant to driving decisions, eliminating redundant information and improving reasoning efficiency [4][5].
- The FastDrive model, with 0.9 billion parameters, mimics human reasoning strategies and aligns effectively with end-to-end autonomous driving frameworks [4][5].

Group 2: Dataset Description
- The NuScenes-S dataset provides a comprehensive view of driving scenarios, addressing issues often overlooked in existing datasets. It annotates key elements such as weather, traffic conditions, driving areas, traffic lights, traffic signs, road conditions, lane markings, and time [7][8].
- Dataset construction combined GPT annotations with human input, refining the results through comparison and optimization [9].

Group 3: FastDrive Algorithm Model
- FastDrive follows the "ViT-Adapter-LLM" architecture, using a Vision Transformer for visual feature extraction and a token-packing module to enhance inference speed [18][19].
- The model employs a large language model (LLM) to generate scene descriptions, identify key objects, predict future states, and make driving decisions in a reasoning-chain manner [19].

Group 4: Experimental Results
- Experiments on NuScenes-S, which contains 102,000 question-answer pairs, demonstrated that FastDrive achieves competitive performance in scene-understanding tasks [21].
- FastDrive posted strong results on perception, prediction, and decision-making metrics, outperforming other models [25].
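The token-packing idea in a ViT-Adapter-LLM pipeline can be sketched with simple average pooling: adjacent visual tokens are merged before they reach the LLM, shrinking the sequence the language model must process and thus speeding up inference. This is a hypothetical sketch (mean pooling with an assumed `pack_size` of 4), not FastDrive's actual module:

```python
import numpy as np

def pack_tokens(vision_tokens: np.ndarray, pack_size: int = 4) -> np.ndarray:
    """Merge each run of `pack_size` adjacent visual tokens into one by
    averaging, reducing LLM input length by a factor of `pack_size`.
    Assumes the token count is divisible by `pack_size`."""
    n, d = vision_tokens.shape
    return vision_tokens.reshape(n // pack_size, pack_size, d).mean(axis=1)

# e.g. 576 ViT patch tokens of dim 768 -> 144 packed tokens,
# which would then pass through an adapter before the LLM
feats = np.random.randn(576, 768)
packed = pack_tokens(feats)  # shape (144, 768)
```

Any downsampling scheme (learned queries, strided convolution) would serve the same purpose; the key design choice is trading some visual detail for a 4x shorter LLM input.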