DiffusionDrive
A Deep Dive into Horizon Robotics' HSD One-Stage End-to-End Design
自动驾驶之心· 2026-01-13 10:14
Core Insights
- The article discusses two core papers from Horizon Robotics, DiffusionDrive and ResAD, focusing on their contributions to end-to-end autonomous driving solutions [2][3].

DiffusionDrive
- The overall architecture of DiffusionDrive consists of three parts: perception information, navigation information, and trajectory generation [6].
- Perception information includes dynamic/static obstacles, traffic lights, map elements, and drivable areas, emphasizing the need to convey perception results to the planning task in an end-to-end manner [6].
- Navigation information is crucial for avoiding incorrect routes, especially in complex urban environments such as Shanghai, where navigation challenges are significant [7].
- The core of trajectory generation is "Truncated Diffusion," which leverages the fixed patterns in human driving behavior to reduce training convergence difficulty and inference noise [8].
- The article outlines a trajectory-generation method that uses K-Means clustering to describe common human driving behaviors, which simplifies the training process [9].
- The anchor-based trajectory generation approach reduces training difficulty and enables real-time inference, although concerns about trajectory stability over time are raised [10].

ResAD
- ResAD introduces a residual design that predicts the difference between future trajectories and their inertial extrapolations, rather than generating future trajectories directly [12].
- Residual regularization is essential for managing residuals that grow over time, ensuring that the model focuses on the true diversity of driving behaviors [13][14].
- The design allows different noise perturbations in the trajectory-generation process, adjusting learning difficulty according to the noise applied in the lateral and longitudinal directions [15].
- ResAD features a trajectory ranker that uses a transformer model to predict metric scores from the top-k trajectory predictions and environmental information [16].
- The regularized residual supervision effectively separates the inertial component from the predictions, addressing data-imbalance issues in training [17].

Conclusion
- Both papers from Horizon Robotics provide valuable insights and methodologies for enhancing autonomous driving technology, encouraging further exploration and development in the field [18].
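The anchor-based generation described above starts from a fixed vocabulary of typical human driving trajectories obtained by clustering the training set. A minimal sketch of how such K-Means anchors could be built is shown below; the number of clusters and the flattening scheme are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_trajectory_anchors(trajectories, num_anchors=20, seed=0):
    """Cluster ground-truth future trajectories into a small anchor vocabulary.
    trajectories: (N, T, 2) array of future ego positions in the ego frame.
    Returns (num_anchors, T, 2) anchor trajectories."""
    n, t, d = trajectories.shape
    flat = trajectories.reshape(n, t * d)          # flatten each trajectory into one vector
    km = KMeans(n_clusters=num_anchors, random_state=seed, n_init=10).fit(flat)
    return km.cluster_centers_.reshape(num_anchors, t, d)

# Usage (assumed): anchors = build_trajectory_anchors(train_futures)
# At inference the planner refines noisy copies of these anchors instead of
# denoising from pure Gaussian noise.
```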
After going through the latest end-to-end and VLA work, these 9 open-source projects are the most worth reproducing...
自动驾驶之心· 2026-01-10 03:47
Core Viewpoint
- The article highlights the rapid growth of open-source projects in autonomous driving, particularly those expected to be valuable in 2025. It emphasizes the importance of these projects in providing comprehensive solutions for end-to-end autonomous driving, from data cleaning to evaluation, and encourages developers to engage with these resources for practical learning and application [4][5].

Summary by Relevant Sections

DiffusionDrive
- Developed by Huazhong University of Science and Technology and Horizon, DiffusionDrive addresses the conflict between generating diverse trajectories and real-time inference in end-to-end planning. It simplifies traditional multi-step denoising to just 2-4 steps while maintaining the diversity of the action distribution and achieving real-time performance of 45 FPS on a 4090 GPU. The model demonstrates high planning quality with a PDMS score of 88.1 on the NAVSIM benchmark [8].

OpenEMMA
- OpenEMMA, created by Texas A&M University, University of Michigan, and University of Toronto, proposes a lightweight and generalizable framework to tackle the high training costs and deployment difficulties of multimodal large language models (MLLMs) in autonomous driving. It employs a Chain-of-Thought reasoning mechanism to enhance generalization and reliability in complex scenarios without extensive retraining [11].

Diffusion-Planner
- This project, involving Tsinghua University and several other institutions, presents a Transformer-based diffusion planning model that generates multimodal trajectories from noise, addressing the averaged-solution dilemma of imitation learning. It integrates trajectory prediction and ego-vehicle planning into a unified architecture, achieving leading performance on the nuPlan benchmark [14].

UniScene
- UniScene, developed by Shanghai Jiao Tong University and others, introduces a multimodal generation framework to reduce the high cost of obtaining high-quality data for autonomous driving. It employs a layered generation approach to create occupancy maps and corresponding multimodal data, significantly improving the quality of generated data for downstream tasks [16].

ORION
- ORION, from Huazhong University of Science and Technology and Xiaomi, tackles the disconnect between causal reasoning and trajectory generation in end-to-end autonomous driving. It uses a unified framework to align the visual, reasoning, and action spaces, leading to improved driving scores and success rates in evaluations [18].

FSDrive
- FSDrive, developed by Xi'an Jiaotong University and others, addresses the loss of visual detail in end-to-end driving planning caused by reliance on purely textual reasoning. It proposes a visual reasoning paradigm that improves trajectory accuracy and safety while maintaining strong scene-understanding capabilities [21].

AutoVLA
- AutoVLA, from UCLA, presents a unified autoregressive generative framework that ensures the physical feasibility of actions in driving models. It allows adaptive reasoning based on scene complexity and has shown competitive performance across various benchmarks [24].

OpenDriveVLA
- OpenDriveVLA, created by the Technical University of Munich and others, is an end-to-end driving VLA model that integrates multimodal inputs to output driving actions. It effectively bridges the semantic gap between the visual and language modalities, demonstrating its effectiveness in open-loop planning and driving Q&A tasks [26].

SimLingo
- SimLingo addresses the common disconnect between language models and driving behavior in autonomous driving. It proposes a multi-task joint training framework that aligns driving behavior, visual-language understanding, and language-action consistency, achieving leading performance in evaluations [29].

Conclusion
- The article encourages developers to use these repositories as engineering building blocks, suggesting that hands-on engagement with the code and demos can significantly enhance understanding of autonomous driving technology [31].
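DiffusionDrive's 2-4 step denoising, mentioned in the list above, collapses the usual long diffusion schedule into a handful of refinement passes over anchor trajectories. A minimal sketch of what such a truncated denoising loop could look like follows; the `denoiser` interface, step count, and noise level are illustrative assumptions rather than the released implementation.

```python
import torch

def truncated_denoise(denoiser, anchors, scene_feats, num_steps=2, sigma=0.5):
    """Hypothetical truncated denoising loop.
    anchors: (B, N, T, 2) anchor trajectories; scene_feats: encoder output.
    Instead of running the full diffusion chain, start from anchors perturbed
    with moderate noise and refine them in only a few passes."""
    traj = anchors + sigma * torch.randn_like(anchors)      # truncated starting point
    for step in reversed(range(num_steps)):
        t = torch.full((traj.shape[0],), step, device=traj.device)
        # Each pass predicts a cleaner trajectory conditioned on the scene.
        traj = denoiser(traj, t, scene_feats)
    return traj                                             # (B, N, T, 2) refined candidates
```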
A Deep Dive into Horizon Robotics' HSD One-Stage End-to-End Design
自动驾驶之心· 2025-12-30 00:28
Core Insights
- The article discusses two core papers from Horizon Robotics, DiffusionDrive and ResAD, focusing on their contributions to end-to-end autonomous driving solutions [2][3].

DiffusionDrive
- The overall architecture of DiffusionDrive consists of three parts: perception information, navigation information, and trajectory generation [6].
- Perception information includes dynamic/static obstacles, traffic lights, map elements, and drivable areas, emphasizing the need to convey perception results to the planning task in an end-to-end manner [6].
- Navigation information is crucial for avoiding incorrect routes, especially in complex urban environments such as Shanghai, where navigation challenges are significant [7].
- The core concept of trajectory generation is "Truncated Diffusion," which leverages the fixed patterns in human driving behavior to reduce training convergence difficulty and inference noise [8][10].
- The article outlines a trajectory-generation method that uses K-Means clustering to describe common human driving behaviors, which simplifies the training process [9].

ResAD
- ResAD introduces a residual design that predicts the difference between future trajectories and their inertial extrapolations, rather than generating future trajectories directly [12].
- Residual regularization helps manage residuals that grow over time, ensuring that the model focuses on the true diversity of driving behaviors [13][14].
- The design allows different noise perturbations in the trajectory-generation process, adjusting learning difficulty according to the direction of motion [15].
- ResAD also features a trajectory ranker that uses a transformer model to predict metric scores from the top-k trajectory predictions and environmental information [16].

Conclusion
- Both papers from Horizon Robotics provide valuable insights and methodologies for enhancing autonomous driving systems, encouraging further exploration and development in the field [18].
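ResAD's residual design supervises the difference between the ground-truth future trajectory and its inertial extrapolation rather than the raw trajectory. A minimal sketch of how such a residual target could be computed is shown below, assuming a constant-velocity rollout for the inertial component; the normalization by `reg_scale` is an assumed form of the residual regularization, not the paper's exact formula.

```python
import torch

def inertial_extrapolation(history, horizon):
    """Constant-velocity rollout from the last two observed poses.
    history: (B, H, 2) past ego positions; returns (B, horizon, 2)."""
    vel = history[:, -1] - history[:, -2]                   # (B, 2) last-step velocity
    steps = torch.arange(1, horizon + 1, device=history.device).view(1, -1, 1)
    return history[:, -1:, :] + steps * vel.unsqueeze(1)    # straight-line rollout

def residual_target(history, future, reg_scale=None):
    """Residual the network is trained to predict: future minus inertial rollout.
    Optionally normalize by a per-timestep scale so the larger residuals at
    distant timesteps do not dominate the loss (assumed regularization form)."""
    inertial = inertial_extrapolation(history, future.shape[1])
    residual = future - inertial
    if reg_scale is not None:                               # e.g. a (1, T, 1) scale tensor
        residual = residual / reg_scale
    return residual
```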
DiffusionDriveV2 Core Code Walkthrough
自动驾驶之心· 2025-12-28 09:23
Core Viewpoint
- The article analyzes the DiffusionDrive model, which uses a truncated diffusion approach for end-to-end autonomous driving, covering its architecture and the integration of reinforcement learning to improve trajectory planning and safety [1].

Group 1: Model Architecture
- DiffusionDriveV2 employs a reinforcement-learning-constrained truncated diffusion model, focusing on the overall architecture for autonomous driving [3].
- The model incorporates environment encoding, including bird's-eye-view (BEV) features and ego-vehicle status, to strengthen understanding of the driving context [5].
- The trajectory planning module uses multi-scale BEV features to improve the accuracy of trajectory predictions [8].

Group 2: Trajectory Generation
- The model first clusters the ground-truth future trajectories of the ego vehicle with K-Means to create anchors, which are then perturbed with Gaussian noise [12].
- Trajectory prediction relies on cross-attention between the trajectory features and the BEV features, allowing more accurate trajectory generation [15][17].
- The model also integrates time encoding to strengthen the temporal aspect of trajectory predictions [14].

Group 3: Reinforcement Learning Integration
- The Intra-Anchor GRPO method optimizes the policy within specific behavior intentions, enhancing safety and goal-oriented trajectory generation [27].
- The reinforcement learning loss mitigates instability during the early denoising steps, using a discount factor to adjust the influence of rewards over time [28].
- The model obtains a clearer learning signal by truncating negative advantages and applying strong penalties for collisions, ensuring safer trajectory outputs [30].

Group 4: Noise Management
- The model introduces multiplicative rather than additive noise to preserve the structural integrity of trajectories and keep exploration paths smooth [33].
- This approach addresses the inherent scale inconsistency between trajectory segments, allowing more coherent and realistic trajectory generation [35].

Group 5: Evaluation Metrics
- Generated trajectories are evaluated on safety, comfort, rule compliance, progress, and feasibility, aggregated into a comprehensive score [27].
- Specific metrics assess safety (collision detection), comfort (acceleration and curvature), and adherence to traffic rules, providing a holistic evaluation of trajectory performance [27].
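The multiplicative-noise idea in Group 4 perturbs each trajectory point in proportion to its own magnitude, so distant waypoints (which span larger distances) are explored at a scale matching their extent rather than being drowned by a fixed additive noise level. A minimal sketch contrasting the two perturbation styles is shown below; the noise scales are illustrative assumptions, not values from the released code.

```python
import torch

def perturb_additive(anchors, sigma=0.5):
    """Additive Gaussian noise: the same absolute scale everywhere, which
    distorts near (small-coordinate) waypoints relatively more than far ones."""
    return anchors + sigma * torch.randn_like(anchors)

def perturb_multiplicative(anchors, sigma=0.1):
    """Multiplicative noise: perturbation proportional to each coordinate's
    magnitude, so the trajectory keeps its overall shape while distal
    segments are explored at a commensurate scale."""
    return anchors * (1.0 + sigma * torch.randn_like(anchors))

# Usage (assumed): noisy = perturb_multiplicative(anchor_trajs)  # (B, N, T, 2)
```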
A year on, DiffusionDrive is upgraded to v2 and sets a new record!
自动驾驶之心· 2025-12-11 03:35
Core Insights
- The article covers the upgrade of DiffusionDrive to version 2, highlighting how it advances end-to-end autonomous driving trajectory planning by integrating reinforcement learning to keep generated trajectories both diverse and consistently high quality [1][3][10].

Background Review
- The shift towards end-to-end autonomous driving (E2E-AD) emerged as traditional tasks like 3D object detection and motion prediction matured. Early methods were limited in their modeling, often generating a single trajectory with no alternatives in complex driving scenarios [5][10].
- Previous diffusion models applied to trajectory generation struggled with mode collapse, leading to a lack of diversity in generated behaviors. DiffusionDrive introduced a Gaussian mixture model (GMM) to define the prior distribution of the initial noise, promoting diverse behavior generation [5][13].

Methodology
- DiffusionDriveV2 introduces a framework that uses reinforcement learning to overcome the limitations of imitation learning, which previously forced a trade-off between diversity and sustained quality in trajectory generation [10][12].
- The framework incorporates intra-anchor GRPO and inter-anchor truncated GRPO to manage advantage estimation within specific driving intentions, preventing mode collapse by avoiding inappropriate comparisons between different intentions [9][12][28].
- The method employs scale-adaptive multiplicative noise to enhance exploration while maintaining trajectory smoothness, addressing the inherent scale inconsistency between proximal and distal segments of trajectories [24][39].

Experimental Results
- Evaluations on the NAVSIM v1 and NAVSIM v2 datasets show that DiffusionDriveV2 achieves state-of-the-art performance, with a PDMS score of 91.2 on NAVSIM v1 and 85.5 on NAVSIM v2, significantly outperforming previous models [10][33].
- The results indicate that DiffusionDriveV2 effectively balances trajectory diversity and sustained quality, achieving optimal performance in closed-loop evaluations [38][39].

Conclusion
- The article concludes that DiffusionDriveV2 addresses the inherent challenges of imitation learning in trajectory generation, achieving an optimal trade-off between planning quality and diversity through its reinforcement learning techniques [47].
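Intra-anchor GRPO, as described above, computes group-relative advantages only among rollouts that share the same anchor (the same driving intention), so a left-turn sample is never penalized merely for scoring lower than an unrelated straight-ahead sample. A minimal sketch of that grouping is shown below; the tensor layout and reward source are illustrative assumptions.

```python
import torch

def intra_anchor_advantages(rewards, eps=1e-6):
    """rewards: (B, A, G) scores for G sampled trajectories per anchor A.
    Advantages are normalized within each anchor group, never across anchors,
    which is the assumed core of the intra-anchor GRPO idea."""
    mean = rewards.mean(dim=-1, keepdim=True)   # per-anchor baseline
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)       # (B, A, G) group-relative advantages
```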
Comprehensively Surpassing DiffusionDrive: GMF-Drive, the World's First Mamba End-to-End SOTA Solution
理想TOP2· 2025-08-18 12:43
Core Insights
- The article discusses advances in end-to-end autonomous driving, emphasizing the importance of multi-modal fusion architectures and introducing GMF-Drive as a new framework that improves upon existing methods [3][4][44].

Group 1: End-to-End Autonomous Driving
- End-to-end autonomous driving has gained widespread acceptance because it maps raw sensor inputs directly to driving actions, reducing reliance on intermediate representations and the associated information loss [3].
- Recent models like DiffusionDrive and GoalFlow demonstrate strong capabilities in generating diverse and high-quality driving trajectories [3].

Group 2: Multi-Modal Fusion Challenges
- A key bottleneck in current systems is the integration of heterogeneous inputs from different sensors; existing methods often rely on simple feature concatenation rather than structured information integration [4][6].
- Current multi-modal fusion architectures such as TransFuser show limited performance improvements over single-modal architectures, indicating a need for more sophisticated integration methods [6].

Group 3: GMF-Drive Overview
- GMF-Drive, developed by teams from the University of Science and Technology of China and China University of Mining and Technology, includes three modules aimed at strengthening multi-modal fusion for autonomous driving [7].
- The framework combines a gated Mamba fusion approach with a spatially aware BEV representation, addressing the limitations of traditional transformer-based methods [7][44].

Group 4: Innovations in Data Representation
- The article introduces a 14-dimensional pillar representation that retains critical 3D geometric features, enhancing the model's perception capabilities [16][19].
- This representation captures local surface geometry and height variations, allowing the model to differentiate objects with similar point densities but different structures [19].

Group 5: GM-Fusion Module
- The GM-Fusion module integrates multi-modal features through gated channel attention, a BEV state-space model (BEV-SSM), and hierarchical deformable cross-attention, achieving linear complexity while maintaining long-range dependency modeling [19][20].
- This design enables effective spatial dependency modeling and improves feature alignment between camera and LiDAR data [19][40].

Group 6: Experimental Results
- GMF-Drive achieved a PDMS score of 88.9 on the NAVSIM benchmark, outperforming the previous best model, DiffusionDrive, by 0.8 points, demonstrating the effectiveness of the GM-Fusion architecture [29][30].
- The framework also showed significant improvements in key sub-metrics such as drivable-area compliance and ego-vehicle progression, indicating enhanced safety and efficiency [30][31].

Group 7: Conclusion
- The article concludes that GMF-Drive represents a significant advance in autonomous driving frameworks by effectively combining geometric representations with spatially aware fusion techniques, setting new performance benchmarks [44].
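The gated channel attention in GM-Fusion can be pictured as a learned per-channel gate that decides how much of the camera versus LiDAR BEV feature to keep before the state-space and cross-attention stages. A minimal sketch under that assumption follows; the layer sizes and the gating form are illustrative, not the released design.

```python
import torch
import torch.nn as nn

class GatedChannelFusion(nn.Module):
    """Assumed sketch of gated channel attention: pool both BEV feature maps,
    predict per-channel gates, and blend camera and LiDAR features."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, channels),
            nn.Sigmoid(),
        )

    def forward(self, cam_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # cam_bev, lidar_bev: (B, C, H, W) BEV feature maps from the two sensors.
        pooled = torch.cat(
            [cam_bev.mean(dim=(2, 3)), lidar_bev.mean(dim=(2, 3))], dim=-1
        )                                                  # (B, 2C) global descriptors
        g = self.gate(pooled).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1) channel gates
        return g * cam_bev + (1.0 - g) * lidar_bev         # gated blend of modalities
```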
Comprehensively Surpassing DiffusionDrive! USTC's GMF-Drive: The World's First Mamba End-to-End SOTA Solution
自动驾驶之心· 2025-08-13 23:33
Core Viewpoint
- The article discusses the GMF-Drive framework developed by the University of Science and Technology of China, which addresses the limitations of existing multi-modal fusion architectures in end-to-end autonomous driving by combining gated Mamba fusion with a spatially aware BEV representation [2][7].

Summary by Sections

End-to-End Autonomous Driving
- End-to-end autonomous driving has gained recognition as a viable solution, mapping raw sensor inputs directly to driving actions and thereby minimizing reliance on intermediate representations and information loss [2].
- Recent models like DiffusionDrive and GoalFlow have demonstrated strong capabilities in generating diverse and high-quality driving trajectories [2][8].

Multi-Modal Fusion Challenges
- A key bottleneck in current systems is the multi-modal fusion architecture, which struggles to effectively integrate heterogeneous inputs from different sensors [3].
- Existing methods, mostly in the TransFuser style, often deliver limited performance gains, indicating simplistic feature concatenation rather than structured information integration [5].

GMF-Drive Framework
- GMF-Drive consists of three modules: a data preprocessing module that enhances geometric information, a perception module built on a spatially aware state-space model (SSM), and a trajectory planning module that employs a truncated diffusion strategy [7][13].
- The framework aims to retain critical 3D geometric features while improving computational efficiency over traditional transformer-based methods [11][16].

Experimental Results
- GMF-Drive achieved a PDMS score of 88.9 on the NAVSIM dataset, outperforming the previous best model, DiffusionDrive, by 0.8 points [32].
- The framework showed significant improvements in key metrics, including a 1.1-point increase in the drivable area compliance (DAC) score and a top score of 83.3 in ego vehicle progression (EP) [32][34].

Component Analysis
- Ablation experiments assessed the contributions of the individual components, confirming that the combination of geometric representations and the GM-Fusion architecture is crucial for optimal performance [39][40].
- The GM-Fusion module, which includes gated channel attention, BEV-SSM, and hierarchical deformable cross-attention, significantly enhances the model's ability to process multi-modal data [22][44].

Conclusion
- GMF-Drive is a novel end-to-end autonomous driving framework that combines a geometry-enhanced pillar representation with a spatially aware fusion model, achieving superior performance compared to existing transformer-based architectures [51].
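The geometry-enhanced pillar representation augments the usual per-point pillar features with local shape statistics such as height spread, so that pillars with similar point counts but different structures stay distinguishable. A minimal sketch of computing such per-pillar geometric statistics is shown below; the specific statistics and their relation to the 14-dimensional layout are assumptions for illustration, not the paper's exact feature list.

```python
import numpy as np

def pillar_geometric_stats(points):
    """points: (N, 4) LiDAR points (x, y, z, intensity) falling into one pillar.
    Returns a small vector of assumed geometric descriptors: centroid, height
    range/variance, and mean intensity. A real geometry-enhanced pillar would
    concatenate features like these with the standard per-point offsets."""
    xyz = points[:, :3]
    centroid = xyz.mean(axis=0)                        # (3,) pillar centroid
    z = xyz[:, 2]
    height_range = z.max() - z.min()                   # vertical extent of the pillar
    height_var = z.var()                               # surface roughness proxy
    mean_intensity = points[:, 3].mean()
    return np.concatenate([centroid, [height_range, height_var, mean_intensity]])
```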
Worth a look: how 10 industry insiders view VLA
理想TOP2· 2025-07-21 14:36
Core Viewpoints
- Cutting-edge autonomous driving technologies are not yet mature enough for mass production, and significant challenges remain to be addressed [1][27][31].
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as key directions for future exploration in autonomous driving [6][7][28].
- The choice between deepening expertise in autonomous driving or moving to embodied intelligence depends on individual circumstances and market dynamics [19][34].

Group 1: Current Technology Maturity
- The BEV (bird's-eye view) perception model has reached a level of maturity suitable for mass production, while other approaches such as end-to-end (E2E) models are still in the experimental phase [16][31].
- There is consensus that existing models struggle with corner cases, particularly in complex driving scenarios, indicating that basic functionality is in place but advanced capabilities are still lacking [16][24][31].
- The industry is shifting towards larger models and more advanced techniques to improve scene understanding and decision-making in autonomous vehicles [26][28].

Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for the next generation of autonomous driving, with the potential to improve reasoning capability and safety [2][28].
- Reinforcement learning is recognized as having significant potential, particularly when combined with effective simulation environments [6][32].
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could be beneficial in uncertain driving conditions [7][26].

Group 3: Future Directions
- Future advances in autonomous driving are expected to focus on improving safety, enhancing the passenger experience, and achieving comprehensive scene coverage [20][28].
- Integrating closed-loop simulation with data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30].
- The industry is moving towards a data-driven model in which the efficiency of data collection, cleaning, labeling, training, and validation determines competitive advantage [20][22].

Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should weigh personal interests, market trends, and the maturity of each field [19][34].
- The autonomous driving sector is perceived as offering more immediate opportunities for impactful work than the still-developing field of embodied intelligence [19][34].