OpenEMMA
After Going Through the Latest End-to-End and VLA Work, These 9 Open-Source Projects Are the Most Worth Reproducing...
自动驾驶之心· 2026-01-10 03:47
Core Viewpoint - The article highlights the rapid growth of open-source projects in the field of autonomous driving, particularly those expected to be valuable in 2025. It emphasizes that these projects provide comprehensive solutions for end-to-end autonomous driving, from data cleaning to evaluation, and encourages developers to engage with them for practical learning and application [4][5].

Summary by Relevant Sections

DiffusionDrive - Developed by Huazhong University of Science and Technology and Horizon, DiffusionDrive resolves the conflict between generating diverse plans and meeting real-time inference budgets in end-to-end autonomous driving planning. It truncates traditional multi-step denoising to just 2-4 steps while preserving the diversity of the action distribution, reaching 45 FPS on an NVIDIA RTX 4090 GPU and a PDMS score of 88.1 on the NAVSIM benchmark (a minimal sketch of this truncated denoising follows the project list) [8].

OpenEMMA - OpenEMMA, created by Texas A&M University, the University of Michigan, and the University of Toronto, proposes a lightweight, generalizable framework that tackles the high training cost and deployment difficulty of multimodal large language models (MLLMs) in autonomous driving. It employs a Chain-of-Thought reasoning mechanism to improve the model's generalization and reliability in complex scenarios without extensive retraining [11].

Diffusion-Planner - This project, involving Tsinghua University and several other institutions, presents a Transformer-based diffusion planning model that generates multimodal trajectories from noise, addressing the averaged-solution dilemma of imitation learning. It unifies trajectory prediction and ego-vehicle planning in a single architecture and achieves leading performance on the nuPlan benchmark [14].

UniScene - UniScene, developed by Shanghai Jiao Tong University and others, introduces a multimodal generation framework that lowers the high cost of obtaining high-quality autonomous-driving data. It uses a layered generation approach, first producing occupancy maps and then the corresponding multimodal data, significantly improving the quality of generated data for downstream tasks [16].

ORION - ORION, from Huazhong University of Science and Technology and Xiaomi, tackles the disconnect between causal reasoning and trajectory generation in end-to-end autonomous driving. It aligns the visual, reasoning, and action spaces within a unified framework, yielding improved driving scores and success rates in evaluations [18].

FSDrive - FSDrive, developed by Xi'an Jiaotong University and others, addresses the loss of visual detail that occurs when end-to-end driving planners rely on purely textual reasoning. It proposes a visual reasoning paradigm that improves trajectory accuracy and safety while retaining strong scene-understanding capabilities [21].

AutoVLA - AutoVLA, from UCLA, presents a unified autoregressive generative framework that ensures the physical feasibility of the actions a driving model outputs. It adapts its reasoning depth to scene complexity and has shown competitive performance across multiple benchmarks [24].

OpenDriveVLA - OpenDriveVLA, created by the Technical University of Munich and others, is an end-to-end driving VLA model that integrates multimodal inputs and outputs driving actions. It bridges the semantic gap between the visual and language modalities and demonstrates its effectiveness on open-loop planning and driving Q&A tasks [26].
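To make the truncated-denoising idea behind DiffusionDrive (and, more loosely, Diffusion-Planner) concrete, here is a minimal, self-contained sketch. It is not the released code: the network, step count, and noise scale are illustrative assumptions. The point it demonstrates is that denoising starts from trajectory anchors rather than pure Gaussian noise, which is why a handful of refinement steps can suffice while each anchor still seeds a distinct driving mode.

```python
import torch

# Hypothetical illustration of truncated diffusion denoising as described
# for DiffusionDrive: instead of many denoising steps from pure noise,
# start from precomputed trajectory "anchors" and run a few refinements.

NUM_STEPS = 2          # the article reports 2-4 steps suffice
HORIZON, DIM = 8, 2    # 8 future waypoints, (x, y) per waypoint

class ToyDenoiser(torch.nn.Module):
    """Stand-in for the real conditional denoising network."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(HORIZON * DIM + 1, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, HORIZON * DIM),
        )

    def forward(self, traj, t):
        # Predict a denoised trajectory from the noisy one and the step index.
        x = torch.cat([traj.flatten(1), t.expand(traj.size(0), 1)], dim=1)
        return self.net(x).view(-1, HORIZON, DIM)

def truncated_denoise(denoiser, anchors, noise_scale=0.5):
    # Start from slightly-noised anchors rather than pure noise: each
    # anchor seeds a different mode, preserving multi-modal diversity.
    traj = anchors + noise_scale * torch.randn_like(anchors)
    for step in range(NUM_STEPS, 0, -1):
        t = torch.tensor([step / NUM_STEPS])
        traj = denoiser(traj, t)
    return traj

anchors = torch.randn(4, HORIZON, DIM)   # 4 candidate driving modes
plans = truncated_denoise(ToyDenoiser(), anchors)
print(plans.shape)  # torch.Size([4, 8, 2])
```

Cutting the step count is what buys the reported real-time throughput: with an untrained toy network the output is meaningless, but the control flow mirrors the described scheme.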
SimLingo - SimLingo addresses the common disconnect between what a language model says and what the driving policy does. It proposes a multi-task joint training framework that aligns driving behavior, visual-language understanding, and language-action consistency, achieving leading performance in evaluations (a toy version of such a joint objective is sketched below) [29].

Conclusion - The article encourages developers to treat these repositories as engineering building blocks: working through the code and demos hands-on is one of the fastest ways to build a practical understanding of autonomous-driving technology [31].
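As a rough illustration of the multi-task joint training described for SimLingo, the sketch below combines three losses: one for driving behavior, one for visual-language understanding, and one for language-action consistency. The heads, loss functions, and weights are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def joint_loss(action_pred, action_gt,               # driving head
               vqa_logits, vqa_labels,               # scene-QA head
               instr_action_pred, instr_action_gt,   # instruction-conditioned head
               w_drive=1.0, w_vqa=0.5, w_align=0.5):
    """Weighted sum of the three objectives named in the article.
    All weights are illustrative; the real model may balance them differently."""
    l_drive = F.l1_loss(action_pred, action_gt)               # imitate expert trajectory
    l_vqa = F.cross_entropy(vqa_logits, vqa_labels)           # answer questions about the scene
    l_align = F.l1_loss(instr_action_pred, instr_action_gt)   # act consistently with language
    return w_drive * l_drive + w_vqa * l_vqa + w_align * l_align

# Toy shapes: 4 samples, 8 waypoints x (x, y), 10 answer classes.
loss = joint_loss(torch.randn(4, 8, 2), torch.randn(4, 8, 2),
                  torch.randn(4, 10), torch.randint(0, 10, (4,)),
                  torch.randn(4, 8, 2), torch.randn(4, 8, 2))
print(loss.item())
```

Training all three heads against one shared backbone is what forces the alignment: gradients from the language-action term penalize a policy whose actions contradict its own stated intent.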
Deploying End-to-End VLA for Autonomous Driving: How Should the Algorithms Be Designed?
自动驾驶之心· 2025-06-22 14:09
Core Insights - The article discusses the rapid advances in end-to-end autonomous driving, with a focus on Vision-Language-Action (VLA) models and their industrial applications [2][3].

Group 1: VLA Model Developments
- AutoVLA, a new VLA model that integrates reasoning and action generation for end-to-end autonomous driving, shows promising results in semantic reasoning and trajectory planning [3][4].
- ReCogDrive addresses performance issues in rare, long-tail scenarios with a three-stage training framework that combines a vision-language model with a diffusion planner [7][9].
- Impromptu VLA introduces a dataset aimed at improving VLA models' performance under unstructured, extreme conditions, yielding significant gains on established benchmarks [14][24].

Group 2: Experimental Results
- AutoVLA achieved competitive metrics across scenarios; its best-of-N decoding reached a PDMS score of 92.12, indicating its effectiveness in planning and execution (a minimal best-of-N sketch follows this summary) [5].
- ReCogDrive set a new state-of-the-art PDMS score of 89.6 on the NAVSIM benchmark, showcasing the robustness and safety of its driving trajectories [9][10].
- OpenDriveVLA demonstrated superior results in open-loop trajectory planning and driving-related question answering, outperforming previous methods on the nuScenes dataset [28][32].

Group 3: Industry Trends
- Major automotive manufacturers such as Li Auto, Xiaomi, and XPeng are investing heavily in VLA model research and development, signaling a competitive landscape in autonomous driving technology [2][3].
- Integrating large language models (LLMs) into VLA frameworks is becoming a focal point for strengthening decision-making in autonomous vehicles, as seen in models like ORION and VLM-RL [33][39].
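For readers unfamiliar with the best-of-N decoding referenced in AutoVLA's top PDMS result, here is a minimal sketch under stated assumptions: `policy` and `scorer` are hypothetical placeholders, with the scorer standing in for a PDMS-style critic. The pattern is simply to sample N candidate plans from a stochastic policy and keep the one the critic prefers.

```python
import torch

# Minimal best-of-N selection: sample N candidate plans and keep the one
# a scorer ranks highest. Both callables below are toy stand-ins, not
# AutoVLA components; the scorer plays the role of a PDMS-like critic.

def best_of_n(policy, scorer, obs, n=8):
    candidates = [policy(obs) for _ in range(n)]           # stochastic decoding
    scores = torch.stack([scorer(obs, c) for c in candidates])
    return candidates[int(scores.argmax())]               # keep the best-scored plan

# Toy stand-ins so the sketch runs end to end.
policy = lambda obs: obs + 0.1 * torch.randn(8, 2)        # noisy plan proposals
scorer = lambda obs, c: -c.abs().sum()                    # prefer centered plans
best = best_of_n(policy, scorer, torch.zeros(8, 2))
print(best.shape)  # torch.Size([8, 2])
```

The trade-off is the usual one for test-time search: N forward passes buy a higher score at N times the inference cost, which is why best-of-N figures are typically reported separately from single-sample results.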