End-to-End Autonomous Driving
Latest from Tongji University! GEMINUS: End-to-End MoE Sets a New Closed-Loop SOTA, with Performance Up Nearly 8%
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article presents GEMINUS, a novel end-to-end autonomous driving framework built on a dual-aware mixture-of-experts (MoE) architecture, which achieves state-of-the-art driving score and success rate using only monocular vision input [1][2][49]

Summary by Sections

Introduction
- GEMINUS addresses the limitations of traditional single-modal planning methods by combining a global expert with a group of scene-adaptive experts, coordinated by a dual-aware router, to improve adaptability and robustness across diverse driving scenarios [1][6]

Background
- The article reviews the evolution of end-to-end autonomous driving, highlighting the shift from modular pipelines to unified models that map sensor inputs directly to control signals, reducing engineering workload and exploiting rich sensor information [4][8]

MoE Architecture
- MoE architectures handle complex data distributions well, offering fine-grained scene adaptability and specialized behavior generation, which helps mitigate the mode-averaging problem common in existing models [5][11]

GEMINUS Framework
- GEMINUS pairs a global expert trained on the full dataset for robust performance with scene-adaptive experts trained on scene-specific subsets for adaptability; the dual-aware router dynamically activates the appropriate expert based on scene features and routing uncertainty (a minimal illustrative sketch follows this summary) [6][18]

Experimental Results
- On the Bench2Drive closed-loop benchmark, GEMINUS outperformed existing methods, improving driving score by 7.67% and success rate by 22.06% over the original single-expert baseline [2][36][49]

Ablation Studies
- Ablations show that scene-aware routing drives most of the performance gain, while adding uncertainty-aware routing and the global expert further improves robustness and stability in ambiguous scenarios [40][41]

Conclusion
- GEMINUS marks a significant advance in end-to-end autonomous driving, achieving state-of-the-art performance with monocular vision and underscoring the value of tailored MoE frameworks for the complexity of real-world driving scenarios [49][50]
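The digest does not include code for GEMINUS, so the following is only a minimal sketch of how a dual-aware MoE planning head could be wired in PyTorch: a scene-aware router weights a group of scene-adaptive experts, and an entropy-based uncertainty gate falls back toward a global expert when routing is ambiguous. All module names, dimensions, and the entropy gating rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative dual-aware MoE planner head (not the GEMINUS code):
# scene-aware routing over scene-adaptive experts + uncertainty-aware
# blending with a global expert when the routing distribution is ambiguous.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAwareMoEPlanner(nn.Module):
    def __init__(self, feat_dim=256, num_scene_experts=4, traj_dim=2 * 6):
        super().__init__()
        self.router = nn.Linear(feat_dim, num_scene_experts)   # scene-aware routing logits
        self.scene_experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, traj_dim))
             for _ in range(num_scene_experts)]
        )
        self.global_expert = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, traj_dim)
        )
        self.max_entropy = math.log(num_scene_experts)          # entropy of a uniform routing

    def forward(self, scene_feat):                              # scene_feat: (B, feat_dim)
        weights = F.softmax(self.router(scene_feat), dim=-1)    # (B, num_experts)
        expert_out = torch.stack(
            [e(scene_feat) for e in self.scene_experts], dim=1  # (B, num_experts, traj_dim)
        )
        adaptive_traj = (weights.unsqueeze(-1) * expert_out).sum(dim=1)

        # Uncertainty-aware gating: high routing entropy -> lean on the global expert.
        entropy = -(weights * weights.clamp_min(1e-8).log()).sum(dim=-1, keepdim=True)
        alpha = entropy / self.max_entropy                      # in [0, 1]
        return (1 - alpha) * adaptive_traj + alpha * self.global_expert(scene_feat)
```

Routing on scene features gives per-scenario specialization, while the entropy gate is one simple way to realize an "uncertainty-aware" fallback to a robust global expert; the actual paper may gate differently.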
Worth a Look: How 10 Industry Insiders See VLA
理想TOP2· 2025-07-21 14:36
Core Viewpoints
- Cutting-edge technologies in autonomous driving are not yet mature enough for mass production, and significant challenges remain to be solved [1][27][31]
- Emerging technologies such as VLA/VLM, diffusion models, closed-loop simulation, and reinforcement learning are seen as key directions for future exploration [6][7][28]
- Whether to deepen expertise in autonomous driving or pivot to embodied intelligence depends on individual circumstances and market dynamics [19][34]

Group 1: Current Technology Maturity
- BEV (Bird's Eye View) perception has matured to a production-ready level, while end-to-end (E2E) models remain largely experimental [16][31]
- There is consensus that existing models struggle with corner cases in complex driving scenarios: basic functionality is in place, but advanced capability is still lacking [16][24][31]
- The industry is shifting toward larger models and advanced techniques to improve scene understanding and decision-making in autonomous vehicles [26][28]

Group 2: Emerging Technologies
- VLA/VLM is viewed as a promising direction for next-generation autonomous driving, with the potential to improve reasoning capability and safety [2][28]
- Reinforcement learning is seen as having significant potential, particularly when paired with effective simulation environments [6][32]
- Diffusion models are being explored for their ability to generate multi-modal trajectories, which could be valuable in uncertain driving conditions [7][26]

Group 3: Future Directions
- Future progress is expected to focus on enhancing safety, improving passenger experience, and achieving comprehensive scene coverage [20][28]
- Integrating closed-loop simulation and data-driven approaches is essential for refining autonomous driving systems and ensuring their reliability [20][30]
- The industry is moving toward a data-driven model in which the efficiency of data collection, cleaning, labeling, training, and validation determines competitive advantage [20][22]

Group 4: Career Choices
- The decision to specialize in autonomous driving or shift to embodied intelligence should weigh personal interest, market trends, and the maturity of each field [19][34]
- Autonomous driving is perceived as offering more immediate opportunities for impactful work than the still-developing field of embodied intelligence [19][34]
70K? Is End-to-End VLA Really This Hot Right Now?!
自动驾驶之心· 2025-07-21 11:18
Core Viewpoint
- End-to-end (E2E) autonomous driving is currently the core algorithm direction for mass-production intelligent driving, with rapid advances in VLA (Vision-Language-Action) and VLM (Vision-Language Model) systems driving strong demand for related positions in the industry [2][4]

Summary by Sections

Section 1: Background Knowledge
- The course aims to provide a comprehensive understanding of end-to-end autonomous driving, including its historical development and the transition from modular to end-to-end approaches [21]
- Key technical stacks such as VLA, diffusion models, and reinforcement learning are essential for understanding the current landscape of autonomous driving technology [22]

Section 2: Job Market Insights
- Positions for VLA/VLM algorithm engineers offer high salaries, with 3-5 years of experience earning 40K-70K monthly and top talent reaching up to 1 million annually [10]
- Demand for VLA-related roles is rising, indicating an industry shift toward advanced model architectures [9]

Section 3: Course Structure
- The course is organized into five chapters, from basic end-to-end algorithm concepts through advanced applications in VLA and reinforcement learning [19][30]
- Practical components bridge the gap between theory and application so that participants can implement what they learn in real-world scenarios [18]

Section 4: Technical Innovations
- Both two-stage and one-stage end-to-end approaches are covered, with notable models such as PLUTO and UniAD leading the way [4][23]
- Diffusion models have transformed trajectory prediction, enabling better adaptability in uncertain driving environments (a minimal sampling sketch follows this summary) [24]

Section 5: Learning Outcomes
- Participants are expected to reach a proficiency level comparable to one year of experience as an end-to-end autonomous driving algorithm engineer, mastering the key technologies and frameworks [32]
- The course emphasizes BEV perception, multimodal models, and reinforcement learning as essentials for staying competitive in the evolving job market [32]
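As a concrete reference for the diffusion point above, here is a minimal, framework-agnostic sketch of DDPM-style ancestral sampling for trajectories conditioned on a scene feature. The `denoiser` interface, noise schedule, horizon, and shapes are assumptions for illustration; the course and the cited models use their own formulations.

```python
# Illustrative DDPM-style trajectory sampling: draw several candidate
# trajectories from noise, conditioned on a shared scene feature.
import torch

@torch.no_grad()
def sample_trajectories(denoiser, scene_feat, num_samples=6, horizon=8, steps=50):
    """denoiser(noisy_traj, t, cond) -> predicted noise, same shape as noisy_traj.
    scene_feat: (feat_dim,) encoded scene context (assumed interface)."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    traj = torch.randn(num_samples, horizon, 2)           # start from pure noise (x, y per step)
    cond = scene_feat.expand(num_samples, -1)             # share the scene feature across samples
    for t in reversed(range(steps)):
        eps = denoiser(traj, torch.full((num_samples,), t), cond)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (traj - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise         # DDPM ancestral update
    return traj                                            # (num_samples, horizon, 2) candidate paths
```

Sampling several trajectories from the same scene feature is what yields multi-modal behavior: each draw can settle into a different plausible maneuver.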
Still Haven't Settled on a Research Direction? Others Are Already Racing Ahead on VLA...
自动驾驶之心· 2025-07-21 05:18
Core Viewpoint
- The article highlights the shift in academic research from traditional perception and planning tasks in autonomous driving toward Vision-Language-Action (VLA) models, which open new opportunities for innovation and research in the field [1][2]

Group 1: VLA Research Topics
- VLA models aim at an end-to-end autonomous driving system that maps raw sensor inputs directly to driving control commands, moving away from traditional modular architectures (a minimal interface sketch follows this summary) [2]
- The evolution of autonomous driving technology can be divided into three phases: traditional modular architecture, purely visual end-to-end systems, and the emergence of VLA models [2][3]
- VLA models improve interpretability and reliability by letting the system explain its decision-making in natural language, which helps build human trust [3]

Group 2: Course Objectives and Structure
- The course helps participants systematically master key VLA theory and develop practical skills in model design and implementation [6][7]
- It combines online group research, paper-writing guidance, and a maintenance period to ensure thorough understanding and application [6][8]
- Participants gain exposure to classic and cutting-edge papers, coding practice, and effective strategies for writing and submitting academic papers [6][12]

Group 3: Enrollment and Requirements
- Each session is limited to 6-8 participants and targets individuals with a foundational understanding of deep learning and autonomous driving algorithms [5][9]
- Basic requirements include familiarity with Python and PyTorch and access to high-performance computing resources [13][14]
- The course emphasizes academic integrity and provides a structured environment for learning and research [14][19]

Group 4: Course Highlights
- The program uses a "2+1" teaching model with experienced instructors providing comprehensive support throughout the learning process [14]
- It is designed to uphold high academic standards and deliver concrete outcomes, including a paper draft and a project completion certificate [14][20]
- A feedback mechanism adjusts the learning experience to each participant's progress [14]
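To make the "raw sensor inputs to driving commands" idea concrete, below is a toy VLA-style policy interface in PyTorch. Everything here (the fusion layer, discretized action bins, and the placeholder rationale string) is a simplified assumption to show the shape of such a system, not the design of any specific model covered by the course.

```python
# Toy VLA-style driving policy interface (illustrative only): fused
# vision + instruction features drive discretized control tokens plus a
# natural-language rationale for interpretability.
from dataclasses import dataclass
import torch
import torch.nn as nn

@dataclass
class DrivingCommand:
    steering: float      # normalized to [-1, 1]
    acceleration: float  # normalized to [-1, 1]
    rationale: str       # natural-language explanation of the decision

class ToyVLADriver(nn.Module):
    def __init__(self, vision_dim=512, text_dim=512, num_action_bins=21):
        super().__init__()
        self.fuse = nn.Linear(vision_dim + text_dim, 512)
        self.steer_head = nn.Linear(512, num_action_bins)   # discretized action tokens
        self.accel_head = nn.Linear(512, num_action_bins)
        self.bins = torch.linspace(-1.0, 1.0, num_action_bins)

    def forward(self, image_feat, instruction_feat):
        # image_feat: (vision_dim,), instruction_feat: (text_dim,) for a single frame.
        h = torch.relu(self.fuse(torch.cat([image_feat, instruction_feat], dim=-1)))
        steer = self.bins[self.steer_head(h).argmax(dim=-1)]
        accel = self.bins[self.accel_head(h).argmax(dim=-1)]
        # A real VLA decodes the rationale from its language head; the string
        # below is a placeholder to show where interpretability plugs in.
        return DrivingCommand(steer.item(), accel.item(),
                              rationale="placeholder: e.g. slowing for crossing pedestrian")
```

In an actual VLA, the action tokens and the explanation are typically decoded by the same language-model head rather than separate linear layers; the split here is only for brevity.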
After Interviewing Many End-to-End Candidates, a Lot of People Still Don't Have It Straight...
自动驾驶之心· 2025-07-20 08:36
Core Viewpoint
- End-to-end autonomous driving is a key algorithm for mass-production intelligent driving, with significant salary potential for related positions, and it has branched into diverse technical directions since the introduction of UniAD [2][4]

Group 1: Technical Directions
- End-to-end autonomous driving splits into one-stage and two-stage approaches, with multiple subfields emerging under each category (a minimal structural sketch follows this summary) [2][4]
- The core advantage of end-to-end systems is direct modeling from sensor input to vehicle planning/control, avoiding the error accumulation seen in modular methods [2]
- Notable algorithms include PLUTO for two-stage end-to-end, UniAD for perception-based one-stage, OccWorld for world-model-based one-stage, and DiffusionDrive for diffusion-based one-stage [4]

Group 2: Industry Trends
- Demand for VLA/VLM algorithm experts is rising, with salaries for positions requiring 3-5 years of experience ranging from 40K to 70K [9]
- The industry is shifting toward large-model algorithms, with companies focusing on VLA as the next generation of autonomous driving solutions [8][9]

Group 3: Course Offerings
- A new course, "End-to-End and VLA Autonomous Driving," is offered to help individuals navigate the complexities of end-to-end algorithms and their applications [15][28]
- The course covers background knowledge, two-stage end-to-end, one-stage end-to-end, and practical applications of reinforcement learning [20][22][24]
- It aims to give a comprehensive understanding of the end-to-end framework, including key technologies such as BEV perception, multi-modal large models, and diffusion models [31]
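The one-stage versus two-stage split above can be summarized structurally. The sketch below is illustrative only; class and function interfaces are hypothetical and not taken from PLUTO, UniAD, OccWorld, or DiffusionDrive.

```python
# Illustrative contrast between a two-stage and a one-stage end-to-end stack.
import torch.nn as nn

class TwoStageStack(nn.Module):
    """Perception emits an explicit intermediate scene representation
    (e.g. agent tracks / vectorized map) that a separate planner consumes."""
    def __init__(self, perception: nn.Module, planner: nn.Module):
        super().__init__()
        self.perception, self.planner = perception, planner

    def forward(self, sensors):
        scene = self.perception(sensors)   # hard interface: errors can accumulate here
        return self.planner(scene)

class OneStageStack(nn.Module):
    """A single network maps sensors to the plan; perception/prediction live as
    internal (often auxiliary-supervised) features rather than a hard interface."""
    def __init__(self, backbone: nn.Module, plan_head: nn.Module):
        super().__init__()
        self.backbone, self.plan_head = backbone, plan_head

    def forward(self, sensors):
        return self.plan_head(self.backbone(sensors))   # gradients flow end to end
```

The practical difference is the interface: the two-stage stack exposes an intermediate representation that can be inspected or swapped, while the one-stage stack lets planning gradients flow back to the sensors, which is where the "avoiding error accumulation" argument comes from.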
Latest from CUHK! ReAL-AD: Toward Human-Like Reasoning in End-to-End Autonomous Driving, with a 30% Gain in Trajectory Performance (ICCV'25)
自动驾驶之心· 2025-07-20 08:36
Core Insights
- The article introduces ReAL-AD, a reasoning-enhanced learning framework for end-to-end autonomous driving that aligns the decision-making process with human cognitive models [2][8][40]

Group 1: Framework Overview
- ReAL-AD integrates a three-layer human cognitive model (driving strategy, driving decision, driving operation) into the autonomous driving decision process [2][8]
- The framework has three main components (a minimal sketch of the hierarchy follows this summary) [8][20]:
  1. Strategic Reasoning Injector, which formulates high-level driving strategies from complex traffic insights generated by vision-language models (VLMs)
  2. Tactical Reasoning Integrator, which refines driving intentions into interpretable driving choices
  3. Hierarchical Trajectory Decoder, which translates driving decisions into precise control actions for smooth, human-like trajectory execution

Group 2: Performance Evaluation
- Extensive evaluations on the NuScenes and Bench2Drive datasets show that ReAL-AD improves planning accuracy and safety by over 30% compared with baseline methods [9][34]
- The method reduces L2 error by 33% and collision rate by 32%, indicating significant gains in trajectory accuracy and driving safety [9][34]

Group 3: Comparison with Existing Methods
- Existing end-to-end methods often rely on fixed, sparse trajectory supervision, limiting their ability to replicate the structured cognitive reasoning of human drivers [3][10]
- ReAL-AD addresses these limitations by embedding structured multi-stage reasoning into the decision-making hierarchy, improving generalization across diverse real-world scenarios [5][10]

Group 4: Experimental Results
- The framework outperforms other state-of-the-art methods, achieving the lowest average L2 error of 0.48 meters and a collision rate of 0.15% on the NuScenes dataset [34]
- In closed-loop evaluation, integrating ReAL-AD significantly improves driving score and success rate, demonstrating its effectiveness beyond open-loop metrics [34]
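As a rough illustration of how the three ReAL-AD components could fit together, here is a minimal hierarchical planning head: a strategy vector injected from VLM-generated traffic insight, a discrete tactical choice, and a trajectory decoder conditioned on both. Dimensions, the tactic vocabulary, and the wiring are assumptions for illustration; the paper's actual modules are more elaborate.

```python
# Illustrative three-level planning head: strategy -> tactical decision -> trajectory.
import torch
import torch.nn as nn

class HierarchicalPlanner(nn.Module):
    def __init__(self, scene_dim=256, vlm_dim=768, num_tactics=5, horizon=6):
        super().__init__()
        # Level 1: strategic reasoning injected from VLM-generated traffic insight.
        self.strategy_proj = nn.Linear(vlm_dim, scene_dim)
        # Level 2: tactical decision over interpretable choices (e.g. yield, keep lane, turn).
        self.tactic_head = nn.Linear(2 * scene_dim, num_tactics)
        self.tactic_embed = nn.Embedding(num_tactics, scene_dim)
        # Level 3: trajectory decoding conditioned on strategy and chosen tactic.
        self.traj_decoder = nn.Sequential(
            nn.Linear(3 * scene_dim, 512), nn.ReLU(), nn.Linear(512, horizon * 2)
        )
        self.horizon = horizon

    def forward(self, scene_feat, vlm_feat):
        # scene_feat: (B, scene_dim), vlm_feat: (B, vlm_dim)
        strategy = torch.relu(self.strategy_proj(vlm_feat))
        tactic_logits = self.tactic_head(torch.cat([scene_feat, strategy], dim=-1))
        # argmax is used for clarity; training would use a differentiable selection
        # (e.g. soft weighting over tactic embeddings).
        tactic = self.tactic_embed(tactic_logits.argmax(dim=-1))
        traj = self.traj_decoder(torch.cat([scene_feat, strategy, tactic], dim=-1))
        return traj.view(-1, self.horizon, 2), tactic_logits   # waypoints + interpretable decision
```

The point of the hierarchy is that the intermediate strategy and tactic are human-readable supervision targets, rather than the trajectory being regressed directly from scene features.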
The Technology-Obsessed "Whampoa Academy" of Autonomous Driving Turns Three...
自动驾驶之心· 2025-07-19 03:04
Core Insights
- The article emphasizes the transition of autonomous driving technology from Level 2/3 (assisted driving) to Level 4/5 (fully autonomous driving) by 2025, highlighting the competitive AI landscape across autonomous driving, embodied intelligence, and large-model agents [2][4]

Group 1: Autonomous Driving Community
- The "Autonomous Driving Heart Knowledge Planet" has grown into the largest autonomous driving technology community in China, positioned as a training ground for industry professionals [4][6]
- The community has nearly 4,000 members and over 100 industry experts, offering discussions, learning roadmaps, and job referrals [4][6]
- It covers subfields of autonomous driving including end-to-end driving, world models, and multi-sensor fusion, among others [4][6]

Group 2: Learning Modules and Resources
- The knowledge community spans four main technical areas: visual large language models, world models, diffusion models, and end-to-end autonomous driving [6][7]
- It offers a comprehensive collection of resources, including cutting-edge articles, datasets, and application summaries relevant to the autonomous driving sector [6][7]

Group 3: Job Opportunities and Networking
- The community maintains direct referral channels with numerous autonomous driving companies, facilitating job placements for members [4][6]
- Active participation is encouraged, with a focus on fostering a collaborative environment for newcomers and experienced professionals alike [4][6]

Group 4: Technical Insights
- The article outlines learning paths and technical insights across perception, mapping, planning, and control as the foundations of autonomous system development [4][6][24]
- It also highlights the integration of large language models into autonomous driving applications to strengthen decision-making and navigation capabilities [25][26]
End-to-End VLA Salaries Like These Have Me Tempted...
自动驾驶之心· 2025-07-17 11:10
Core Viewpoint
- End-to-end (E2E) autonomous driving is identified as the core algorithm for mass-production intelligent driving, marking a significant industry shift toward more integrated and efficient systems [2][4]

Group 1: Technology Overview
- E2E approaches fall into single-stage and two-stage categories, and interest in the field surged after UniAD's recognition at CVPR [2]
- E2E systems directly model the relationship between sensor inputs and vehicle control information, minimizing the error accumulation of modular approaches [2]
- The introduction of BEV perception bridged gaps between modular methods, enabling a technological leap in the field [2]

Group 2: Challenges in Learning
- The rapid development of E2E technology has made earlier educational resources outdated, creating a need for up-to-date learning materials [5]
- Knowledge is fragmented across many domains, complicating the learning process for newcomers and often leading them to give up before mastery [5]
- The lack of high-quality documentation in E2E research raises the barrier to entry [5]

Group 3: Course Development
- A new course, "End-to-End and VLA Autonomous Driving," has been developed to address these challenges [6]
- The course provides a quick entry into core technologies using accessible language and examples, making it easier to branch into specific topics [6]
- It focuses on building a framework for understanding E2E research and strengthening research skills by categorizing papers and extracting their innovations [7]

Group 4: Course Structure
- The course is organized into chapters covering the history and evolution of E2E algorithms through practical applications and advanced techniques [11][12][20]
- Key areas include an introduction to E2E algorithms, background on relevant technologies, and detailed treatment of both single-stage and two-stage methods [11][12][20]
- Practical components are integrated into the curriculum to ensure a solid grasp of the theory [8]

Group 5: Expected Outcomes
- Participants are expected to reach a proficiency level comparable to one year of experience as an E2E autonomous driving algorithm engineer [27]
- The course covers single-stage and two-stage methods, world models, and diffusion models, providing a holistic view of the E2E landscape [27]
- A deeper understanding of key technologies such as BEV perception, multimodal large models, and reinforcement learning will be developed [27]
Two Months into My Job at Xiaomi, and I Still Haven't Touched Any Algorithm Code...
自动驾驶之心· 2025-07-16 08:46
Core Viewpoint
- The article discusses current trends and opportunities in the autonomous driving industry, emphasizing skill development and networking for job seekers in the field [4][7][8]

Group 1: Job Market Insights
- Recent graduates often find that their job roles do not match expectations, particularly in internships and entry-level positions [2][4]
- Candidates should focus on relevant experience even when current roles do not directly align with career goals, and showcase all relevant skills on their resumes [6][7]

Group 2: Skill Development and Learning Resources
- Individuals are encouraged to keep building skills in autonomous driving, particularly around large models and data processing, which are currently in demand [6][8]
- A range of resources, including online courses and community support, is available to help individuals deepen their knowledge of the autonomous driving sector [8][10]

Group 3: Community and Networking
- Joining communities focused on autonomous driving and embodied intelligence offers valuable networking opportunities and access to industry insights [8][10]
- Collaboration and knowledge sharing within these communities help members stay current on the latest trends and technologies in the field [8][10]
All in One Read! A Roundup of Outstanding Autonomous Driving VLA Work from the Past Year
自动驾驶之心· 2025-07-15 12:30
Core Insights
- The article surveys recent advances in Vision-Language-Action (VLA) models for autonomous driving, highlighting how navigation context and reinforcement learning are being integrated to extend reasoning beyond the visual range [2][3][6]

Group 1: NavigScene
- NavigScene is a novel auxiliary dataset pairing local multi-view sensor inputs with global natural-language navigation guidance, closing the critical gap between local perception and global navigation context [6]
- It implements three complementary paradigms: navigation-guided reasoning, navigation-guided preference optimization, and navigation-guided VLA models, enhancing the reasoning and generalization capabilities of autonomous driving systems [6]
- Comprehensive experiments show significant gains in perception, prediction, and planning when global navigation knowledge is integrated [6]

Group 2: AutoVLA
- AutoVLA is an end-to-end autonomous driving framework that integrates physical action tokens with a pre-trained VLM backbone, enabling direct policy learning and semantic reasoning from raw visual observations and language instructions [12]
- A reinforcement-learning post-training method based on Group Relative Policy Optimization (GRPO) provides adaptive reasoning and further improves end-to-end driving performance (a minimal GRPO sketch follows this summary) [12]
- AutoVLA achieves competitive results across multiple autonomous driving benchmarks, including open-loop and closed-loop tests [12]

Group 3: ReCogDrive
- ReCogDrive is an end-to-end autonomous driving system that couples a VLM with a diffusion planner, using a three-stage training paradigm to address performance drops in rare and long-tail scenarios [13][16]
- The first stage fine-tunes the VLM on a large-scale driving Q&A dataset to reduce the domain gap between general content and real-world driving scenarios [16]
- The method achieves a state-of-the-art PDMS score of 89.6 on the NAVSIM benchmark, underscoring its effectiveness and feasibility [16]

Group 4: Impromptu VLA
- Impromptu VLA introduces a large-scale, richly annotated dataset that addresses the limitations of existing benchmarks for autonomous driving VLA models [22]
- The dataset targets unstructured, extreme scenarios and yields significant improvements on established benchmarks [22]
- Experiments show that training with the Impromptu VLA dataset notably improves closed-loop NeuroNCAP scores and collision rates [22]

Group 5: DriveMoE
- DriveMoE is a novel end-to-end autonomous driving framework that adopts a mixture-of-experts (MoE) architecture to handle multi-view sensor data and complex driving scenarios [28]
- It features a scene-specific visual MoE and a skill-specific action MoE, addressing the challenges of multi-view redundancy and skill specialization [28]
- DriveMoE achieves state-of-the-art performance in closed-loop evaluation on the Bench2Drive benchmark, demonstrating the value of combining visual and action MoE for driving tasks [28]
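For the GRPO mention under AutoVLA, here is a minimal sketch of the two core pieces of GRPO-style post-training: group-relative advantage normalization and a PPO-style clipped surrogate loss. The reward function, rollout machinery, and the usual KL regularization toward a reference policy are omitted; the shapes and names are assumptions, not AutoVLA's implementation.

```python
# Illustrative GRPO-style objective: advantages are computed relative to a
# group of sampled rollouts for the same scene, so no value network is needed.
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_groups, samples_per_group) scalar rewards, one per sampled trajectory."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)       # standardized within each group

def grpo_policy_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate applied with group-relative advantages.
    All tensors share the same shape; a KL penalty to a reference policy is omitted."""
    ratio = torch.exp(logprobs - old_logprobs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```

Typical usage: sample several candidate trajectories per scene with the current policy, score them (for example with a driving-score proxy), normalize the scores within each group, and apply the clipped loss to the trajectories' token log-probabilities.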