End-to-End Autonomous Driving
Waymo Proposes Drive&Gen: Evaluating End-to-End Autonomous Driving with Generated Videos (IROS'25)
自动驾驶之心· 2025-10-12 23:33
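Author | Jiahao Wang Source | 我爱计算机视觉 A traditional autonomous driving system resembles a large company with many departments: the perception, prediction, and planning modules each do their own job. The arrangement is stable but cumbersome, and an error at one stage can affect the whole pipeline. An E2E model is more like a versatile startup team that goes straight from raw inputs such as camera images to driving decisions in one step, which is simple, efficient, and full of potential. But questions follow: are AI-generated videos really "realistic" enough to fool an autonomous driving system and to be used for serious evaluation? And how can we understand the "temperament" of an E2E driving model in depth, fix its weaknesses, and help it cope with scenarios it has never seen, such as a sudden rainstorm? To answer these questions, researchers from Johns Hopkins University, Waymo, and Google DeepMind propose a new framework called Drive&Gen in a paper to be presented at IROS 2025. The name is straightforward: it combines driving (Drive) and generation (Gen), aiming to connect E2E driving models with generative world models so that each can be used to evaluate and improve the other. Background: when E2E driving meets generative AI ...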
Led by Industry Experts! Master End-to-End Autonomous Driving in Three Months
自动驾驶之心· 2025-10-12 23:33
Core Viewpoint
- 2023 is described as the year end-to-end systems entered mass production, and 2024 is expected to be a major year for it; leading new-force automakers and established manufacturers have already put end-to-end systems into production [1][3].
Group 1: End-to-End Production Development
- End-to-end development is moving quickly along two paradigms, one-stage and two-stage, with UniAD a prominent one-stage method [1][3].
- One-stage variants now include perception-based, world model-based, diffusion model-based, and VLA-based approaches, and both autonomous driving companies and vehicle manufacturers are pushing in-house research and mass production of end-to-end autonomous driving [3][5].
Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched. It covers cutting-edge one-stage and two-stage end-to-end algorithms and aims to bridge academic and industrial advances [5][15].
- The course is organized into chapters covering the history and evolution of end-to-end algorithms, background knowledge on VLA, and detailed discussions of two-stage and one-stage end-to-end methods [9][10][12].
Group 3: Key Technologies and Techniques
- The course emphasizes BEV perception, vision-language models (VLM), diffusion models, and reinforcement learning, which it presents as essential for following the latest advances in autonomous driving [5][11].
- The second chapter is highlighted as covering the technical keywords expected to come up most often in job interviews over the next two years [10].
Group 4: Practical Applications and Outcomes
- The course includes practical assignments such as RLHF fine-tuning, so that participants learn how to build and experiment with reinforcement learning modules [13][19].
- On completing the course, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, with a broad understanding of the methods and their applications [19].
Led by Experts from Industry and Academia! Fully Master End-to-End and VLA
自动驾驶之心· 2025-10-09 23:32
Core Insights
- The article discusses the evolution of autonomous driving algorithms, from modular production algorithms to end-to-end models and now to Vision-Language-Action (VLA) models [1][3].
- It emphasizes the rich technology stack behind end-to-end algorithms, including BEV perception, vision-language models (VLM), diffusion models, reinforcement learning, and world models [3][10].
Summary by Sections
End-to-End Algorithms
- End-to-end algorithms fall into two main paradigms, single-stage and two-stage, with UniAD representing the single-stage approach [1].
- The single-stage paradigm branches into several subfields, notably VLA-based methods, which have seen a surge of publications and industrial applications in recent years [1].
VLA and Course Offerings
- The article announces courses, developed jointly by industry and academia, that aim to help people learn end-to-end and VLA autonomous driving quickly and efficiently [3].
- The "VLA and Large Model Practical Course" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA approaches [3].
Course Structure and Faculty
- The course gives a comprehensive overview of VLA, with theoretical foundations in Vision, Language, and Action and practical assignments in which participants build VLA models and datasets from scratch [3][10].
- The teaching team consists of experienced professionals from top academic institutions and industry, with backgrounds in multimodal perception, autonomous driving, and large model frameworks [7][9][10].
Target Audience and Requirements
- The courses are designed for people with a foundational understanding of autonomous driving who are familiar with key technologies such as transformer models, reinforcement learning, and BEV perception [13].
- Participants are expected to have basic knowledge of probability theory and linear algebra, and programming skills in Python and PyTorch [13].
Can Imitation Learning Never Be Truly End-to-End?
自动驾驶之心· 2025-10-08 23:33
Core Viewpoint
- The article argues that in the autonomous driving industry, training methods matter more than model architectures such as VLA or world models, and it highlights the limitations of imitation learning in achieving true end-to-end autonomous driving [2][14].
Limitations of Imitation Learning
- Imitation learning assumes that expert data is optimal, but in driving there is no single perfect behavior, because human drivers differ in style and strategy [3][4].
- The training data lacks consistency and optimality, so models learn vague and imprecise driving patterns rather than clear and logical strategies [3][4].
- Imitation learning does not distinguish critical decision-making scenarios from ordinary ones, so models may make fatal errors at crucial moments [5][6].
Key Scene Identification
- The article discusses the importance of identifying key scenes in driving, where the precision of the model's output is critical, especially in complex scenarios [7][8].
- It introduces the concept of "advantage" from reinforcement learning, which helps define key states where the optimal action significantly outperforms the others [7].
Out-of-Distribution (OOD) Issues
- Open-loop imitation learning can accumulate errors, driving the model into states that differ from the training data distribution and degrading performance [8][10][12].
- The article illustrates that models trained purely with imitation learning may struggle in critical situations, such as timely lane changes, because they rely on suboptimal behaviors learned from human data [13].
Conclusion
- The core of technological development lies in identifying key routes and bottlenecks rather than merely following trends, which suggests that new methods beyond imitation learning are needed to address its limitations [14].
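The "advantage" idea mentioned in the summary can be made concrete with a small sketch. In reinforcement learning the advantage is A(s, a) = Q(s, a) - V(s). The code below flags a state as "key" when the best action beats the runner-up by a margin. The Q-values, the action names, the margin, and the use of a uniform average for V(s) are all assumptions made for illustration; they are not taken from the article.

```python
from typing import Dict


def advantages(q_values: Dict[str, float]) -> Dict[str, float]:
    """A(s, a) = Q(s, a) - V(s); V(s) is the mean of Q here (uniform policy, an assumption)."""
    v = sum(q_values.values()) / len(q_values)
    return {action: q - v for action, q in q_values.items()}


def advantage_gap(q_values: Dict[str, float]) -> float:
    """Gap between the best and second-best advantage (needs at least two actions)."""
    ranked = sorted(advantages(q_values).values(), reverse=True)
    return ranked[0] - ranked[1]


def is_key_state(q_values: Dict[str, float], margin: float = 1.0) -> bool:
    """Flag states where the best action clearly outperforms every alternative."""
    return advantage_gap(q_values) >= margin


# Hypothetical Q-values, invented purely for illustration.
cruising = {"keep_lane": 1.0, "slow_down": 0.9, "change_lane": 0.8}
late_merge = {"keep_lane": -5.0, "slow_down": -1.0, "change_lane": 2.0}

print(is_key_state(cruising))    # small gap between actions: ordinary state
print(is_key_state(late_merge))  # one action dominates: key state
```

In an ordinary state almost any reasonable action is acceptable, so imitation noise is harmless; in a key state the same noise can select a much worse action, which is the failure mode the summary describes.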
Longitudinal End-to-End Is a Watershed for Autonomous Driving Technology
自动驾驶之心· 2025-10-04 04:04
Core Insights
- The article discusses the evolution of end-to-end autonomous driving technology, highlighting the shift from lateral (steering) to longitudinal (speed) end-to-end control as a new industry focus [2][3].
- It emphasizes that longitudinal end-to-end control, particularly of speed and braking, is important for achieving human-like driving efficiency [4][16].
Group 1: Importance of Longitudinal End-to-End Control
- Longitudinal end-to-end control is essential for smooth acceleration and deceleration, a key differentiator between novice and experienced drivers [3][4].
- The article defines "defensive deceleration" as the ability to adjust speed based on necessity and prediction, balancing safety and efficiency [4][12].
- Current autonomous systems often prioritize navigation efficiency over longitudinal control, which makes effective speed adjustment hard to implement [15][16].
Group 2: Challenges in Achieving Longitudinal End-to-End Control
- Many autonomous driving systems have implemented lateral end-to-end control successfully, but longitudinal control remains a significant challenge [13][16].
- Noise in human driving data complicates learning, because it is hard to distinguish meaningful speed control from random fluctuations [16][17].
- Approaches being explored by leading autonomous driving teams to improve longitudinal control include data cleaning, causal reasoning, and reinforcement learning [17].
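To illustrate why noisy speed logs make longitudinal (speed and braking) control hard to learn, here is a minimal sketch of one generic cleaning step: smoothing a speed trace before looking for deceleration events. The article does not say which cleaning methods the teams use; the moving-average filter, the 0.5 s sample interval, the -1.0 m/s^2 threshold, and both speed traces are assumptions chosen only for illustration.

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; near the edges it averages the samples available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def deceleration_events(speeds: List[float], dt: float = 0.5,
                        threshold: float = -1.0, window: int = 3) -> List[int]:
    """Indices where the smoothed acceleration (m/s^2) is at or below the threshold."""
    smooth = moving_average(speeds, window)
    events = []
    for i in range(1, len(smooth)):
        accel = (smooth[i] - smooth[i - 1]) / dt
        if accel <= threshold:
            events.append(i)
    return events


# Hypothetical speed traces in m/s, sampled every 0.5 s.
jittery_cruise = [10.0, 10.6, 9.9, 10.5, 9.9, 10.4]   # noise around a steady speed
real_braking = [10.0, 10.0, 10.0, 8.0, 6.0, 4.0, 4.0]  # sustained deceleration

print(deceleration_events(jittery_cruise, window=1))  # raw data: jitter looks like braking
print(deceleration_events(jittery_cruise))            # smoothed: no events
print(deceleration_events(real_braking))              # smoothed: braking is still detected
```

With `window=1` the function works on the raw samples, so random jitter is flagged as braking; with smoothing only the sustained deceleration remains. A filter this simple also blurs sharp but genuine braking, which is one reason the summary lists causal reasoning and reinforcement learning alongside data cleaning.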
Some Are Stuck in a Blind Rat Race in Autonomous Driving, While Others Are Building Real Moats...
自动驾驶之心· 2025-09-29 23:33
Core Viewpoint
- The automotive industry is undergoing a significant transformation, with numerous executive changes and a focus on advanced technologies such as autonomous driving and artificial intelligence [1][3].
Group 1: Industry Changes
- In September, 48 executives in the automotive sector changed positions, indicating a shift in leadership and strategy [1].
- Companies such as Li Auto and BYD are restructuring their teams to strengthen their autonomous driving and cockpit capabilities [1].
- Algorithm development is evolving rapidly, moving from BEV to more complex models such as VLA and world models [1][3].
Group 2: Autonomous Driving Focus
- The frontier of autonomous driving technology centers on VLA/VLM, end-to-end driving, world models, and reinforcement learning [3].
- Students and mid-sized companies often lack a clear picture of the industry's actual progress, which highlights the need for better communication between academia and industry [3].
Group 3: Community and Knowledge Sharing
- A community called "Autonomous Driving Heart Knowledge Planet" (自动驾驶之心知识星球) has been established to bridge academic and industrial knowledge, aiming to grow to nearly 10,000 members in two years [5].
- The community offers a learning platform with video content, Q&A, and job exchange, catering to both beginners and advanced learners [6][10].
- Members can access over 40 technical routes and discuss trends and challenges in autonomous driving with industry leaders [6][8].
Group 4: Learning Resources
- The community provides resources for practical questions about autonomous driving, such as entry points for end-to-end systems and data annotation practices [6][11].
- A detailed curriculum is available for newcomers, covering essential topics in autonomous driving technology [20][21].
- The platform also includes job referral mechanisms that connect members with potential employers in the autonomous driving sector [13][14].
Led by Industry Experts! Master End-to-End Autonomous Driving in Three Months
自动驾驶之心· 2025-09-29 08:45
Core Viewpoint
- 2023 is identified as the year end-to-end systems entered mass production, and 2024 is expected to be a major year for this development in the automotive industry, particularly in autonomous driving technology [1][3].
Group 1: End-to-End Production
- Leading new-force automakers and established manufacturers have already put end-to-end systems into production [1].
- The industry follows two main paradigms, one-stage and two-stage, with UniAD representing the one-stage method [1].
Group 2: Development Trends
- Since last year, the one-stage end-to-end approach has evolved rapidly, producing perception-based, world model-based, diffusion model-based, and VLA-based variants [3].
- Major autonomous driving companies are focusing on in-house research and mass production of end-to-end autonomous driving solutions [3].
Group 3: Course Offerings
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, covering cutting-edge one-stage and two-stage end-to-end algorithms [5].
- The course aims to give insight into the latest technologies in the field, including BEV perception, vision-language models, diffusion models, and reinforcement learning [5].
Group 4: Course Structure
- The course begins with an introduction to end-to-end algorithms, followed by the background knowledge needed to understand the technology stack [9][10].
- The second chapter focuses on the technical keywords expected to come up most often in job interviews over the next two years [10].
- Subsequent chapters cover two-stage end-to-end methods, one-stage end-to-end methods, and practical assignments involving RLHF fine-tuning [12][13].
Group 5: Learning Outcomes
- On completion, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer [19].
- The course aims to deepen understanding of key technologies such as BEV perception, multimodal large models, and reinforcement learning, so that participants can apply them to real projects [19].
A Self-Checking VLA! ReflectDrive: A Safer End-to-End Framework with More Efficient Scaling (Li Auto & Tsinghua)
自动驾驶之心· 2025-09-27 23:33
Core Viewpoint
- ReflectDrive is a learning framework that adds a reflection mechanism to discrete diffusion in order to generate safe trajectories, addressing open challenges in end-to-end autonomous driving systems [4][46].
Group 1: Introduction and Background
- Autonomous driving is leading the transportation industry toward a safer and more efficient future, and end-to-end (E2E) systems are becoming a mainstream alternative to traditional modular designs [4].
- Vision-Language-Action (VLA) models draw on pre-trained knowledge from vision-language models (VLM) to improve adaptability in complex scenarios [4][5].
- Current learning-based methods have not resolved core challenges of imitation-learning driving systems, particularly the encoding of physical rules such as collision avoidance [4][5].
Group 2: ReflectDrive Framework
- ReflectDrive uses a discrete diffusion reflection mechanism for safe trajectory generation [3][12].
- The framework first discretizes the two-dimensional driving space to build an action codebook, so that pre-trained diffusion language models can be fine-tuned for planning tasks [3][14].
- The reflection mechanism needs no gradient computation and performs iterative self-correction inspired by spatiotemporal joint planning [3][8].
Group 3: Methodology and Mechanism
- Reflective inference has two stages: goal-conditioned trajectory generation and safety-guided regeneration [20][25].
- The framework uses safety metrics to evaluate the generated multimodal trajectories and identifies unsafe waypoints through local search [8][25].
- The iterative optimization loop continues until the trajectory is deemed safe or the computation budget is exhausted, which keeps inference efficient enough for real-time use [31][32].
Group 4: Experimental Results
- ReflectDrive was evaluated on the NAVSIM benchmark and showed significant improvements in safety metrics such as collision rate and drivable-area compliance [32][38].
- Adding safety-guided regeneration substantially improved the safety indicators over the baseline, with increases in DAC (drivable area compliance, 3.9%), TTC (time-to-collision, 1.3%), NC (no at-fault collision, 0.8%), and EP (ego progress, 7.9%) [37][38].
- With ground-truth agent information, ReflectDrive approached human driving levels, reaching an NC of 99.7% and a DAC of 99.5% [38][39].
Group 5: Conclusion
- ReflectDrive integrates a reflection mechanism with discrete diffusion for safe trajectory generation, and its performance on the NAVSIM benchmark supports the approach [46].
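The control flow described above (discretize the driving space into a codebook, check waypoints for safety, repair unsafe ones by local search, repeat until safe or out of budget) can be sketched in a few lines. This is a toy stand-in, not ReflectDrive itself: the real framework regenerates trajectories with a fine-tuned diffusion language model, whereas this sketch simply swaps unsafe tokens for nearby safe ones. The grid size, cell resolution, circular obstacle model, and every function name are assumptions.

```python
import math
from typing import List, Tuple

CELL = 0.5   # grid resolution in metres (assumed)
GRID = 41    # cells per axis, covering [-10, 10] m (assumed)
Obstacle = Tuple[float, float, float]  # centre x, centre y, radius


def encode(x: float, y: float) -> int:
    """Map a 2D point to a token id in the grid codebook."""
    ix = min(GRID - 1, max(0, round((x + 10.0) / CELL)))
    iy = min(GRID - 1, max(0, round((y + 10.0) / CELL)))
    return iy * GRID + ix


def decode(token: int) -> Tuple[float, float]:
    """Map a token id back to the centre of its grid cell."""
    iy, ix = divmod(token, GRID)
    return ix * CELL - 10.0, iy * CELL - 10.0


def is_safe(token: int, obstacles: List[Obstacle]) -> bool:
    """A waypoint is safe if it lies strictly outside every obstacle circle."""
    x, y = decode(token)
    return all(math.hypot(x - ox, y - oy) > r for ox, oy, r in obstacles)


def local_search(token: int, obstacles: List[Obstacle], reach: int) -> int:
    """Return the nearest safe token within `reach` cells, or the input if none exists."""
    iy0, ix0 = divmod(token, GRID)
    best, best_dist = token, float("inf")
    for iy in range(max(0, iy0 - reach), min(GRID, iy0 + reach + 1)):
        for ix in range(max(0, ix0 - reach), min(GRID, ix0 + reach + 1)):
            cand = iy * GRID + ix
            dist = math.hypot(ix - ix0, iy - iy0)
            if is_safe(cand, obstacles) and dist < best_dist:
                best, best_dist = cand, dist
    return best


def reflect(tokens: List[int], obstacles: List[Obstacle],
            max_iters: int = 5) -> Tuple[List[int], bool]:
    """Repair unsafe waypoints iteratively; stop when safe or when the budget runs out."""
    tokens = list(tokens)
    for it in range(max_iters):
        unsafe = [i for i, t in enumerate(tokens) if not is_safe(t, obstacles)]
        if not unsafe:
            return tokens, True
        for i in unsafe:
            # Widen the search window on each pass (a design choice of this sketch).
            tokens[i] = local_search(tokens[i], obstacles, reach=it + 1)
    return tokens, all(is_safe(t, obstacles) for t in tokens)


# A straight path along y = 0 with a hypothetical obstacle sitting on its middle waypoint.
plan = [encode(x, 0.0) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
fixed, ok = reflect(plan, [(2.0, 0.0, 0.8)])
print(ok, [decode(t) for t in fixed])
```

Because the loop only evaluates a safety predicate and searches discrete neighbours, it needs no gradients, which mirrors the gradient-free property the summary attributes to the reflection mechanism.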
After a Comparison, VLA Turns Out to Be Far More Mature Than World Models...
自动驾驶之心· 2025-09-26 16:03
Core Insights
- The article discusses the competition between VLA (Vision-Language-Action) models and world models in end-to-end autonomous driving, noting that over 90% of current models are segmented end-to-end rather than purely VLA or world models [2][6].
Group 1: Model Comparison
- VLA models, represented by companies such as Amap (Gaode Map) and Horizon Robotics, outperform world models, with the latest VLA papers published in September 2023 [6][43].
- The reported metrics show a significant margin: the best VLA model achieves an average L2 distance of 0.19 meters and a collision rate of 0.08% [5][6].
Group 2: Data Utilization
- The Shanghai AI Lab's GenAD model uses unlabeled data sourced from the internet, primarily YouTube, to improve generalization, in contrast to traditional supervised learning methods that rely on labeled data [7][19].
- The GenAD framework uses a two-tier training approach similar to Tesla's, integrating diffusion models and Transformers, but requires high-precision maps and traffic rules to operate effectively [26][32].
Group 3: Testing Methods
- Two primary testing methods for end-to-end autonomous driving are identified: closed-loop testing with synthetic data in simulators such as CARLA, and open-loop testing on data collected in the real world [4][6].
- The article emphasizes the limitation of open-loop testing: it cannot provide feedback on the execution of predicted actions, so closed-loop testing is the more reliable way to evaluate model performance [4][6].
Group 4: Future Directions
- The article suggests that world models have potential, but current implementations often need additional labeled data, which erodes their advantages in generalization and cost-effectiveness relative to VLA models [43].
- Ongoing research points toward better integration of data sources and more robust models through advanced training techniques [19][32].
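The two figures quoted in the comparison, average L2 distance and collision rate, are simple to compute once predicted and logged trajectories are available. The sketch below shows one generic definition of each. Benchmarks differ in details such as the prediction horizon and how results are averaged, and the article does not specify its protocol, so the function names and the sample data are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def average_l2(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance in metres between matched predicted and logged waypoints."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, truth)) / len(pred)


def collision_rate(collided: List[bool]) -> float:
    """Percentage of evaluated samples whose planned trajectory hits an obstacle."""
    return 100.0 * sum(collided) / len(collided)


# Hypothetical data: two waypoints per trajectory, ten evaluated samples.
predicted = [(0.0, 0.0), (3.0, 4.0)]
logged = [(0.0, 0.0), (0.0, 0.0)]

print(average_l2(predicted, logged))            # distances 0 and 5 average to 2.5
print(collision_rate([False] * 9 + [True]))     # one collision in ten samples
```

Both metrics compare a plan against a fixed log, so they are open-loop measures: they say how close the plan is to what the human did, not what would have happened had the plan been executed.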
AnchDrive: A New Diffusion Policy for End-to-End Autonomous Driving (Shanghai University & Bosch)
自动驾驶之心· 2025-09-26 07:50
Core Insights
- The article introduces AnchDrive, an end-to-end framework for autonomous driving that addresses the challenges of multimodal behavior and generalization in long-tail scenarios [1][10][38].
- AnchDrive uses a hybrid trajectory anchor approach, combining dynamic and static anchors to improve trajectory quality and planning robustness [10][38].
Group 1: Introduction and Background
- End-to-end autonomous driving algorithms have gained significant attention because they scale and adapt better than traditional rule-based motion planning methods [4][12].
- These methods learn control signals directly from raw sensor data, which reduces the complexity of modular design and minimizes cumulative perception errors [4][12].
Group 2: Methodology
- AnchDrive uses a multi-head trajectory decoder that dynamically generates a set of trajectory anchors, capturing behavioral diversity under local environmental conditions [8][15].
- The framework also integrates a large-scale static anchor set derived from human driving data, which provides cross-scenario behavioral priors [8][15].
Group 3: Experimental Results
- On the NAVSIM v2 simulation platform, AnchDrive achieved an Extended Predictive Driver Model Score (EPDMS) of 85.5, indicating that it generates robust and contextually appropriate behaviors in complex driving scenarios [9][30][34].
- AnchDrive scored significantly higher than existing methods, with an EPDMS 8.9 points above VADv2, while reducing the number of trajectory anchors from 8192 to just 20 [34].
Group 4: Contributions
- The main contribution is the AnchDrive framework, which runs a truncated diffusion process initialized from a hybrid trajectory anchor set and thereby improves initial trajectory quality and planning robustness [10][38].
- A mixed perception model with dense and sparse branches improves the planner's understanding of obstacles and road geometry [11][18].
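The idea of a truncated diffusion process initialized from a hybrid anchor set can be illustrated with a toy sketch: instead of starting from pure noise and running many denoising steps, start from the anchors with a small perturbation and run only a few steps. Everything below is a stand-in under stated assumptions. The real denoiser is a learned network conditioned on the scene; here `toy_denoiser` is a hypothetical placeholder that pulls waypoints toward a reference path, and the anchors, noise scale, and step count are invented for illustration.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def hybrid_anchors(dynamic: List[Trajectory], static: List[Trajectory]) -> List[Trajectory]:
    """Merge scene-specific (dynamic) anchors with a fixed prior (static) anchor set."""
    return list(dynamic) + list(static)


def perturb(traj: Trajectory, sigma: float, rng: random.Random) -> Trajectory:
    """Add small Gaussian noise, the truncated starting point of the diffusion."""
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in traj]


def toy_denoiser(traj: Trajectory, reference: Trajectory, strength: float = 0.5) -> Trajectory:
    """Hypothetical placeholder for the learned denoiser: move partway toward a reference."""
    return [(x + strength * (rx - x), y + strength * (ry - y))
            for (x, y), (rx, ry) in zip(traj, reference)]


def truncated_diffusion(anchors: List[Trajectory], reference: Trajectory,
                        steps: int = 2, sigma: float = 0.1, seed: int = 0) -> List[Trajectory]:
    """Start from lightly noised anchors and apply only a few denoising steps."""
    rng = random.Random(seed)
    samples = [perturb(anchor, sigma, rng) for anchor in anchors]
    for _ in range(steps):
        samples = [toy_denoiser(sample, reference) for sample in samples]
    return samples


def mean_error(traj: Trajectory, reference: Trajectory) -> float:
    """Average waypoint distance to the reference path."""
    return sum(math.dist(p, r) for p, r in zip(traj, reference)) / len(traj)


# Hypothetical anchors: one dynamic anchor on the lane centre, two static anchors offset by 2 m.
lane_centre = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
dynamic = [lane_centre]
static = [[(0.0, 2.0), (1.0, 2.0), (2.0, 2.0)], [(0.0, -2.0), (1.0, -2.0), (2.0, -2.0)]]

refined = truncated_diffusion(hybrid_anchors(dynamic, static), lane_centre)
print([round(mean_error(t, lane_centre), 3) for t in refined])
```

The point of the sketch is the initialization: a candidate that already starts near a plausible trajectory needs far fewer refinement steps than one drawn from pure noise, which is consistent with the summary's note that 20 anchors suffice where VADv2 uses 8192.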