Workflow
端到端自动驾驶
icon
Search documents
复旦SeerDrive:一种轨迹规划和场景演化的双向建模端到端框架
自动驾驶之心· 2025-10-14 23:33
Core Insights - The article discusses the advancements in end-to-end autonomous driving, specifically focusing on the SeerDrive model, which aims to improve trajectory planning by incorporating bidirectional modeling of trajectory planning and scene evolution [1][3][4]. Group 1: SeerDrive Overview - SeerDrive introduces a bidirectional modeling paradigm that captures scene dynamics while allowing planning results to optimize scene predictions, creating a closed-loop iteration [3][4]. - The overall pipeline of SeerDrive consists of four main modules: feature encoding, future BEV world modeling, future perception planning, and iterative optimization [4]. Group 2: Challenges in Current Systems - Current one-shot paradigms in autonomous driving overlook dynamic scene evolution, leading to inaccurate planning in complex interactions [5]. - Existing systems fail to model the impact of vehicle behavior on the surrounding environment, which is crucial for accurate trajectory planning [5]. Group 3: Technical Components - Feature encoding transforms multimodal sensor inputs and vehicle states into structured features, laying the groundwork for subsequent modeling [8][9]. - Future BEV world modeling predicts scene dynamics by generating future BEV features, balancing efficiency and structured representation [10][13]. Group 4: Planning and Optimization - SeerDrive employs a decoupled strategy for planning, allowing current and future scenes to guide planning separately, thus avoiding representation entanglement [15]. - The iterative optimization process enhances the bidirectional dependency between trajectory planning and scene evolution, leading to improved performance [17]. Group 5: Experimental Results - SeerDrive achieved a PDMS score of 88.9 on the NAVSIM test set, outperforming several state-of-the-art methods [23]. - In the nuScenes validation set, SeerDrive demonstrated an average L2 displacement error of 0.43m and a collision rate of 0.06%, significantly better than competing methods [24]. Group 6: Component Effectiveness - The removal of future perception planning or iterative optimization resulted in a decrease in PDMS scores, indicating the importance of these components for performance enhancement [26]. - The design choices, such as the decoupled strategy and the use of anchored endpoints for future ego feature initialization, proved to be critical for achieving optimal results [30]. Group 7: Limitations and Future Directions - The BEV world model does not leverage the generalization capabilities of foundational models, which could enhance performance in complex scenarios [41]. - Future research may explore the integration of foundational models with planning to improve generalization while maintaining efficiency [41].
学术和量产的分歧,技术路线的持续较量!从技术掌舵人的角度一览智驾的十年路....
自动驾驶之心· 2025-10-14 23:33
Core Insights - The article discusses the significant technological advancements in autonomous driving over the past decade, highlighting key innovations such as Visual Transformers, BEV perception, multi-sensor fusion, end-to-end autonomous driving, large models, VLA, and world models [3][4]. Group 1: Technological Milestones - The past ten years have seen remarkable technological developments in autonomous driving, with various solutions emerging through the collision and fusion of different technologies [3]. - A roundtable discussion is set to reflect on the technological milestones in the industry, focusing on the debate between world models and VLA [4][13]. Group 2: Industry Perspectives - The roundtable will feature insights from top industry leaders, discussing the evolution of autonomous driving technology and providing career advice for newcomers in the field [4][5]. - The discussion will also cover the perspectives of academia and industry regarding L3 autonomous driving, emphasizing the convergence of research directions and the practical implementation in engineering [13]. Group 3: Future Directions - The article raises questions about the future direction of autonomous driving technology, particularly the role of end-to-end systems as a foundational element of intelligent driving technology [13]. - It highlights the ongoing competition between academic research and engineering practices in the field, suggesting a need for new entrants to adapt and innovate [13].
地平线残差端到端是如何实现的?ResAD:残差学习让自动驾驶决策更接近人类逻辑
自动驾驶之心· 2025-10-13 23:33
点击下方 卡片 ,关注" 自动驾驶之心 "公众号 戳我-> 领取 自动驾驶近30个 方向 学习 路线 >>自动驾驶前沿信息获取 → 自动驾驶之心知识星球 论文作者 | Zhiyu Zheng等 编辑 | 自动驾驶之心 想让车子自己开,传统方法得像搭积木:先"看"(感知),再"猜"(预测),最后"做决定"(规划)。这套流程环环相扣,一个环节出错,后面全跟着错, 既不高效,也不安全。 于是, 端到端自动驾驶 成了一条新路。它想让AI像老司机一样,直接把看到的(传感器数据)变成要走的路线(未来轨迹)。想法很美好,但现实很骨 感:现有的端到端模型,大多在死磕一个问题—— "未来的轨迹长啥样?" 为了解决这些问题,地平线、华科和武大的团队提出了 ResAD 框架。核心思想很简单: 不直接预测整条轨迹,而是先给一个"惯性参考线"——就是车子如 果不动方向盘会走的路线。然后,让模型只学习一个"调整量"(残差),即为了安全行驶,需要偏离这根参考线多少。 这样一来,学习目标就从 "轨迹是什么?" 变成了 "为什么要调整方向?" 。模型被迫去关注那些导致调整的真实原因,比如障碍物、交通规则等,而不是死 记硬背数据里的巧合。 我们 ...
端到端和VLA占据自动驾驶前沿方向的主流了。。。
自动驾驶之心· 2025-10-13 04:00
Core Insights - The article discusses the evolution of end-to-end algorithms in autonomous driving, highlighting the transition from modular production algorithms to end-to-end approaches and the recent focus on Vision-Language Models (VLA) [1][3]. Group 1: End-to-End Algorithms - End-to-end algorithms are central to the current mass production of autonomous driving technology, involving a rich technology stack [1]. - There are two main paradigms in the industry: single-stage and two-stage approaches, with UniAD being a representative of the single-stage paradigm [1]. - The single-stage approach can be further categorized into several subfields, including perception-based, diffusion model-based, world model-based, and VLA-based end-to-end algorithms [1]. Group 2: VLA and Course Offerings - The article mentions the recent surge in interest regarding how to efficiently learn about end-to-end and VLA technologies, leading to the creation of specialized courses [3]. - The "End-to-End and VLA Autonomous Driving Course" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA approaches [3]. - The course includes a detailed theoretical foundation and practical assignments to help participants build their own VLA models and datasets [3]. Group 3: Course Instructors - The course features a team of instructors with significant academic and practical experience in multi-modal perception, autonomous driving VLA, and large model frameworks [7][9]. - Instructors have published numerous papers in top international conferences and have hands-on experience in developing and implementing cutting-edge algorithms in the field [7][9][10]. Group 4: Target Audience and Requirements - The courses are designed for individuals with a foundational understanding of autonomous driving and familiarity with key technologies such as transformer models, reinforcement learning, and BEV perception [13]. - Participants are expected to have a basic knowledge of probability theory, linear algebra, and proficiency in Python and PyTorch [13].
Waymo提出Drive&Gen:用生成视频评估端到端自动驾驶(IROS'25)
自动驾驶之心· 2025-10-12 23:33
作者 | Jiahao Wang 来源 | 我爱计算机视觉 传统的自动驾驶系统像一个部门林立的大公司,感知、预测、规划等模块各司其职,虽然稳定,但流程繁琐,一个环节出错就可能影响全局。而E2E模型就 像一个全能的创业团队,直接从摄像头画面等原始输入,一步到位输出驾驶决策,简洁高效,潜力巨大。 但问题也随之而来:AI生成的视频真的足够"真实",能骗过自动驾驶系统,并用来做严肃的评估吗?我们又该如何深入了解E2E驾驶模型的"脾气",修复它 的短板,让它在没见过的新场景(比如突然的暴雨天)里也能从容应对? 为了回答这些问题,来自约翰霍普金斯大学、Waymo和谷歌DeepMind的研究者们联手,在即将于IROS 2025会议上发表的论文中,提出了一个名为 Drive&Gen 的新框架。这个名字很直白,就是将 驾驶(Drive) 和 生成(Gen) 结合起来,旨在连接E2E驾驶模型和生成式世界模型,共同评估和提升彼 此。 背景:当E2E驾驶遇上生成式AI 点击下方 卡片 ,关注" 自动驾驶之心 "公众号 戳我-> 领取 自动驾驶近30个 方向 学习 路线 >>自动驾驶前沿信息获取 → 自动驾驶之心知识星球 本文只做学术 ...
工业界大佬带队!三个月搞定端到端自动驾驶
自动驾驶之心· 2025-10-12 23:33
Core Viewpoint - 2023 marks the year of end-to-end production, with 2024 expected to be a significant year for end-to-end production in the automotive industry, as leading new forces and manufacturers have already achieved end-to-end production [1][3]. Group 1: End-to-End Production Development - The automotive industry is witnessing rapid development in end-to-end production, particularly in one-stage and two-stage paradigms, with one-stage methods like UniAD being prominent [1][3]. - Various one-stage methods have emerged, including perception-based, world model-based, diffusion model-based, and VLA-based approaches, indicating a strong push from both autonomous driving companies and vehicle manufacturers towards self-research and mass production of end-to-end autonomous driving [3][5]. Group 2: Course Overview - A course titled "End-to-End and VLA Autonomous Driving" has been launched, focusing on cutting-edge algorithms in both one-stage and two-stage end-to-end methods, aimed at bridging academic and industrial advancements [5][15]. - The course is structured into several chapters, covering topics such as the history and evolution of end-to-end algorithms, background knowledge on VLA, and detailed discussions on two-stage and one-stage end-to-end methods [9][10][12]. Group 3: Key Technologies and Techniques - The course emphasizes key technologies such as BEV perception, visual language models (VLM), diffusion models, and reinforcement learning, which are essential for mastering the latest advancements in autonomous driving [5][11]. - The second chapter of the course is highlighted as crucial for understanding the most frequently asked technical keywords in job interviews over the next two years [10]. Group 4: Practical Applications and Outcomes - The course includes practical assignments, such as RLHF fine-tuning, allowing participants to apply their knowledge in real-world scenarios and understand how to build and experiment with reinforcement learning modules [13][19]. - By completing the course, participants are expected to reach a level equivalent to one year of experience as an end-to-end autonomous driving algorithm engineer, gaining a comprehensive understanding of various methodologies and their applications [19].
工业界和学术界大佬带队!彻底搞定端到端与VLA
自动驾驶之心· 2025-10-09 23:32
Core Insights - The article discusses the evolution of end-to-end algorithms in autonomous driving, highlighting the transition from modular production algorithms to end-to-end and now to Vision-Language Alignment (VLA) models [1][3] - It emphasizes the rich technology stack involved in end-to-end algorithms, including BEV perception, visual language models (VLM), diffusion models, reinforcement learning, and world models [3][10] Summary by Sections End-to-End Algorithms - End-to-end algorithms are categorized into two main paradigms: single-stage and two-stage, with UniAD being a representative of the single-stage approach [1] - Single-stage can further branch into various subfields, particularly those based on VLA, which have seen a surge in related publications and industrial applications in recent years [1] VLA and Course Offerings - The article mentions the launch of courses aimed at helping individuals quickly and efficiently learn about end-to-end and VLA in autonomous driving, featuring collaboration between industry and academia [3] - The "VLA and Large Model Practical Course" focuses on VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA approaches [3] Course Structure and Faculty - The course structure includes a comprehensive overview of VLA, with detailed theoretical foundations in Vision, Language, and Action, as well as practical assignments to build VLA models and datasets from scratch [3][10] - The teaching team consists of experienced professionals from top academic institutions and industry, with backgrounds in multimodal perception, autonomous driving, and large model frameworks [7][9][10] Target Audience and Requirements - The courses are designed for individuals with a foundational understanding of autonomous driving and familiarity with key technologies such as transformer models, reinforcement learning, and BEV perception [13] - Participants are expected to have a basic knowledge of probability theory, linear algebra, and programming skills in Python and PyTorch [13]
模仿学习无法真正端到端?
自动驾驶之心· 2025-10-08 23:33
Core Viewpoint - The article emphasizes that in the autonomous driving industry, the training methods are more critical than model architectures like VLA or world models, highlighting the limitations of imitation learning in achieving true end-to-end autonomous driving [2][14]. Limitations of Imitation Learning - Imitation learning assumes that expert data is optimal, but in the context of driving, there is no single perfect driving behavior due to the diverse styles and strategies of human drivers [3][4]. - The training data lacks consistency and optimality, leading to models that learn vague and imprecise driving patterns rather than clear and logical strategies [3][4]. - Imitation learning fails to distinguish between critical decision-making scenarios and ordinary ones, resulting in models that may make fatal errors in crucial moments [5][6]. Key Scene Identification - The article discusses the importance of identifying key scenes in driving, where the model's output precision is critical, especially in complex scenarios [7][8]. - It introduces the concept of "advantage" from reinforcement learning, which helps define key states where optimal actions significantly outperform others [7]. Out-of-Distribution (OOD) Issues - Open-loop imitation learning can lead to cumulative errors, causing the model to enter states that differ from the training data distribution, resulting in performance degradation [8][10][12]. - The article illustrates that models trained purely on imitation learning may struggle in critical situations, such as timely lane changes, due to their reliance on suboptimal behaviors learned from human data [13]. Conclusion - The core of technological development lies in identifying key routes and bottlenecks rather than merely following trends, suggesting a need for new methods beyond imitation learning to address its limitations [14].
纵向端到端是自动驾驶技术的一道分水岭
自动驾驶之心· 2025-10-04 04:04
Core Insights - The article discusses the evolution of end-to-end autonomous driving technology, highlighting the shift from horizontal to vertical end-to-end systems as a new industry focus [2][3] - It emphasizes the importance of vertical end-to-end control for achieving human-like driving efficiency, particularly in speed and braking control [4][16] Group 1: Importance of Vertical End-to-End Control - Vertical end-to-end control is essential for achieving smooth acceleration and deceleration, which is a key differentiator between novice and experienced drivers [3][4] - The article defines "defensive deceleration" as the ability to adjust speed based on necessity and prediction, balancing safety and efficiency [4][12] - Current autonomous systems often prioritize navigation efficiency over vertical control, making it challenging to implement effective speed adjustments [15][16] Group 2: Challenges in Achieving Vertical End-to-End Control - Many autonomous driving systems have successfully implemented horizontal end-to-end control, but vertical control remains a significant challenge [13][16] - The noise in human driving data complicates the learning process for autonomous systems, making it difficult to distinguish meaningful speed control from random fluctuations [16][17] - Solutions to improve vertical control include data cleaning, causal reasoning, and reinforcement learning, which are being explored by leading autonomous driving teams [17]
有人在自驾里面盲目内卷,而有的人在搭建真正的壁垒...
自动驾驶之心· 2025-09-29 23:33
Core Viewpoint - The automotive industry is undergoing a significant transformation, with numerous executive changes and a focus on advanced technologies such as autonomous driving and artificial intelligence [1][3]. Group 1: Industry Changes - In September, 48 executives in the automotive sector underwent changes, indicating a shift in leadership and strategy [1]. - Companies like Li Auto and BYD are restructuring their teams to enhance their capabilities in autonomous driving and cockpit technology [1]. - The industry is witnessing a rapid evolution in algorithm development, moving from BEV to more complex models like VLA and world models [1][3]. Group 2: Autonomous Driving Focus - The forefront of autonomous driving technology is centered on VLA/VLM, end-to-end driving, world models, and reinforcement learning [3]. - There is a notable gap in understanding the industry's actual progress among students and mid-sized companies, highlighting the need for better communication between academia and industry [3]. Group 3: Community and Knowledge Sharing - A community called "Autonomous Driving Heart Knowledge Planet" has been established to bridge the gap between academic and industrial knowledge, aiming to grow to nearly 10,000 members in two years [5]. - The community offers a comprehensive platform for learning, including video content, Q&A, and job exchange, catering to both beginners and advanced learners [6][10]. - Members can access over 40 technical routes and engage with industry leaders to discuss trends and challenges in autonomous driving [6][8]. Group 4: Learning Resources - The community provides various resources for practical questions related to autonomous driving, such as entry points for end-to-end systems and data annotation practices [6][11]. - A detailed curriculum is available for newcomers, covering essential topics in autonomous driving technology [20][21]. - The platform also includes job referral mechanisms to connect members with potential employers in the autonomous driving sector [13][14].