End-to-End Autonomous Driving
BJTU & Horizon Robotics Propose DIVER: A New Diffusion + Reinforcement Learning Framework for Multimodal Planning
自动驾驶之心· 2025-12-17 03:18
Click the card below to follow the "自动驾驶之心" official account. Tap here -> to claim learning roadmaps for nearly 30 autonomous driving directions. >> For frontier autonomous driving news → the 自动驾驶之心 Knowledge Planet. Paper authors | Ziying Song et al. Editor | 自动驾驶之心

Research background: End-to-end autonomous driving is evolving rapidly, yet extensive real-world testing and closed-loop evaluation keep exposing one very typical problem: the vehicle's planning behavior is overly conservative and single-moded, and struggles in complex traffic scenes. The reason is that mainstream end-to-end methods largely rely on an imitation-learning paradigm built on single expert demonstrations.

Today's mainstream systems are accelerating toward the end-to-end paradigm, in which a unified deep network integrates perception, prediction, and planning, generating future vehicle trajectories or control commands directly from multi-view sensor data, and showing strong overall performance in complex urban scenes. However, most existing end-to-end methods are still trained by imitating a single expert demonstration, forcing the model to fit one "uniquely correct" expert trajectory. Even when multimodal planning is introduced, the generated candidate trajectories tend to cluster tightly around the ground truth and lack genuinely meaningful behavioral diversity. In highly interactive, turning, or uncertain scenarios, this mode collapse limits the system's ability to explore multiple safe and feasible decisions. Recently, researchers from Beijing Jiaotong University, Horizon Robotics, Huazhong University of Science and Technology, Tsinghua University, the University of Macau ...
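A toy sketch (not from the DIVER paper, and with illustrative names only) of why single-expert imitation collapses modes: when the same observation admits two distinct expert maneuvers, a deterministic planner trained with an L2 imitation loss converges to their mean, which is neither maneuver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: at an intersection, the "expert" sometimes turns left
# (y = -1) and sometimes right (y = +1) for the same observation.
y_expert = rng.choice([-1.0, 1.0], size=10_000)

# The closed-form minimizer of mean((y_hat - y)^2) over a constant
# prediction is the sample mean, i.e. roughly "drive straight".
y_hat = y_expert.mean()

print(f"predicted action: {y_hat:+.3f}")
print(f"distance to nearest expert mode: {min(abs(y_hat - 1), abs(y_hat + 1)):.3f}")
```

The prediction lands near zero, far from both expert modes, which is the one-dimensional analogue of candidate trajectories clustering around the ground truth.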
XPeng's Latest: FutureX, a Latent Chain-of-Thought World Model with Lessons for On-Vehicle Systems...
自动驾驶之心· 2025-12-15 06:00
Paper authors | Hongbin Lin et al. Editor | 自动驾驶之心

A new joint work from CUHK and XPeng, and an interesting one: it uses a latent chain-of-thought world model to strengthen end-to-end driving, with several improvements worth trying in industry.

1. Background

End-to-end (E2E) autonomous driving refers to a pipeline that converts multimodal raw sensor streams directly into motion plans or low-level actuation commands through a fully differentiable mapping. The field has developed rapidly on both the algorithmic and the benchmarking front, and despite inherent challenges, existing methods have made notable progress.

Behind these successes, existing E2E systems map sensor inputs to control outputs through a single neural network, performing an efficient one-shot forward prediction with no further "thinking". As a result, they lack adaptability and interpretability in complex environments (Fig. 1, second row). In human cognition, a driver mentally simulates possible futures before acting: the motion trends of surrounding vehicles, how the scene will evolve, and the potential outcome of each candidate action (Fig. 1, first row). This internal reasoning lets humans make safe, context-appropriate decisions. For an end-to-end system in highly dynamic traffic, inferring future scenarios ...
NTU & Harvard Propose OpenREAD: End-to-End RL Unifying Driving Cognition and Trajectory Planning
自动驾驶之心· 2025-12-13 02:04
The following article is reproduced from 深蓝AI (author: 深蓝学院), a learning platform focused on artificial intelligence, robotics, and autonomous driving. Author | 深蓝学院 Source | 深蓝AI Original link: 南洋理工、哈佛提出OpenREAD:用端到端RL统一驾驶认知与轨迹规划. This article is shared for academic purposes only; contact us for removal in case of infringement.

Teaching large vision-language models to "think" and "act" at the same time: In autonomous driving research, using large vision-language models (LLM/VLM) to learn open-ended driving knowledge and thereby improve trajectory planning and decision-making is becoming a new trend. However, the traditional supervised fine-tuning (SFT) paradigm struggles to fully tap a model's reasoning potential, and its knowledge-learning efficiency is also limited. The emergence of DeepSeek-R1 showed the great potential of reinforcement learning for improving a model's reasoning and thinking ability, giving it stronger generalization. A key question thus follows: how can reinforcement learning strengthen a vision-language model's reasoning so that the model "learns to think", mastering both open-ended driving knowledge and trajectory planning within a single framework? This is precisely the new challenge facing VLM-based end-to-end autonomous driving. Researchers from Nanyang Technological ...
A Year On, DiffusionDrive Upgrades to v2 and Sets a New Record!
自动驾驶之心· 2025-12-11 03:35
Core Insights
- The article covers the upgrade of DiffusionDrive to version 2, which advances end-to-end autonomous driving trajectory planning by integrating reinforcement learning to tackle the twin challenges of diversity and sustained high quality in trajectory generation [1][3][10].

Background Review
- The shift toward end-to-end autonomous driving (E2E-AD) emerged as traditional tasks like 3D object detection and motion prediction matured. Early methods were limited in their modeling, often generating a single trajectory with no alternatives in complex driving scenarios [5][10].
- Earlier diffusion models applied to trajectory generation struggled with mode collapse, yielding little diversity in generated behaviors. DiffusionDrive introduced a Gaussian Mixture Model (GMM) to define prior distributions for the initial noise, promoting diverse behavior generation [5][13].

Methodology
- DiffusionDriveV2 introduces a framework that uses reinforcement learning to overcome the limitations of imitation learning, which previously forced a trade-off between diversity and sustained high quality in trajectory generation [10][12].
- The framework combines intra-anchor GRPO with inter-anchor truncated GRPO so that advantage estimation stays within a given driving intention, preventing mode collapse by avoiding inappropriate comparisons between different intentions [9][12][28].
- Scale-adaptive multiplicative noise enhances exploration while maintaining trajectory smoothness, addressing the inherent scale inconsistency between the proximal and distal segments of a trajectory [24][39].

Experimental Results
- On the NAVSIM v1 and NAVSIM v2 benchmarks, DiffusionDriveV2 achieved state-of-the-art performance, with a PDMS of 91.2 on NAVSIM v1 and 85.5 on NAVSIM v2, significantly outperforming previous models [10][33].
- The results indicate that DiffusionDriveV2 effectively balances trajectory diversity and sustained quality, achieving the best performance in closed-loop evaluations [38][39].

Conclusion
- DiffusionDriveV2 addresses the inherent challenges of imitation learning in trajectory generation, reaching an optimal trade-off between planning quality and diversity through its reinforcement learning techniques [47].
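A minimal sketch of the intra-anchor idea described above (illustrative only; DiffusionDriveV2's exact GRPO formulation may differ): advantages are normalized within the group of rollouts sharing a driving-intention anchor, so trajectories are never compared against rollouts from a different intention.

```python
import numpy as np

def intra_anchor_advantages(rewards, anchor_ids, eps=1e-6):
    """Normalize GRPO-style advantages per anchor group, so a cautious
    'yield' rollout is not penalized merely for scoring below an
    aggressive 'overtake' rollout from another anchor."""
    rewards = np.asarray(rewards, dtype=float)
    anchor_ids = np.asarray(anchor_ids)
    adv = np.empty_like(rewards)
    for a in np.unique(anchor_ids):
        m = anchor_ids == a
        group = rewards[m]
        adv[m] = (group - group.mean()) / (group.std() + eps)
    return adv

# Two anchors (intentions) with very different reward scales:
rewards    = [0.9, 0.8, 0.85,  0.3, 0.1, 0.2]
anchor_ids = [0,   0,   0,     1,   1,   1]
adv = intra_anchor_advantages(rewards, anchor_ids)
print(np.round(adv, 3))
```

With global normalization, every anchor-1 rollout would receive a negative advantage and that mode would be suppressed; per-anchor normalization gives the best rollout of each intention a comparable positive advantage.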
Start Anytime! The End-to-End and VLA Autonomous Driving Small-Class Course Has Officially Concluded
自动驾驶之心· 2025-12-09 19:00
Core Viewpoint
- 2023 was the year end-to-end went into production, and 2024 is expected to be a significant year for end-to-end deployment in the automotive industry, as leading new players and manufacturers have already shipped end-to-end systems [1][3].

Group 1: End-to-End Production Development
- The industry has two main paradigms, single-stage and two-stage; UniAD is a representative single-stage approach that models vehicle trajectories directly from sensor inputs [1].
- Since last year, single-stage end-to-end development has advanced rapidly, producing derivatives such as perception-based, world-model-based, diffusion-model-based, and VLA-based single-stage methods [3][5].
- Major players in the autonomous driving sector, both solution providers and car manufacturers, are focusing on in-house development and productization of end-to-end autonomous driving technologies [3].

Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched, teaching cutting-edge single-stage and two-stage end-to-end algorithms with a focus on the latest developments in industry and academia [5][14].
- The course opens with an introduction to end-to-end algorithms, followed by background knowledge on technologies such as VLA, diffusion models, and reinforcement learning [8][9].
- The second chapter is highlighted as covering the technical keywords most frequently asked in job interviews over the next two years [9].

Group 3: Technical Focus Areas
- The course covers the subfields of single-stage end-to-end methods, including perception-based (UniAD), world-model-based, diffusion-model-based, and the currently popular VLA-based approaches [10][12].
- The curriculum includes practical assignments such as RLHF fine-tuning, giving students hands-on experience building and experimenting with pre-training and reinforcement learning modules [11][12].
- The course emphasizes understanding BEV perception, multimodal large models, and the latest advances in diffusion models, all crucial for the future of autonomous driving [12][16].
The World Models and Autonomous Driving Small-Class Course Is Officially Launched! Tesla's World Model, Video and OCC Generation, All in One Place~
自动驾驶之心· 2025-12-09 07:59
Jason's new course, "World Models and Autonomous Driving," is officially launched! 自动驾驶之心 is running it jointly with an industry expert; the earlier "End-to-End and VLA Autonomous Driving" course was very well received, so we are following up with this world-model course. It focuses on world-model algorithms for general world models, video generation, and OCC generation, covering Tesla's world model, the Marble work from Fei-Fei Li's team, and more. Everyone is welcome to join~

Course outline: how the course unfolds.

Chapter 1: Introduction to World Models. This chapter gives an overview of world models for autonomous driving. The instructor first reviews the connection between world models and end-to-end autonomous driving, then covers the history of world models and their current applications. He then surveys the main schools of world models: pure simulation, simulation + planning, generating sensor inputs, generating perception outputs, and so on; how each school is applied in industry today, which problems it solves, and where it sits in the autonomy stack; what academia and industry are working on; and the relevant datasets and benchmarks, all answered one by one in this chapter~

Chapter 2: Background Knowledge for World Models. Early-bird discount, available until the course starts~

Instructor: Jason, C9 undergraduate + QS50 PhD, with 2 published CCF-A papers and several CCF-B papers. ...
Tracing the Evolution of the Autonomous Driving "Brain," with Li Auto as a Case Study - A VLA Architecture Analysis
自动驾驶之心· 2025-12-07 02:05
Author | 我要吃鸡腿 Editor | 自动驾驶之心 Original link: https://zhuanlan.zhihu.com/p/1965839552158623077

In the fast-iterating field of autonomous driving, technical paradigms turn over at a dizzying pace. The year before last, the industry talked of nothing but BEV (bird's-eye view); last year, "end-to-end" became the new technical high ground. Yet each paradigm, while solving old problems, seems to breed new challenges.

Traditional "end-to-end" autonomous driving, i.e. the VA (Vision-Action) model, exposes a deep contradiction: it is like a superbly skilled but taciturn veteran driver. Relying on an "intuition" trained from massive data, it can execute astonishingly smooth maneuvers in complex traffic. But when you sit in the passenger seat and, after your heart skips a beat, ask it "Why did you suddenly slow down just now?", it has no answer.

This is the "black box" problem: the system can "do the right thing," but we do not know "why it is right." This inability to explain or communicate creates a huge crisis of trust.

The three-paradigm evolution of autonomous driving: (a) ...
Autonomous Driving Perception in the End-to-End Era
自动驾驶之心· 2025-12-05 00:03
Core Insights
- The article discusses the resurgence of end-to-end (E2E) perception in the autonomous driving industry, its impact on the field, and the shift from traditional modular approaches to more integrated solutions [4][5][9].

Group 1: The End-to-End Revival
- End-to-end is not a new technology; early hopes of using neural networks to output trajectories directly from camera images foundered on stability and safety [9].
- The traditional architecture of localization, perception, planning, and control has long been mainstream, but advances in BEV perception and Transformer architectures have revived end-to-end methods [9].
- Companies are now exploring various one-stage and two-stage solutions, with a focus on neural-network-based planning modules [9].

Group 2: What Perception Gains from End-to-End
- In traditional frameworks, perception aimed to gather as much accurate scene information as possible for planning, but the modular design limited how well it could serve planning's needs [11].
- Current mainstream end-to-end solutions still follow this approach, treating the various perception tasks as auxiliary losses [13].
- The key advantage of end-to-end is the shift from exhaustive perception to "planning-oriented" perception, enabling a more efficient, demand-driven approach [14][15].

Group 3: Navigation-Guided Perception
- The article introduces a navigation-guided perception model: perception should be steered by navigation information, much as human drivers attend to the scene elements relevant to their driving intent [16][18].
- A Scene Token Learner (STL) module is proposed to efficiently extract scene features from BEV representations, integrating navigation information to enhance perception [18][19].
- The SSR framework demonstrates that only 16 self-supervised queries suffice to represent the perception information needed for planning, a dramatic reduction in complexity compared with traditional methods [22].

Group 4: World Models and Implicit Supervision
- The article discusses the potential of world models to replace traditional perception tasks by providing implicit supervision for scene representation [23][21].
- The SSR framework deepens scene understanding through self-supervised learning, predicting future BEV features to improve the comprehension of its scene queries [20][21].
- The design allows efficient trajectory planning while maintaining the consistency needed for stable model convergence during training [20].

Group 5: Performance Metrics
- SSR outperforms various state-of-the-art (SOTA) methods in both efficiency and performance, with significant improvements in metrics such as L2 distance and collision rate [24].
- Its design reduces the number of queries needed for effective scene representation, showcasing scalability and efficiency [22][24].
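A minimal single-head cross-attention sketch of the Scene Token Learner idea described above (illustrative only; SSR's actual STL module is more elaborate, and all dimensions and names here are assumptions): a small set of learnable scene queries, biased by a navigation-command embedding, pools the flattened BEV feature map into a compact planning-oriented representation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scene_token_learner(bev_feats, nav_embed, queries):
    """Cross-attend learnable queries (conditioned on driving intent)
    over flattened BEV features, returning compact scene tokens."""
    d = queries.shape[-1]
    q = queries + nav_embed                                 # inject navigation intent
    attn = softmax(q @ bev_feats.T / np.sqrt(d), axis=-1)   # (Nq, H*W)
    return attn @ bev_feats                                 # (Nq, d) scene tokens

rng = np.random.default_rng(0)
H, W, d, n_queries = 32, 32, 64, 16       # 16 queries, as in SSR
bev = rng.standard_normal((H * W, d))     # stand-in for a BEV feature map
nav = rng.standard_normal(d) * 0.1        # stand-in for a command embedding
q = rng.standard_normal((n_queries, d))   # learnable scene queries

tokens = scene_token_learner(bev, nav, q)
print(tokens.shape)   # 1024 BEV cells distilled into 16 tokens
```

The point of the sketch is the compression ratio: downstream planning attends to 16 tokens rather than the full 32x32 BEV grid.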
Why Isn't Tesla Choosing VLA Right Now?
自动驾驶之心· 2025-12-02 00:03
Core Insights
- The article discusses Tesla's latest Full Self-Driving (FSD) technology, asking whether its architecture is outdated compared with the emerging VLA (Vision-Language-Action) framework used in robotics [3][4].

Comparison of Robotics and Autonomous Driving
- Task objectives: robots may be asked to execute arbitrary human commands, while autonomous driving focuses on navigating from point A to point B, relying on map data for precision [4].
- Operating environment: autonomous driving runs on defined roads with fewer open-ended tasks, making it less reliant on language processing than robotics [4].
- Hardware limitations: current in-vehicle hardware offers insufficient compute (under 1000 TOPS) to run large language models for driving tasks, which could compromise safety [5].

Tesla's Approach
- Tesla employs a hybrid fast/slow-thinking logic, using a pure end-to-end approach for most scenarios and invoking a VLM only in specific situations such as traffic regulations or unstructured road conditions [5].
Another New Work from NVIDIA! MPA: A New Model-Based Closed-Loop End-to-End Policy Adaptation Framework (with CMU, Stanford, et al.)
自动驾驶之心· 2025-12-01 00:04
Core Insights
- The article discusses the Model-Based Policy Adaptation (MPA) framework, which aims to improve the robustness and safety of end-to-end (E2E) autonomous driving agents under closed-loop evaluation [2][6][41].
- MPA addresses cascading errors and insufficient generalization in closed-loop evaluation by using a model-based approach to adapt pre-trained E2E driving agents [2][6].

Summary by Sections

Background
- E2E autonomous driving models have made significant progress by integrating perception, prediction, and planning into a unified learning framework, but their performance degrades in closed-loop environments due to accumulated errors and distribution shift [3][6].
- The gap between offline training and online objectives highlights the need for better closed-loop performance evaluation [5][9].

MPA Framework
- MPA bridges the performance gap by generating counterfactual data with a high-fidelity 3D Gaussian splatting (3DGS) simulation engine, letting the agent experience diverse scenarios beyond the original dataset [7][14].
- The framework includes a diffusion-model-based policy adapter and a multi-step Q-value model to refine the agent's predictions and evaluate long-term rewards [7][21].

Experimental Results
- MPA was validated on the nuScenes benchmark, showing significant performance improvements in both in-domain and out-of-domain scenarios, particularly in safety-critical situations [11][33].
- MPA outperforms baseline models, achieving higher scores on key metrics such as route completion (RC) and HDScore [33][36].

Contributions
- The article outlines three main contributions:
  1. An analysis of the root causes of performance decline in closed-loop evaluation and of the fidelity of 3DGS simulation [11][41].
  2. A systematic counterfactual data generation process and training of the MPA framework [11][43].
  3. A demonstration of MPA's effectiveness in improving E2E driving agents across diverse scenarios [41][43].

Limitations and Future Work
- MPA relies on the assumption that 3DGS renders reliably under constrained trajectory deviations, which may not hold in all cases [44].
- Future work will expand the dataset, integrate online reinforcement learning, and strengthen the framework's robustness across diverse driving conditions [44][46].
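A sketch of the multi-step value idea behind evaluating long-term rewards (illustrative only; MPA's Q-value model is a learned neural network, and the reward numbers here are made up): a candidate trajectory is scored by its discounted n-step return plus a bootstrapped terminal value, rather than a one-step reward, so myopically attractive plans that end in an infraction are rejected.

```python
def n_step_return(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return: r_0 + g*r_1 + ... + g^n * V(s_n)."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Compare two hypothetical adaptations of a pre-trained E2E plan:
safe_candidate   = [1.0, 1.0, 1.0, 1.0]    # steady progress, no infractions
greedy_candidate = [2.0, 2.0, -10.0, 0.0]  # fast early, then a collision penalty

v_safe = n_step_return(safe_candidate, bootstrap_value=5.0)
v_greedy = n_step_return(greedy_candidate, bootstrap_value=0.0)
print(v_safe > v_greedy)   # True: multi-step scoring rejects the myopic plan
```

A one-step scorer would prefer the greedy candidate (reward 2.0 vs 1.0 at the first step); looking several steps ahead reverses the ranking.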