End-to-End Autonomous Driving
World Models and Autonomous Driving Small-Group Course Officially Launched! Tesla's World Model, Video and OCC Generation All Covered
自动驾驶之心· 2025-12-09 07:59
Core Insights
- The article introduces a new course on world models and autonomous driving, emphasizing the importance of understanding the relevant algorithms and how they are applied in industry [2][10].

Course Overview
- The course is structured into six chapters, running from an introduction to world models through to practical applications in autonomous driving [5][10].
- Chapter one discusses the relationship between world models and end-to-end autonomous driving, including historical development and current applications [5].
- Chapter two covers foundational knowledge for world models, including scene representation and key technologies such as Transformer and BEV perception [6].
- Chapter three explores general world models, highlighting notable work such as Marble from Fei-Fei Li's team and DeepMind's Genie 3 [6][7].
- Chapter four delves into video generation algorithms, showcasing notable works such as Wayve's GAIA-1 and GAIA-2 along with recent advances in the field [7].
- Chapter five examines OCC generation models and their potential for trajectory planning and end-to-end implementation [8].
- Chapter six covers industry applications of world models, addressing common pain points and interview preparation for relevant positions [9].

Learning Outcomes
- The course aims to equip participants to understand and implement world model algorithms, preparing them for roles in the autonomous driving sector [10][13].
- On completion, participants are expected to reach a level equivalent to one year of experience as a world model algorithm engineer [13].

Course Schedule
- The course is set to begin on January 1 and run for approximately two and a half months, with offline video lectures and online Q&A sessions [14].
Tracing the Evolution of the Autonomous Driving "Brain" Through Li Auto: A VLA Architecture Breakdown
自动驾驶之心· 2025-12-07 02:05
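Author | 我要吃鸡腿  Editor | 自动驾驶之心  Original link: https://zhuanlan.zhihu.com/p/1965839552158623077

In autonomous driving, a field that iterates at breakneck speed, technical paradigms turn over dizzyingly fast. Two years ago the industry talked of nothing but BEV (bird's-eye view); last year, "end-to-end" (End-to-End) became the new technical high ground. Yet each paradigm, while solving old problems, seems to give rise to new challenges.

Traditional "end-to-end" autonomous driving, that is, the VA (Vision-Action) model, exposes a deep contradiction: it is like a highly skilled but taciturn veteran driver. Drawing on "intuition" trained from massive amounts of data, it can execute astonishingly smooth maneuvers in complex traffic. But when you sit in the passenger seat and, after your heart skips a beat, ask it "Why did you suddenly slow down just now?", it has no answer.

This is the "black box" problem: the system can "do the right thing", but we do not know "why it did the right thing". This inability to explain or communicate creates a serious crisis of trust.

The evolution of three autonomous driving paradigms. (a) ...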
Autonomous Driving Perception in the End-to-End Era
自动驾驶之心· 2025-12-05 00:03
Core Insights
- The article discusses the resurgence of end-to-end (E2E) approaches in the autonomous driving industry, highlighting their impact on perception and the shift from traditional modular pipelines to more integrated solutions [4][5][9].

Group 1: End-to-End Revival
- End-to-end is not a new technology. The original hope was to have a neural network output trajectories directly from camera images, but stability and safety proved problematic [9].
- The traditional architecture of localization, perception, planning, and control has been the mainstream approach, but advances in BEV perception and Transformer architectures have revived end-to-end methods [9].
- Companies are now exploring various one-stage and two-stage solutions, with a focus on neural network-based planning modules [9].

Group 2: Perception Benefits in End-to-End
- In traditional frameworks, perception aimed to gather as much accurate scene information as possible for planning, but the modular design limited how well perception could serve planning's actual needs [11].
- Current mainstream end-to-end solutions still follow this approach, treating the various perception tasks as auxiliary losses [13].
- The key advantage of end-to-end is the shift from exhaustive perception to "Planning-Oriented" perception, which is more efficient and driven by what planning actually requires [14][15].

Group 3: Navigation-Guided Perception
- The article introduces a Navigation-Guided Perception model, in which perception is guided by navigation information, much as human drivers focus on the scene elements relevant to their driving intent [16][18].
- A Scene Token Learner (STL) module is proposed to efficiently extract scene features from BEV features, integrating navigation information to enhance perception [18][19].
- The SSR framework demonstrates that only 16 self-supervised queries can effectively represent the perception information needed for planning, significantly reducing complexity compared with traditional methods [22].

Group 4: World Models and Implicit Supervision
- The article discusses the potential of world models to replace traditional perception tasks by providing implicit supervision for scene representation [23][21].
- The SSR framework aims to improve scene understanding through self-supervised learning, predicting future BEV features so that the scene queries capture the scene better [20][21].
- The design allows for efficient trajectory planning while maintaining consistency for model convergence during training [20].

Group 5: Performance Metrics
- The SSR framework outperforms various state-of-the-art (SOTA) methods in both efficiency and performance, with significant improvements in metrics such as L2 distance and collision rate [24].
- The framework's design reduces the number of queries needed for effective scene representation, showcasing its scalability and efficiency [22][24].
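The two planning metrics named above, L2 distance and collision rate, can be sketched roughly as follows. This is a simplified illustration, not the benchmark's official evaluation code: the function names are hypothetical, obstacles are treated as points, and the fixed `radius` check is an assumed stand-in for the box-overlap tests that real benchmarks typically use.

```python
import math

def l2_error(pred, gt):
    """Mean Euclidean distance between matching predicted and ground-truth waypoints."""
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and equally long")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def has_collision(pred, obstacles_per_step, radius=1.0):
    """True if any waypoint is closer than `radius` to an obstacle at the same time step."""
    return any(
        math.dist(p, obs) < radius
        for p, step_obstacles in zip(pred, obstacles_per_step)
        for obs in step_obstacles
    )

def collision_rate(preds, scenes, radius=1.0):
    """Fraction of planned trajectories that collide at least once."""
    hits = sum(has_collision(p, s, radius) for p, s in zip(preds, scenes))
    return hits / len(preds)

# Toy example: the second waypoint is 5 m off, the first is exact.
plan = [(0.0, 0.0), (3.0, 4.0)]
truth = [(0.0, 0.0), (0.0, 0.0)]
print(l2_error(plan, truth))  # 2.5
```

Published numbers are usually reported per planning horizon (for example at 1 s, 2 s, and 3 s), so a fuller version would slice the trajectories by horizon before averaging.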
Why Isn't Tesla Choosing VLA Right Now?
自动驾驶之心· 2025-12-02 00:03
Core Insights
- The article discusses Tesla's latest Full Self-Driving (FSD) technology, questioning whether its architecture is outdated compared to the emerging VLA (Vision-Language-Action) framework used in robotics [3][4].

Comparison of Robotics and Autonomous Driving
- **Task Objectives**: Robotics can execute any human command, while autonomous driving focuses on navigation from point A to B, relying on map data for precision [4].
- **Operating Environment**: Autonomous driving operates on defined roads with fewer complex tasks, making it less reliant on language processing compared to robotics [4].
- **Hardware Limitations**: Current hardware lacks sufficient processing power (under 1000 TOPS), making it challenging to implement large language models for driving tasks, which could compromise safety [5].

Tesla's Approach
- Tesla employs a hybrid logic of fast and slow thinking, primarily using an end-to-end approach for most scenarios, while only utilizing VLM in specific situations like traffic regulations or unstructured road conditions [5].
Another New Work from NVIDIA! MPA: A New Model-Based Policy Adaptation Framework for Closed-Loop End-to-End Driving (CMU, Stanford, et al.)
自动驾驶之心· 2025-12-01 00:04
Core Insights
- The article discusses the Model-Based Policy Adaptation (MPA) framework, aimed at enhancing the robustness and safety of end-to-end (E2E) autonomous driving agents during closed-loop evaluation [2][6][41].
- MPA addresses cascading errors and insufficient generalization in closed-loop evaluation by using a model-based approach to adapt pre-trained E2E driving agents [2][6].

Summary by Sections

Background
- E2E autonomous driving models have made significant progress by integrating perception, prediction, and planning into a unified learning framework, but their performance degrades in closed-loop environments because of cumulative errors and distribution shift [3][6].
- The gap between offline training objectives and online objectives highlights the need for better closed-loop performance evaluation [5][9].

MPA Framework
- MPA is designed to bridge this performance gap by generating counterfactual data with a high-fidelity 3D Gaussian splatting (3DGS) simulation engine, which lets the agent experience diverse scenarios beyond the original dataset [7][14].
- The framework includes a diffusion-based policy adapter that refines the agent's predictions and a multi-step Q-value model that evaluates long-term rewards [7][21].

Experimental Results
- MPA was validated on the nuScenes benchmark dataset, demonstrating significant performance improvements in both in-domain and out-of-domain scenarios, particularly in safety-critical situations [11][33].
- The results indicate that MPA outperforms baseline models, achieving higher scores on key metrics such as route completion (RC) and HDScore [33][36].

Contributions
- The article outlines three main contributions:
  1. Analysis of the root causes of performance decline in closed-loop evaluation and of the fidelity of 3DGS simulation [11][41]
  2. Development of a systematic counterfactual data generation process and training of the MPA framework [11][43]
  3. Demonstration of MPA's effectiveness in enhancing the performance of E2E driving agents in various scenarios [41][43]

Limitations and Future Work
- The MPA framework relies on the assumption that 3DGS can provide reliable rendering under constrained trajectory deviations, which may not hold in all cases [44].
- Future work will focus on expanding the dataset, integrating online reinforcement learning, and enhancing the framework's robustness in diverse driving conditions [44][46].
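One plausible way a policy adapter and a Q-value model could work together at inference time is "propose several candidates, then pick the highest-valued one". The sketch below illustrates only that selection pattern and is not MPA's implementation: `propose_candidates` is a hypothetical stand-in that adds Gaussian noise where the real adapter would run a diffusion model, and `toy_q_value` is a hand-written score standing in for a learned multi-step Q-value model.

```python
import math
import random

def propose_candidates(base_traj, num_candidates=8, noise=0.5, seed=0):
    """Hypothetical stand-in for a policy adapter: perturb the base plan with Gaussian noise."""
    rng = random.Random(seed)
    candidates = [list(base_traj)]  # always keep the unmodified base plan
    for _ in range(num_candidates - 1):
        candidates.append(
            [(x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise)) for x, y in base_traj]
        )
    return candidates

def toy_q_value(traj, goal, obstacle, safe_dist=2.0):
    """Hypothetical stand-in for a learned Q-value: progress to goal minus a proximity penalty."""
    progress = -math.dist(traj[-1], goal)
    min_gap = min(math.dist(p, obstacle) for p in traj)
    penalty = 10.0 * max(0.0, safe_dist - min_gap)
    return progress - penalty

def select_trajectory(candidates, q_fn):
    """Return the candidate with the highest estimated value."""
    return max(candidates, key=q_fn)

base = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
candidates = propose_candidates(base, num_candidates=6, seed=1)
score = lambda traj: toy_q_value(traj, goal=(10.0, 0.0), obstacle=(5.0, 0.5))
best = select_trajectory(candidates, score)
```

Because the base plan is always among the candidates, the selected trajectory never scores worse than the unadapted one under the given value function, which is the property that makes this pattern attractive for adapting a pre-trained agent.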
QCraft's Latest, GuideFlow: A New Approach to End-to-End Trajectory Planning
自动驾驶之心· 2025-11-30 02:02
Core Insights
- The article discusses a new planning framework called GuideFlow, which addresses the challenges of trajectory generation in end-to-end autonomous driving by incorporating explicit constraints and strengthening the model's optimization capabilities [3][11][49].
- GuideFlow integrates various conditional signals to guide the generation process, improving the robustness and safety of autonomous driving systems [11][49].

Summary by Sections

Background Review
- End-to-end autonomous driving (E2E-AD) has emerged as an attractive alternative to traditional modular approaches, allowing the whole system to be trained jointly from data [9].
- Recent work has shifted from single-modal to multi-modal trajectory generation to better reflect the inherent uncertainty of real driving scenarios [9][10].

GuideFlow Framework
- GuideFlow explicitly models the flow matching process to alleviate mode collapse and flexibly integrates multiple guiding signals [3][11].
- The framework combines flow matching with Energy-Based Model (EBM) training to improve the model's ability to satisfy physical constraints [3][11].

Experimental Results
- GuideFlow demonstrated superior performance on various benchmark datasets, achieving state-of-the-art (SOTA) results, particularly on the challenging NAVSIM dataset with an Extended PDM Score (EPDMS) of 43.0 [3][34][37].
- The framework's collision rate was notably low, averaging 0.07% on the nuScenes dataset, showcasing its safety capabilities [40][41].

Contributions and Innovations
- The article highlights three core strategies within GuideFlow: velocity field constraints, flow state constraints, and EBM flow optimization, which together improve trajectory feasibility and safety [11][28][31].
- The integration of driving aggressiveness scoring allows trajectory style to be adjusted dynamically during inference, further improving the model's adaptability [33][49].

Conclusion
- GuideFlow represents a significant advance in trajectory planning for autonomous driving, effectively embedding safety constraints into the generation process and demonstrating robust performance across various datasets [49].
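The general idea of steering a flow-matching sampler with an explicit constraint can be sketched as Euler integration of a velocity field plus an energy-gradient guidance term. Everything here is a toy under stated assumptions and not GuideFlow's formulation: `toy_velocity` is a hypothetical closed-form field standing in for a trained network, the state is a short list of lateral offsets rather than a full trajectory, and the lane-boundary energy and `guidance` weight are invented for illustration.

```python
import math

def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity field on a straight noise-to-data path."""
    return [(tg - xi) / (1.0 - t) for xi, tg in zip(x, target)]

def lane_energy_grad(x, half_width):
    """Gradient of 0.5 * max(0, |y| - half_width)**2 for each lateral offset y."""
    grad = []
    for y in x:
        excess = max(0.0, abs(y) - half_width)
        grad.append(math.copysign(excess, y))
    return grad

def sample(x0, target, num_steps=20, guidance=0.0, half_width=2.0):
    """Euler-integrate dx/dt = v(x, t) - guidance * grad E(x) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k / num_steps  # t stays below 1, so 1 - t never reaches zero
        v = toy_velocity(x, t, target)
        g = lane_energy_grad(x, half_width)
        x = [xi + dt * (vi - guidance * gi) for xi, vi, gi in zip(x, v, g)]
    return x

target = [0.0, 1.0, 2.5]  # last offset lies outside the assumed 2.0 m lane half-width
unguided = sample([0.0, 0.0, 0.0], target)
guided = sample([0.0, 0.0, 0.0], target, guidance=5.0)
```

In this toy the guidance term only softens the violation on the out-of-lane waypoint and leaves the in-lane waypoints untouched. A soft penalty of this kind does not guarantee feasibility on its own, which is one reason to pair it with harder constraints on the state.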
QCraft's Latest! GuideFlow: A New End-to-End Trajectory Planning Approach That Surpasses a Host of SOTA Methods......
自动驾驶之心· 2025-11-26 00:04
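Paper authors | Lin Liu et al.  Editor | 自动驾驶之心

This year, both academia and industry have put much of their effort into modeling Action, that is, the output of the ego-vehicle trajectory. Earlier MLP-based approaches could only output a single-mode trajectory, which in practice cannot meet downstream needs for handling uncertainty. So starting last year, many generative algorithms appeared. After a year of development, generative methods have further converged on two directions: Diffusion and Flow matching. 自动驾驶之心 has learned that in the first half of the year quite a few companies tried to bring these two methods into mass production, and the difficulties along the way go without saying.

Today we share a recent work from Beijing Jiaotong University, QCraft, and other teams, which proposes GuideFlow, a new planning framework based on Constrained Flow Matching. The overall results are fairly good.

Specifically, GuideFlow explicitly models the flow matching process, which inherently mitigates mode collapse and can flexibly incorporate guidance from multiple conditional signals. The core contribution of the paper is embedding explicit constraints directly into the flow matching generation process rather than relying on implicit constraint encoding. The key innovation is that GuideFlow combines flow matching with Ene ...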
Boreton Chairman Chen Fangming: Building Around "Intelligence" to Open the Door to "System Intelligence" in Mining
Zheng Quan Ri Bao Wang· 2025-11-25 03:28
Core Insights
- The core viewpoint of the article is that Boreton Technology Co., Ltd. is accelerating its development in the capital market by launching the "9M145E unmanned mining truck," which represents a significant shift towards system intelligence in mining operations [1][2].

Group 1: Product Development
- The "9M145E unmanned mining truck" is designed around unmanned operation, with a restructured vehicle architecture that eliminates the driver's cabin and optimizes the mechanical structure and power system [1].
- The new product improves vehicle stability and consistency, making it suitable for heavy-load, long-cycle, all-weather mining operations [1].
- The product launch marks the transition of mining production from "local automation" to "system intelligence," a pivotal development for the industry [1].

Group 2: Data-Driven Operations
- Boreton's scheduling system replaces traditional methods with data-driven global optimization, addressing issues such as queuing, waiting, and route switching that hurt productivity [2].
- Feeding real-time operational data into a smart scheduling system improves monitoring and forecasting of operational risks, equipment loads, and working conditions [2].
- This "digital transparency" is becoming a new foundation for mining governance, reducing human uncertainty and management blind spots [2].

Group 3: Efficiency and Workforce Impact
- The end-to-end autonomous driving technology achieves efficiency comparable to human operators, with the potential to significantly reduce the number of drivers needed [3].
- The goal is to manage hundreds of autonomous vehicles with a minimal workforce, improving overall operational efficiency in mining [3].

Group 4: Industry Transformation
- The smart transformation of mining is reshaping the industry structure, with equipment manufacturers evolving from hardware providers into system capability providers [3].
- Talent demand is shifting from driving and operation roles to system management and digital governance [3].
- Integrating mechanical, energy, algorithmic, and management capabilities is essential for achieving mining automation [3].

Group 5: Future Strategy
- Boreton plans to focus on intelligent vehicles, with all new models supporting unmanned driving starting next year, built on a drive-by-wire chassis [4].
- By 2026, all self-manufactured mining trucks will support unmanned driving, designed to lower costs and let customers upgrade without replacing vehicles [4].
- The company's products leverage technologies such as dual-vision and full-spectrum fusion, ensuring high safety performance and low maintenance costs [4].
There Probably Isn't Much Time Left to Switch Careers into End-to-End and VLA......
自动驾驶之心· 2025-11-25 00:03
Core Viewpoint
- The article emphasizes the growing demand for skills in end-to-end and VLA (Vision-Language-Action) autonomous driving, highlighting the saturation of job opportunities in these areas and the urgency for newcomers to acquire relevant knowledge and skills quickly [1].

Course Offerings
- The "End-to-End and VLA Autonomous Driving Course" is designed to provide comprehensive training in VLA, covering topics from VLM as an autonomous driving interpreter to modular and integrated VLA, and current mainstream inference-enhanced VLA [1].
- The "Autonomous Driving VLA and Large Model Practical Course" focuses on foundational theories and practical applications, including Vision/Language/Action modules, reinforcement learning, and diffusion models, with a special section on building VLA models and datasets from scratch [1].

Instructor Team
- The course is led by experts from both academia and industry, including individuals with extensive research and practical experience in multimodal perception, autonomous driving VLA, and large model frameworks [6][8][11].

Target Audience
- The courses are aimed at individuals with a foundational understanding of autonomous driving, familiarity with key technologies such as transformer models and reinforcement learning, and a basic knowledge of probability and linear algebra [12][13].
DiffRefiner, a Zhejiang University Work Accepted to AAAI'26: A Two-Stage Trajectory Prediction Framework That Sets a New NAVSIM Record!
自动驾驶之心· 2025-11-25 00:03
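Editor | 自动驾驶之心  Paper authors | Liuhan Yin et al.

Unlike discriminative methods in autonomous driving, which predict over a fixed set of candidate ego trajectories, generative methods such as diffusion models can learn the underlying distribution of future motion and enable more flexible trajectory prediction. However, because these methods typically rely on denoising hand-designed trajectory anchors or random noise, their performance still has considerable room for improvement.

A team from Zhejiang University and Nullmax proposes DiffRefiner, a new two-stage trajectory prediction framework. The first stage uses a Transformer-based proposal decoder that regresses from sensor inputs and uses predefined trajectory anchors to generate coarse trajectory predictions. The second stage introduces a diffusion Refiner that iteratively denoises and refines the initial predictions. By incorporating a discriminative trajectory proposal module, the paper provides strong guidance for the generative refinement process and significantly improves diffusion-based planning performance. In addition, the paper designs a fine-grained denoising decoder to improve scene adaptability, achieving more accurate trajectory prediction through stronger alignment with the surrounding environment. Experiments show that DiffRefiner achieves state-of-the-art performance: on the NAVSIM v2 dataset it reaches 87.4 ...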