AAAI 2026 | Xiaopeng Motors and Peking University build a visual token pruning method tailored for VLA models, making end-to-end autonomous driving more efficient
机器之心· 2026-01-04 05:43
Core Insights
- The article discusses the increasing application of VLA models in end-to-end autonomous driving systems, highlighting the challenges posed by lengthy visual tokens that significantly raise computational costs [2][8]
- A new paradigm for efficient visual token pruning in autonomous driving VLA models is introduced through the paper "FastDriveVLA," co-authored by Xiaopeng Motors and Peking University [2][5]
- The research proposes that visual tokens related to foreground information are more valuable than those related to background content, leading to the development of a large-scale annotated dataset, nuScenes-FG, containing 241,000 images with foreground area annotations [2][13]

Summary by Sections

Research Background and Issues
- End-to-end autonomous driving shows great potential to transform future transportation systems, learning the entire driving process within a unified framework [6]
- Existing VLA models convert visual inputs into numerous visual tokens, resulting in significant computational overhead and increased inference latency, posing challenges for real-world deployment [8]

Methodology and Innovations
- FastDriveVLA is a novel, reconstruction-based visual token pruning framework tailored for end-to-end autonomous driving VLA models [10]
- The framework includes a lightweight, plug-and-play pruner called ReconPruner, which identifies and selects meaningful foreground visual tokens using a masked image modeling approach [16][18]
- An adversarial foreground-background reconstruction strategy is introduced to sharpen ReconPruner's ability to distinguish foreground tokens from background tokens [19]

Experimental Results
- FastDriveVLA achieves state-of-the-art performance across various pruning ratios on the nuScenes open-loop planning benchmark [20][25]
- When the number of visual tokens is reduced from 3,249 to 812, FastDriveVLA cuts FLOPs by roughly 7.5x and significantly reduces CUDA inference latency [26]
- The framework outperforms existing methods, particularly at a 50% pruning ratio, achieving balanced performance across all metrics [25]

Efficiency Analysis
- FastDriveVLA's substantial reduction in FLOPs and CUDA latency showcases its potential for real-time applications in autonomous driving [26][27]
- At a 25% pruning rate, FastDriveVLA shows the best performance across all evaluation metrics, indicating that focusing on foreground-related visual tokens is crucial for autonomous driving performance [28]
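The selection step at the heart of such pruners reduces to keeping the top-scoring fraction of visual tokens. Below is a minimal NumPy sketch, not FastDriveVLA's implementation: the per-token foreground-relevance scores, which ReconPruner would learn, are random placeholders here.

```python
import numpy as np

def prune_visual_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top-scoring fraction of visual tokens.

    tokens: (N, D) array of visual token embeddings
    scores: (N,) foreground-relevance scores (a learned pruner head
            would produce these; random placeholders in this demo)
    """
    n_keep = max(1, int(round(len(tokens) * keep_ratio)))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of the highest scores
    keep_idx.sort()                          # restore spatial/sequence order
    return tokens[keep_idx], keep_idx

# toy run: 3,249 tokens pruned to 25%, i.e. the 812 kept in the article
rng = np.random.default_rng(0)
tokens = rng.normal(size=(3249, 64))
scores = rng.random(3249)
kept, idx = prune_visual_tokens(tokens, scores, keep_ratio=0.25)
print(kept.shape)  # (812, 64)
```

With 3,249 input tokens and a 25% keep ratio this retains 812 tokens, the reduction quoted in the article; the FLOPs saving then comes from the much shorter visual sequence the VLA has to process.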
Why is NIO betting on world models?
自动驾驶之心· 2026-01-04 01:04
Core Insights
- NIO's NWM 2.0 launch has reportedly shown promising results, with expectations for the world model to deliver surprises in intelligent driving [1]
- The concept of the world model is crucial for understanding spatiotemporal cognition, which is essential for autonomous driving systems [1]

Group 1: World Model Concept
- The world model focuses on high-bandwidth cognitive systems that directly utilize video data rather than converting it into language, addressing the limitations of language models in modeling real-world spatiotemporal dynamics [1]
- The world model encompasses two levels of cognition, spatiotemporal understanding and conceptual understanding, with the former being critical for autonomous driving applications [1]

Group 2: Industry Applications and Challenges
- Various companies are building their own cloud- and vehicle-based world models using open-source algorithms for data generation and closed-loop simulation [1]
- The definition of a world model remains ambiguous, leading to confusion among newcomers in the field, who often struggle to grasp the concept and its applications [1]

Group 3: Course Overview
- A course is being offered to help individuals understand the world model in autonomous driving, covering topics from foundational principles to practical applications [6][11]
- The course includes multiple chapters focusing on the history, background knowledge, and various streams of world models, including pure simulation and generative models [6][7][8]

Group 4: Technical Foundations
- The course will cover essential technical concepts such as the Transformer architecture, BEV perception, and occupancy networks, which are critical for understanding world models [12][14]
- Participants are expected to have foundational knowledge of autonomous driving modules and relevant programming skills to fully benefit from the course [14]
What this autonomous driving community plans to do in 2026......
自动驾驶之心· 2026-01-02 08:08
Core Viewpoint
- The article emphasizes the establishment of a comprehensive community for autonomous driving, aiming to provide a platform for knowledge sharing, technical discussions, and career opportunities in the field [4][17].

Group 1: Community Development
- The "Autonomous Driving Heart Knowledge Planet" has been created to address the high trial-and-error costs for newcomers in the autonomous driving industry, offering a structured learning environment [4][5].
- The community has grown to over 4,000 members and aims to expand to nearly 10,000 within two years, focusing on both academic and industrial needs [5][18].
- Various activities such as face-to-face meetings, expert interviews, and industry research will continue to be organized to meet the diverse needs of members [4][5][18].

Group 2: Learning Resources
- The community has compiled over 40 technical learning paths, covering topics from entry-level to advanced autonomous driving technologies [7][18].
- Members have access to exclusive video tutorials and documents that facilitate learning in areas such as perception fusion, SLAM, and decision-making [11][18].
- A comprehensive list of open-source projects and datasets related to autonomous driving has been made available to assist members in their research and projects [35][37].

Group 3: Industry Insights
- The community plans to conduct industry research focusing on the scaling of autonomous driving technologies, particularly in the L4 domain, which is expected to regain attention in the coming year [4][18].
- Regular discussions with industry experts will provide insights into the latest trends, challenges, and opportunities in the autonomous driving sector [7][18].
- The community aims to connect members with job opportunities in leading companies within the autonomous driving industry, facilitating career advancement [11][20].
The massive shift in computing power across China's intelligent driving industry
36Kr· 2025-12-30 10:36
Core Insights
- In 2025, the Chinese smart driving industry is experiencing an unprecedented shift in computing power, driven by the evolution of software algorithms and the emergence of competing technical paradigms [1][2]
- The differentiation in high-level intelligent driving commercial applications is evident, with a K-shaped market split between affordable and high-end models, leading to fragmentation in the industry [2]
- The demand for computing power is increasingly recognized as a core element in the development of smart driving technologies, both at the vehicle and cloud levels [2]

Group 1: Technological Evolution
- The transition to an end-to-end framework in smart driving is marked by significant advancements, as seen in Tesla's FSD Beta V12 software, which runs on 144 TOPS of on-board computing power [3][4]
- Tesla's shift from HW3 to HW4 marks a major milestone in its autonomous driving evolution, with the latter becoming the preferred platform for future software updates [5][6]
- The upcoming FSD V14 version is expected to have ten times the parameters of its predecessor, indicating a substantial leap in the vehicle's ability to process complex environmental information [6]

Group 2: Market Dynamics
- Chinese smart driving players, including Xpeng, Li Auto, and NIO, are adopting end-to-end strategies but initially rely on existing computing platforms, primarily NVIDIA's Orin-X [7][12]
- By 2025, a clear division among smart driving companies has emerged, with three main factions defined by their computing power strategies: self-developed chips, NVIDIA-based solutions, and Huawei's offerings [12][13]
- The self-developed chip faction includes NIO's NX9031 and Xpeng's Turing AI chip, while the NVIDIA faction is represented by the latest Thor platform, which is gaining traction in various models [13][14]

Group 3: Cloud Computing and Future Prospects
- The industry is witnessing a race for cloud computing power, which is essential for the evolution of smart driving algorithms and the transition from L2 to L4 capabilities [19][20]
- Reliance on cloud computing is becoming increasingly critical, as it supports the data processing, model training, and simulation necessary for addressing complex driving scenarios [23][24]
- The competition for cloud resources is expected to intensify, as companies recognize that enhanced cloud capabilities are vital for future advances in autonomous driving technology [20][21]
A close look at the design of Horizon Robotics' HSD one-stage end-to-end solution
自动驾驶之心· 2025-12-30 00:28
Core Insights
- The article discusses two core papers from Horizon Robotics, DiffusionDrive and ResAD, focusing on their contributions to end-to-end autonomous driving solutions [2][3].

DiffusionDrive
- The overall architecture of DiffusionDrive consists of three parts: perception information, navigation information, and trajectory generation [6].
- Perception information includes dynamic/static obstacles, traffic lights, map elements, and drivable areas, emphasizing the need to convey perception tasks to planning tasks in an end-to-end manner [6].
- Navigation information is crucial for avoiding incorrect routes, especially in complex urban environments like Shanghai, where navigation challenges are significant [7].
- The core concept of trajectory generation is "Truncated Diffusion," which leverages fixed patterns in human driving behavior to reduce training convergence difficulty and inference noise [8][10].
- The article outlines a method for trajectory generation that uses K-Means clustering to describe common human driving behaviors, which simplifies the training process [9].

ResAD
- ResAD introduces a residual design that predicts the difference between future trajectories and inertially extrapolated trajectories, rather than generating future trajectories directly [12].
- Residual regularization helps manage the residuals, which grow over time, ensuring that the model focuses on the true diversity of driving behaviors [13][14].
- The design allows for different noise perturbations in the trajectory generation process, adjusting learning difficulty based on the direction of motion [15].
- ResAD also features a trajectory ranker that uses a transformer model to predict metric scores from the top-k trajectory predictions and environmental information [16].

Conclusion
- Both papers from Horizon Robotics provide valuable insights and methodologies for enhancing autonomous driving systems, encouraging further exploration and development in the field [18].
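The K-Means anchor idea mentioned for DiffusionDrive (clustering human driving trajectories into a few representative patterns) can be sketched as follows. This is a toy stand-in: the real anchor count, feature choices, and data pipeline are not specified in the article.

```python
import numpy as np

def kmeans_trajectory_anchors(trajs, k=4, iters=50, seed=0):
    """Cluster flattened trajectories into k anchor trajectories.

    trajs: (N, T, 2) array of N ego trajectories with T waypoints each.
    Returns a (k, T, 2) array of anchors. Bare-bones K-Means for
    illustration only.
    """
    N, T, _ = trajs.shape
    X = trajs.reshape(N, -1)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(N, size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, k)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():          # keep old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return centers.reshape(k, T, 2)

# toy data: noisy "go straight" and "turn left" trajectories, 8 waypoints each
rng = np.random.default_rng(1)
t = np.linspace(1.0, 8.0, 8)
straight = np.stack([np.zeros(8), t], axis=1)   # x stays 0, y moves forward
left = np.stack([-0.5 * t, t], axis=1)          # x drifts left over time
trajs = np.stack([base + 0.05 * rng.normal(size=(8, 2))
                  for base in (straight, left) for _ in range(20)])
anchors = kmeans_trajectory_anchors(trajs, k=2)
print(anchors.shape)  # (2, 8, 2)
```

Each anchor then serves as a starting pattern that the diffusion process only has to refine, which is the intuition behind reducing training convergence difficulty.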
A new NAVSIM SOTA! Masked Diffusion, a new framework for end-to-end autonomous driving
自动驾驶之心· 2025-12-26 03:32
Source | 机器之心. Original article: "Setting a new NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework"

With the rise of VLA (Vision-Language-Action) models, end-to-end autonomous driving is undergoing a paradigm shift from "modular" to "unified". However, once perception, reasoning, and planning are compressed into a single model, the mainstream auto-regressive generation paradigm starts to show its limits. Existing auto-regressive models enforce a "left-to-right" temporal generation order, which differs fundamentally from a human driver's intuition: when handling complex road conditions, experienced drivers tend to plan "from the end back to the beginning", first settling on a long-horizon driving intention (such as merging onto a ramp, yielding to a pedestrian, or pulling over), then deriving the immediate short-term control actions from it. In addition, imitation-learning-based models easily fall into the "average driver" trap: they tend to fit the mean of the data distribution, which flattens the policy into mediocrity and makes it hard to switch flexibly between assertive maneuvering and conservative avoidance.

To address these pain points, Fudan University and Yinwang Intelligence jointly propose the WAM-Diff framework. The study innovatively ...
Setting a new NAVSIM SOTA: Fudan proposes a new framework for end-to-end autonomous driving
具身智能之心· 2025-12-26 00:55
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a modular approach to a unified paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive models in mimicking human driving intuition [1][2].

Group 1: WAM-Diff Framework
- The WAM-Diff framework, developed by Fudan University and Yinwang Intelligence, introduces a discrete Masked Diffusion model for VLA autonomous driving planning, integrating a sparse mixture-of-experts (MoE) architecture and online reinforcement learning (GSPO) [2][4].
- WAM-Diff achieved state-of-the-art (SOTA) performance on the NAVSIM benchmark, scoring 91.0 PDMS and 89.7 EPDMS, demonstrating the potential of non-autoregressive generation in complex driving scenarios [2][16][18].

Group 2: Technical Innovations
- WAM-Diff employs Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, allowing for a shared vocabulary with driving commands [5].
- The framework uses Masked Diffusion for generation, enabling parallel prediction of all token positions, which improves inference efficiency and allows for global optimization [5][9].

Group 3: Decoding Strategies
- WAM-Diff explores three decoding strategies (causal, reverse-causal, and random), finding that the reverse-causal strategy yields the best closed-loop metrics, in line with the "end-to-begin" planning intuition [9][20].
- This result confirms that establishing long-term driving intentions before detailing immediate actions significantly improves planning consistency and safety [9][20].

Group 4: MoE and GSPO Integration
- The MoE architecture within WAM-Diff includes 64 lightweight experts, dynamically activated based on the driving context, enhancing model capacity and adaptability while controlling computational costs [12].
- The GSPO algorithm bridges the gap between open-loop training and closed-loop execution, optimizing trajectory sequences against safety, compliance, and comfort metrics [12][14].

Group 5: Experimental Results
- In extensive experiments on the NAVSIM benchmark, WAM-Diff outperformed several leading models, achieving a PDMS score of 91.0 and an EPDMS score of 89.7, indicating its robustness in balancing safety and compliance [16][18].
- The model's performance on NAVSIM-v2, which applies stricter metrics for traffic-rule adherence and comfort, improved by 5.2 points over the previous best, showcasing its capability in realistic driving scenarios [18].

Group 6: Conclusion
- WAM-Diff represents a significant advance in autonomous driving planning, moving towards a discrete, structured, and closed-loop approach and emphasizing that both "how to generate" and "what to generate" matter in the VLA era [25].
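The reverse-causal decoding order can be pictured as iterative unmasking that commits the farthest-future token positions first. The sketch below is purely illustrative; the mask id, the unmasking schedule, and the `predict_fn` interface are assumptions, not WAM-Diff's actual API.

```python
import numpy as np

MASK = -1  # hypothetical mask token id

def reverse_causal_decode(seq_len, steps, predict_fn):
    """Unmask a fully-masked sequence in `steps` rounds, committing the
    farthest-future positions first ("end-to-begin"). predict_fn takes
    the partially decoded sequence and returns a proposal for every slot.
    """
    tokens = np.full(seq_len, MASK)
    groups = np.array_split(np.arange(seq_len)[::-1], steps)  # back-to-front
    for pos in groups:
        proposal = predict_fn(tokens)
        tokens[pos] = proposal[pos]   # commit only this round's positions
    return tokens

# toy "model" that simply proposes each position's index as its token
filled = reverse_causal_decode(seq_len=8, steps=4,
                               predict_fn=lambda t: np.arange(len(t)))
print(filled)  # [0 1 2 3 4 5 6 7]
```

Note how early rounds fix the end of the trajectory (the long-horizon intention) while later rounds fill in the near-term actions conditioned on it, which is the intuition the decoding-strategy comparison supports.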
We just put together a world-model learning roadmap for beginners......
自动驾驶之心· 2025-12-25 03:24
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that world models are not a specific technology but rather a category of models with certain capabilities. It emphasizes the industry trend towards using world models for closed-loop simulation to address the high costs associated with corner cases in autonomous driving [2].

Course Overview
- The course on world models in autonomous driving is structured into six chapters, covering the introduction, background knowledge, discussions on general world models, video generation-based models, OCC-based models, and job-related insights in the industry [5][6][7][8][9].

Chapter Summaries
- **Chapter 1: Introduction to World Models.** This chapter outlines the relationship between world models and end-to-end autonomous driving, discussing the development history and current applications of world models, as well as various streams such as pure simulation, simulation plus planning, and generating sensor inputs [5].
- **Chapter 2: Background Knowledge.** This chapter covers foundational knowledge related to world models, including scene representation, Transformer technology, and BEV perception, which are crucial for understanding subsequent chapters [6].
- **Chapter 3: General World Models.** Focuses on popular general world models like Marble from Li Fei-Fei's team and Genie 3 from DeepMind, discussing their core technologies and design philosophies [7].
- **Chapter 4: Video Generation-Based World Models.** This chapter delves into video generation algorithms, starting with GAIA-1 & GAIA-2 and extending to recent works like UniScene and OpenDWM, highlighting both classic and cutting-edge advancements in this area [8].
- **Chapter 5: OCC-Based World Models.** Concentrates on OCC generation algorithms, discussing three major papers and a practical project, and emphasizing the potential for these methods to extend into vehicle trajectory planning [9].
- **Chapter 6: World Model Job Topics.** This chapter shares practical insights from the instructor's experience, addressing industry applications, pain points, and interview preparation for positions related to world models [9].

Learning Outcomes
- The course aims to provide a comprehensive understanding of world models in autonomous driving, equipping participants with knowledge comparable to one year of experience as a world model algorithm engineer [10].
Setting a new NAVSIM SOTA: Fudan and Yinwang propose Masked Diffusion, a new end-to-end autonomous driving framework
机器之心· 2025-12-25 03:12
Core Insights
- The article discusses the transition in end-to-end autonomous driving from a "modular" approach to a "unified" paradigm with the rise of Vision-Language-Action (VLA) models, highlighting the limitations of existing autoregressive generation paradigms [2]
- It introduces the WAM-Diff framework, which innovatively incorporates discrete masked diffusion models into VLA autonomous driving planning, addressing the challenges of single-direction temporal generation [2][6]

Group 1: WAM-Diff Framework
- WAM-Diff utilizes Hybrid Discrete Action Tokenization to convert continuous 2D trajectory coordinates into high-precision discrete tokens, keeping quantization error within 0.005 [6]
- The framework employs Masked Diffusion as its backbone, allowing for parallel prediction of all token positions, significantly enhancing inference efficiency and enabling global optimization [6]
- WAM-Diff explores decoding strategies, revealing that the reverse-causal strategy outperforms others on closed-loop metrics, validating the "end-to-begin" planning logic [9][20]

Group 2: Performance Metrics
- On the authoritative NAVSIM benchmark, WAM-Diff achieved state-of-the-art (SOTA) scores of 91.0 PDMS on NAVSIM-v1 and 89.7 EPDMS on NAVSIM-v2, demonstrating its potential in complex autonomous driving scenarios [3][18]
- The model surpassed competitors like DiffusionDrive and ReCogDrive, indicating its robustness in balancing safety and compliance in real-world driving conditions [18]

Group 3: Technical Innovations
- WAM-Diff integrates a Low-Rank Adaptation Mixture-of-Experts (LoRA-MoE) architecture, which includes 64 lightweight experts for dynamic routing and sparse activation, enhancing model capacity and adaptability [11]
- The Group Sequence Policy Optimization (GSPO) algorithm is introduced to bridge the gap between open-loop training and closed-loop execution, optimizing trajectory sequences based on safety, compliance, and comfort metrics [14]

Group 4: Conclusion
- The emergence of WAM-Diff marks a significant step towards discrete, structured, and closed-loop autonomous driving planning, emphasizing the importance of both "how to generate" and "what to generate" in the VLA era [25]
Class starts next week! We've designed a learning roadmap for autonomous driving world models....
自动驾驶之心· 2025-12-24 09:22
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, emphasizing that world models are a means to achieve end-to-end autonomous driving rather than a specific technology [2].

Summary by Sections

Chapter 1: Introduction to World Models
- This chapter provides an overview of the relationship between world models and end-to-end autonomous driving, covering the development history and current applications of world models. It introduces various types of world models, including pure simulation, simulation plus planning, and those generating sensor inputs and perception results, along with their industry applications and relevant datasets [5].

Chapter 2: Background Knowledge of World Models
- The second chapter focuses on the foundational knowledge necessary for understanding world models, starting with scene representation and expanding to technologies like Transformer and BEV perception. It highlights key technical terms frequently encountered in job interviews related to world models [6][11].

Chapter 3: Discussion on General World Models
- This chapter centers on general world models and recent popular works in autonomous driving, including models from Li Fei-Fei's team (Marble), DeepMind (Genie 3), and Meta (JEPA). It also discusses the widely talked-about VLA+ world model algorithms and Tesla's latest world model simulator shared at ICCV [7].

Chapter 4: Video Generation-Based World Models
- The fourth chapter focuses on video generation algorithms, which are currently the most researched in both academia and industry. It covers classic works like GAIA-1 & GAIA-2 from Wayve and recent advancements such as UniScene and OpenDWM, providing a comprehensive view of the field's progress [8].

Chapter 5: OCC-Based World Models
- This chapter discusses OCC generation algorithms, explaining three major papers and a practical project. These methods can be easily extended for vehicle trajectory planning, contributing to end-to-end solutions [9].

Chapter 6: World Model Job Topics
- The final chapter shares practical insights from the instructor's years of experience, addressing the application of world models in the industry, existing pain points, and how to prepare for related job interviews, focusing on what companies prioritize [10].

Course Outcomes
- The course aims to advance understanding of end-to-end autonomous driving, equipping participants with knowledge of world model technologies, including video generation and OCC generation methods, and preparing them for roles in the autonomous driving industry [10][13].