自动驾驶之心
AI Day Livestream | A Comprehensive Survey of Progressive Robust World Models in Autonomous Driving (First-Author Talk)
自动驾驶之心· 2026-01-07 01:07
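Driving world models (DWMs) have drawn wide attention from academia and industry for their core abilities to explicitly model vehicle dynamics, fuse multimodal sensor inputs into a unified representation, and support long-horizon reasoning, and they show great potential for improving the safety and robustness of autonomous driving systems. To this end, Beijing Jiaotong University, together with the University of Macau, Harbin Institute of Technology, Nanyang Technological University (Singapore), Tsinghua University, Beihang University, Xiaomi EV, and the University of Queensland, has proposed a survey of progressive robustness-aware world models in autonomous driving. Taking robustness as its central perspective, the survey reviews DWMs comprehensively: it first outlines the basic principles of DWMs and their distinctive value in autonomous driving, then systematically classifies existing methods by technical paradigm, architecture design, and downstream application scenario. It further proposes a progressive robustness analysis framework that divides the development of DWM robustness into three distinct stages (Robustness 1.0 to Robustness 3.0). Paper link: https://doi.org/10.36227/techrxiv.176523308.84756413/v1 Talk introduction: Today, 自动驾驶之心 is honored to invite the paper's first author, Dr. Jia Feiyang (贾飞阳) of Beijing Jiaotong University, to share the driving world model ...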
NVIDIA's Alpamayo Evolves Again! A Counterfactual-Reasoning VLA with Considerable Safety Gains
自动驾驶之心· 2026-01-07 01:07
Core Insights
- The article discusses the development of the Counterfactual Vision-Language-Action (CF-VLA) model, which incorporates self-reflective reasoning to enhance the safety and accuracy of autonomous driving systems [3][54].
- CF-VLA aims to address the limitations of existing Vision-Language-Action (VLA) models by enabling them to reflect on their planned actions and make necessary adjustments before execution [10][54].

Group 1: Model Development
- CF-VLA introduces a self-reflective reasoning loop that allows the model to analyze and correct its planned actions based on potential outcomes [10][54].
- The model generates time-segmented meta-actions to summarize driving intentions and performs counterfactual reasoning to identify unsafe behaviors [3][10].
- A "rollout-filter-label" data processing pipeline is designed to extract high-value scenarios from the model's rollout results, enhancing the training process [11][15].

Group 2: Performance Improvements
- Experiments show that CF-VLA improves trajectory accuracy by up to 17.6% and safety metrics by 20.5% compared to baseline models [14][54].
- The model demonstrates adaptive reasoning capabilities, activating counterfactual reasoning primarily in complex scenarios, thus optimizing computational resources [16][54].
- The integration of counterfactual reasoning transforms the model's reasoning from descriptive to causal self-correction, significantly enhancing its decision-making process [15][54].

Group 3: Data Utilization
- The training dataset includes approximately 11.6 million 20-second video clips, providing a diverse range of driving behaviors [8][35].
- The meta-action training set consists of 433,000 20-second clips and 801,000 8.4-second samples, with a validation set of 39,000 video clips [8][35].
- The counterfactual reasoning dataset typically contains 200,000 samples, which are crucial for training the model's reflective capabilities [8][35].

Group 4: Experimental Results
- The CF-VLA model was evaluated on a large proprietary dataset comprising 80,000 hours of human driving data from 25 countries, covering various driving conditions [35][36].
- Key performance metrics include minimum average displacement error (MinADE), minimum final displacement error (MinFDE), and collision rates, which indicate the model's effectiveness in real-world scenarios [37][41].
- The results indicate that CF-VLA consistently outperforms traditional models in both trajectory accuracy and safety, demonstrating the effectiveness of its self-reflective reasoning approach [42][45].
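The displacement metrics named above can be illustrated with a short sketch. This follows the commonly used definitions (ADE is the mean point-wise Euclidean distance to the ground-truth trajectory, FDE is the distance at the final waypoint, and the "min" variants take the best value over K candidate trajectories). The summary does not say how CF-VLA computes them, so the function names, the 2D waypoint format, and the choice to take each minimum independently are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # assumed (x, y) waypoint in metres
Trajectory = Sequence[Point]


def displacement_errors(pred: Trajectory, truth: Trajectory) -> Tuple[float, float]:
    """Return (ADE, FDE) for one predicted trajectory against the ground truth."""
    if len(pred) != len(truth) or not truth:
        raise ValueError("trajectories must be non-empty and of equal length")
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]


def min_ade_fde(candidates: List[Trajectory], truth: Trajectory) -> Tuple[float, float]:
    """Minimum ADE and minimum FDE over K candidate trajectories.

    Assumption: each minimum is taken independently, so the two values
    may come from different candidates.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    errors = [displacement_errors(c, truth) for c in candidates]
    return min(e[0] for e in errors), min(e[1] for e in errors)


# Toy example: two candidates scored against a straight-line ground truth.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
candidates = [
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)],   # drifts sideways
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5)],   # stays close to the ground truth
]
min_ade, min_fde = min_ade_fde(candidates, truth)
```

Here the second candidate gives both minima: its only error is 0.5 m at the last waypoint, so `min_fde` is 0.5 and `min_ade` is 0.5 / 3. Some benchmarks instead report the FDE of whichever candidate has the lowest ADE, which would need a small change to `min_ade_fde`.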
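Since the Start of the Year, We Have Received Many Inquiries from Students About Choosing a Direction in Autonomous Driving...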
自动驾驶之心· 2026-01-06 09:17
Core Insights
- The article advises students in automation and computer science to focus on deep learning and to explore cutting-edge topics such as VLA, end-to-end learning, and world models [2][3].
- It highlights the need for newcomers to engage with research papers and discussions to develop their own ideas and methodologies [2].
- The article introduces a paper guidance service that assists students with various aspects of research paper writing and publication [3][4][6].

Group 1
- The article suggests that students from computer science and automation backgrounds should focus on deep learning, with specific recommendations for topics like VLA, end-to-end learning, and world models [2].
- For mechanical and vehicle engineering students, it recommends starting with traditional PnC and 3DGS because of their lower computational requirements and ease of entry [2].
- The article encourages new researchers to learn from failures and emphasizes the importance of developing personal insights through extensive reading and communication [2].

Group 2
- The paper guidance service offers support in selecting research topics, full-process guidance, and experimental assistance [6].
- The service reports a high acceptance rate for papers submitted to top conferences and journals, including CVPR, AAAI, and ICLR [7].
- Pricing for the guidance service varies with the level of the paper, and further details can be obtained by contacting the research assistant [8].
Farewell to 2025! A Roundup of Leading Companies' Hardcore Work in 2025 (Horizon / Li Auto / NVIDIA, etc.)
自动驾驶之心· 2026-01-06 09:17
Core Insights
- The article discusses the evolution of autonomous driving technology in 2025, marking a transition from research to practical implementation, with significant advancements in various technical areas [2][3].

Group 1: Industry Trends
- The year 2025 is characterized as a turning point for autonomous driving, with technologies like BEV perception, multi-sensor fusion, and trajectory prediction reaching maturity [2].
- The competition in the smart electric vehicle sector is intensifying, with companies like Horizon, Xiaomi, and Li Auto making notable advancements [4][22].

Group 2: Company Highlights
- Horizon has made significant strides with its HSD technology, showing high potential in end-to-end solutions and innovative approaches like GoalFlow and ResAD [9].
- Xiaomi's autonomous driving development has progressed rapidly, with a team exceeding 1,000 members and a series of iterative improvements leading to the release of the enhanced version of HAD [10][11].
- Li Auto has established a position in the domestic autonomous driving field, although it faces challenges in transitioning from range-extended to pure electric vehicles [13].
- Xiaopeng Motors experienced a rebound in sales, doubling its volume to nearly 430,000 units in 2025, driven by the successful launch of VLA 2.0 technology [14].
- Bosch is actively investing in both research and production lines, focusing on end-to-end solutions and enhancing its engineering capabilities [16].

Group 3: Future Outlook
- The competition in the smart electric vehicle market is expected to become fiercer in 2026, with L3 and L4 autonomous driving technologies gaining traction [22][23].
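Résumé Referral | Tsinghua University State Key Laboratory Hiring Engineers / Postdocs / Interns (World Models / Reconstruction / Perception, etc.)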
自动驾驶之心· 2026-01-06 06:52
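Hiring engineers, postdocs, and interns for on-vehicle world models in autonomous driving. The State Key Laboratory of Intelligent Green Vehicle and Mobility at Tsinghua University is recruiting engineers, postdocs, and interns. If you are interested, contact 柱哥 to submit your résumé, or send it by email yourself.
[Position Objective]
Addressing the core technical needs of end-to-end autonomous driving, conduct research on on-vehicle world models and bring them to engineering deployment. Build a world model architecture that integrates physical priors, temporal consistency, and behavior prediction to understand, predict, and generate complex driving scenarios; support the integrated perception, prediction, and planning capability of the autonomous driving system; and advance the engineering application of end-to-end autonomous driving.
[Core and Secondary Responsibilities]
Core responsibilities: Secondary responsibilities:
1. Research and develop the core architecture of on-vehicle world models, integrating physical priors, causal reasoning, temporal consistency, and behavior prediction;
2. Build spatiotemporal representation and prediction models for driving scenes, enabling behavior prediction for traffic participants, scene-evolution reasoning, and long-term planning;
3. Develop scene generation and simulation models based on frontier architectures such as Transformer, Diffusion, and Neural Fields;
4. Design multimodal input fusion schemes for unified encoding and reasoning over images, point clouds, maps, trajectories, and other sources;
5. Optimize the deployment of world models on on-vehicle platforms to meet real-time and resource ...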
The Promised "Autonomous Driving World Models" Course Is Finally Open!
自动驾驶之心· 2026-01-06 06:52
Core Viewpoint
- The article announces the launch of a new course titled "World Models and Autonomous Driving Small Class," focusing on general world models, video generation, and OCC generation algorithms in the context of autonomous driving [1][3].

Course Overview
- The course is developed in collaboration with industry leaders and follows the success of a previous course on end-to-end and VLA autonomous driving [1].
- The course aims to enhance understanding of world models and their applications in autonomous driving, targeting individuals interested in entering the industry [11].

Course Structure

Chapter 1: Introduction to World Models
- This chapter provides an overview of world models and their connection to end-to-end autonomous driving, including historical development and current applications [6].
- It discusses various types of world models, such as pure simulation, simulation + planning, and generating sensor inputs and perception results, along with their industry applications [6].

Chapter 2: Background Knowledge of World Models
- The second chapter covers foundational knowledge related to world models, including scene representation, Transformer technology, and BEV perception [6][12].
- It highlights key technical terms frequently encountered in job interviews related to world models [7].

Chapter 3: Discussion on General World Models
- This chapter focuses on popular general world models, including Marble from Li Fei-Fei's team, DeepMind's Genie 3, and Meta's JEPA, as well as the VLA+ world model algorithms [7].
- It aims to explain the core technologies and design philosophies behind these models [7].

Chapter 4: Video Generation-Based World Models
- The fourth chapter delves into video generation algorithms, starting with Wayve's GAIA-1 & GAIA-2 and extending to recent works like UniScene and OpenDWM [8].
- It balances classic works with the latest advancements in the field [8].

Chapter 5: OCC-Based World Models
- This chapter focuses on OCC generation algorithms, discussing three major papers and a practical project that extends OCC methods to vehicle trajectory planning [9].

Chapter 6: World Model Job Topics
- The final chapter shares practical insights from the instructor's years of experience, addressing industry applications, pain points, and interview preparation for related positions [10].

Learning Outcomes
- The course is designed to be the first advanced practical tutorial for end-to-end autonomous driving, aiming to facilitate the implementation of these technologies in the industry [11].
- Participants are expected to achieve a level equivalent to one year of experience as a world model autonomous driving algorithm engineer upon completion [14].
Latest from Li Hongyang's Team! SimScale: An End-to-End Simulation Framework That Significantly Improves Performance in Hard Scenarios...
自动驾驶之心· 2026-01-06 00:28
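SimScale, new work from Prof. Li Hongyang's team, was completed jointly by the Chinese Academy of Sciences, OpenDriveLab at the University of Hong Kong, and Xiaomi EV. In recent years, large models have achieved unprecedented breakthroughs on the back of data scaling, but in autonomous driving this recipe suddenly stops working. The reason is not that the models are too small; the real world simply cannot supply enough critical scenarios. Most driving clips on real roads are repetitive, safe "normal behavior", while the high-risk, long-tail, extreme scenarios that actually set the ceiling of a policy's capability are rarely encountered and even harder to collect at scale. Autonomous driving therefore does not lack data; it lacks the "right" data, and the industry urgently needs a new path that can systematically generate critical scenarios in large numbers and train on them at scale. SimScale was created to address these problems. It explores generating multiple kinds of data, such as reward and recovery data, in scalable 3DGS-based interactive simulation, and training on them jointly to maximize the utilization of existing training data. It ultimately set a new SOTA on the NavSim leaderboard and brought significant gains across several mainstream E2E planners! Today, ...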
Tian Yuandong's 2025 Year-End Summary: On Being Laid Off and Research Directions for 2026
自动驾驶之心· 2026-01-06 00:28
Core Insights
- The article discusses the complexities and challenges the author faced in project management and personal career decisions, particularly in AI and machine learning research [3][4][5].

Group 1: Project Management and Challenges
- The author came under significant pressure when asked to assist with the Llama4 project, and had to weigh the possible outcomes against personal integrity [3].
- Despite the challenges, progress was made in core areas of reinforcement learning, including training stability and model architecture design, which contributed to a shift in research perspectives [3].

Group 2: Career Decisions and Transitions
- After more than a decade at the company, the author considered leaving for economic and personal reasons but ultimately decided to stay, which reflects how difficult such transitions are [4].
- Navigating the ups and downs of the workplace provided valuable material for future creative endeavors, indicating a blend of professional and personal growth [5].

Group 3: Research Directions
- The author's 2025 research centers on two main directions: large model inference and understanding the "black box" of models. The former has gained traction following the release of the continuous latent space reasoning work [6].
- Efforts to improve inference efficiency include using discrete tokens and parallel reasoning chains, which have shown promising results in reducing computational cost while enhancing performance [7].

Group 4: Interpretability and Future Directions
- The author emphasizes the importance of interpretability in AI, arguing that understanding how AI systems work is crucial for ensuring ethical and effective use of the technology [10].
- Current efforts to demystify model training are still at an early stage, with a focus on working from first principles to guide future AI model design [11].
L4 Data Closed-Loop Summary | Data Infrastructure for the Physical AI Era
自动驾驶之心· 2026-01-06 00:28
Core Viewpoint
- The article emphasizes that in the pursuit of general physical intelligence, the model serves as the ceiling while the data infrastructure acts as the floor, highlighting the importance of both elements working in tandem to create a competitive barrier [2].

Group 1: Shift in Talent Demand
- There has been a noticeable shift in the autonomous driving and AI sectors, with a growing emphasis on recruiting talent for "data infrastructure" [3].
- Leading companies like Tesla and Wayve are focusing on extracting data from large-scale fleets to build automatic scoring systems rather than relying solely on manually written rules [4].
- The consensus is that while model algorithms are becoming rapidly replaceable, the foundational infrastructure for data extraction and defining quality remains a significant competitive advantage once established [6].

Group 2: Evolution of Physical AI
- The article outlines three evolutionary stages of "Physical AI" using references from popular anime, illustrating the progression from early simulation to advanced world models [8].
- The first stage involves basic simulation and remote teaching, while the second stage incorporates augmented reality, overlaying virtual elements onto the real world [10][12].
- The third stage envisions a world model where AI can train in accelerated time, significantly enhancing learning efficiency [14].

Group 3: Data Infrastructure and World Models
- The construction of a robust data infrastructure is essential for translating the chaotic physical world into a comprehensible format for world models [16].
- The article discusses various layers of data processing, including metrics for physical world perception, data classification, and automated evaluation systems [17][21][23].
- The ultimate goal is to create a closed-loop system where real-world data informs and refines AI training, enabling rapid iteration and improvement [18][20].

Group 4: Future of Physical AI
- The transition from a "Bug Driven" approach to a "Data Driven" model is crucial for the advancement of physical AI [24].
- The article argues that while models may evolve quickly, the foundational infrastructure for data collection and processing will remain invaluable [27].
- The future development of AI will likely rely on a symbiotic relationship between world models as generators and data infrastructure as discriminators, ensuring that AI systems are grounded in reality [36][38].
Breaking Down Li Auto's Work on World Models
自动驾驶之心· 2026-01-05 09:30
Core Insights
- The article discusses the advancements and applications of world models in autonomous driving, particularly focusing on the reconstruction and generation techniques utilized by companies like Li Auto [2][3].
- It highlights the importance of understanding world models for newcomers in the field, emphasizing the challenges faced in grasping the concepts and practical applications [4][5].

Summary by Sections

Section 1: Introduction to World Models
- The first chapter provides an overview of world models and their connection to end-to-end autonomous driving, detailing the historical development and current applications [7].
- It categorizes different types of world models, including purely simulated models, simulation combined with planning, and those generating sensor inputs and perception results [7].

Section 2: Background Knowledge of World Models
- The second chapter covers foundational knowledge related to world models, including scene representation, Transformer technology, and BEV perception [8][13].
- It emphasizes the significance of these concepts in preparing for advanced discussions on world models [8].

Section 3: General World Model Exploration
- The third chapter focuses on general world models and recent popular works in autonomous driving, discussing models like Marble, Genie 3, and DriveVLA-W0 [9].

Section 4: Video Generation-Based World Models
- The fourth chapter delves into video generation algorithms, which are currently the most researched area in both academia and industry, starting with notable works like GAIA-1 & GAIA-2 [10].

Section 5: OCC-Based World Models
- The fifth chapter centers on OCC generation methods, explaining their potential for extending to vehicle trajectory planning and achieving end-to-end solutions [10].

Section 6: World Model Job Topics
- The sixth chapter shares practical insights from industry experience, addressing the application of world models, industry pain points, and interview preparation for related positions [11].

Course Overview
- The course aims to provide a comprehensive understanding of world models, targeting individuals interested in advancing their knowledge and skills in autonomous driving technology [12][15].
- It includes a structured schedule with specific topics covered in each chapter, starting from foundational concepts to advanced applications [16][17].