自动驾驶之心
We just put together a learning roadmap for world models, aimed at beginners......
自动驾驶之心· 2025-12-25 03:24
**Core Viewpoint**
- The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that a world model is not a specific technology but a category of models with certain capabilities. It highlights the industry trend of using world models for closed-loop simulation to address the high cost of corner cases in autonomous driving [2].

**Course Overview**
- The course on world models in autonomous driving is structured into six chapters: an introduction, background knowledge, discussion of general world models, video-generation-based models, OCC-based models, and industry job topics [5][6][7][8][9].

**Chapter Summaries**
- **Chapter 1: Introduction to World Models** — outlines the relationship between world models and end-to-end autonomous driving, covering the development history and current applications of world models as well as the main streams: pure simulation, simulation plus planning, and generating sensor inputs [5].
- **Chapter 2: Background Knowledge** — covers foundational knowledge for world models, including scene representation, Transformer technology, and BEV perception, which are crucial for the subsequent chapters [6].
- **Chapter 3: General World Models** — focuses on popular general world models such as Marble from Li Fei-Fei's team and Genie 3 from DeepMind, discussing their core technologies and design philosophies [7].
- **Chapter 4: Video-Generation-Based World Models** — delves into video generation algorithms, starting with GAIA-1 & GAIA-2 and extending to recent works such as UniScene and OpenDWM, covering both classic and cutting-edge advances [8].
- **Chapter 5: OCC-Based World Models** — concentrates on OCC generation algorithms, discussing three major papers and a hands-on project, and emphasizes how these methods extend to vehicle trajectory planning [9].
- **Chapter 6: World Model Job Topics** — shares practical insights from the instructor's experience, addressing industry applications, pain points, and interview preparation for world-model positions [9].

**Learning Outcomes**
- The course aims to provide a comprehensive understanding of world models in autonomous driving, equipping participants with knowledge comparable to one year of experience as a world-model algorithm engineer [10].
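Chapter 2's background topics include BEV perception. As a toy illustration of what a bird's-eye-view (BEV) representation is, the sketch below rasterizes ego-frame 3D points into a 2D occupancy grid; real BEV perception pipelines (e.g. BEVFormer) learn this projection from camera features, and the function name, ranges, and cell size here are illustrative assumptions, not course material.

```python
import numpy as np

def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Rasterize ego-frame 3D points of shape (N, 3) into a binary BEV grid.

    Illustrative only: a minimal geometric BEV, not a learned perception model.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((nx, ny), dtype=np.uint8)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)  # drop out-of-range points
    grid[ix[keep], iy[keep]] = 1
    return grid
```

With the default 100 m x 100 m range and 0.5 m cells this yields a 200 x 200 grid, the kind of discretized scene representation that downstream planning and occupancy models consume.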
The widely followed DriveVLA-W0: a share from its first author
自动驾驶之心· 2025-12-25 03:24
**Core Viewpoint**
- The article emphasizes the rapid advancement and potential of the autonomous driving industry, highlighting various directions for learning and development in this field [1].

**Group 1**
- Nearly 30 distinct learning directions in autonomous driving are identified, indicating a broad scope of opportunities for research and investment [1].
L4 is booming — time to catch this wave......
自动驾驶之心· 2025-12-25 03:24
The 自动驾驶之心 L4 discussion group is here, covering L4-track financing, technical progress, RoboTaxi, RoboBus, RoboVan, driverless delivery, driverless mining trucks, driverless heavy trucks, and related directions~ Add the assistant on WeChat (AIDriver005) with the note: nickname + organization/school + join group. ...
The autonomous driving L4 technical discussion group is here~
自动驾驶之心· 2025-12-24 09:22
The industry's first RL+VLA roundup: how is reinforcement learning pushing VLA toward the real world?
自动驾驶之心· 2025-12-24 09:22
**Core Insights**
- The article discusses advancements in Vision-Language-Action (VLA) models for autonomous driving, highlighting a shift from traditional supervised learning toward reinforcement learning (RL) to enhance generalization and reasoning capabilities [2].

**VLA + RL Research Overview**
- Recent VLA + RL works show a trend toward using RL to address limitations of earlier models, particularly hallucination issues and the efficiency of exploring continuous action spaces [2].

**Key Papers and Contributions**
- **MindDrive**: Introduces a framework that transforms the action space into a discrete language-decision space, achieving a driving score of 78.04 and a success rate of 55.09% on the Bench2Drive benchmark with a lightweight model [6].
- **WAM-Diff**: Proposes an end-to-end VLA framework that uses masked diffusion for trajectory optimization, achieving superior performance on the NAVSIM benchmark [7].
- **LCDrive**: Addresses temporal expression and latency issues in textual chain-of-thought reasoning via a latent chain-of-thought mechanism, demonstrating improved reasoning efficiency and trajectory quality [12].
- **Reasoning-VLA**: Develops a framework that enhances parallel trajectory generation through learnable action queries, achieving high performance across multiple datasets [13].
- **Alpamayo-R1**: Bridges reasoning and action prediction through a modular architecture and multi-stage training, improving generalization in long-tail scenarios [18].
- **AdaThinkDrive**: Introduces a dual-mode mechanism to balance decision accuracy and reasoning efficiency, achieving a PDMS score of 90.3 on the NAVSIM benchmark [20].
- **AutoDrive-R²**: Combines supervised fine-tuning and RL to enhance trajectory-planning accuracy, achieving state-of-the-art performance with a significant reduction in error rates [25].
- **IRL-VLA**: Proposes a framework that avoids reliance on simulators by using a reward world model, achieving state-of-the-art performance on the NAVSIM v2 benchmark [31].
- **DriveAgent-R1**: Integrates active perception with hybrid thinking, achieving significant improvements in decision reliability and efficiency [32].
- **Drive-R1**: Connects reasoning and planning in VLMs, providing effective methods for integrating reasoning with motion planning [37].
- **ReCogDrive**: Merges cognitive reasoning with diffusion planners, achieving state-of-the-art performance while addressing the limitations of imitation learning [38].
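The MindDrive entry above describes transforming a continuous action space into a discrete language-decision space so that RL can operate over tokens. The sketch below illustrates the general idea of action binning; the bin edges, vocabulary sizes, and function names are illustrative assumptions, not MindDrive's actual tokenization.

```python
import numpy as np

# Hypothetical discretization: bin edges and token counts are assumptions,
# chosen only to demonstrate the continuous -> discrete round trip.
STEER_BINS = np.linspace(-1.0, 1.0, 21)   # 21 discrete steering tokens
ACCEL_BINS = np.linspace(-3.0, 3.0, 13)   # 13 discrete acceleration tokens

def encode_action(steer, accel):
    """Map a continuous (steer, accel) pair to nearest-bin token indices."""
    s = int(np.argmin(np.abs(STEER_BINS - steer)))
    a = int(np.argmin(np.abs(ACCEL_BINS - accel)))
    return s, a

def decode_action(s_tok, a_tok):
    """Invert the tokenization back to bin-center continuous values."""
    return float(STEER_BINS[s_tok]), float(ACCEL_BINS[a_tok])
```

Once actions are tokens, standard language-model policy heads and token-level RL objectives apply directly, at the cost of quantization error set by the bin width.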
Class starts next week! We've designed a learning roadmap for autonomous driving world models....
自动驾驶之心· 2025-12-24 09:22
**Core Viewpoint**
- The article discusses the distinction between world models and end-to-end models in autonomous driving, emphasizing that world models are a means to achieve end-to-end autonomous driving rather than a specific technology [2].

**Chapter 1: Introduction to World Models**
- Provides an overview of the relationship between world models and end-to-end autonomous driving, covering the development history and current applications of world models. Introduces the main types — pure simulation, simulation plus planning, and models that generate sensor inputs and perception results — along with their industry applications and relevant datasets [5].

**Chapter 2: Background Knowledge of World Models**
- Focuses on foundational knowledge for understanding world models, starting with scene representation and expanding to technologies such as Transformer and BEV perception. Highlights key technical terms frequently encountered in world-model job interviews [6][11].

**Chapter 3: Discussion on General World Models**
- Centers on general world models and recent popular works, including models from Li Fei-Fei's team (Marble), DeepMind (Genie 3), and Meta (JEPA). Also discusses the widely talked-about VLA + world model algorithms and Tesla's latest world-model simulator shared at ICCV [7].

**Chapter 4: Video-Generation-Based World Models**
- Focuses on video generation algorithms, currently the most researched in both academia and industry. Covers classic works such as GAIA-1 & GAIA-2 from Wayve and recent advances such as UniScene and OpenDWM, giving a comprehensive view of the field's progress [8].

**Chapter 5: OCC-Based World Models**
- Discusses OCC generation algorithms, explaining three major papers and a hands-on project. These methods extend readily to vehicle trajectory planning, contributing to end-to-end solutions [9].

**Chapter 6: World Model Job Topics**
- Shares practical insights from the instructor's years of experience, addressing how world models are applied in industry, existing pain points, and how to prepare for related job interviews, focusing on what companies prioritize [10].

**Course Outcomes**
- The course aims to advance understanding of end-to-end autonomous driving, equipping participants with world-model technologies, including video generation and OCC generation methods, and preparing them for roles in the autonomous driving industry [10][13].
自动驾驶之心 New Year's promotion is live (Knowledge Planet 40% off / courses 25% off / paper coaching, etc.)
自动驾驶之心· 2025-12-24 03:29
**Group 1**
- All autonomous driving courses are 25% off (i.e., priced at 75% of the original), excluding mass-production courses [1].
- New members joining the knowledge community pay 60% of the standard price (40% off), while existing members can renew at half price [3].
- From the start of the promotion, customers who spend over 4,000 on autonomous driving courses receive an additional high-quality course for free [1].

**Group 2**
- The company offers 1-on-1 job coaching services, currently available at a promotional price [1].
After going through all of NVIDIA's projects this year, here are the ones we recommend......
自动驾驶之心· 2025-12-24 03:29
**Core Insights**
- NVIDIA has become a focal point of the AI landscape, reaching a market valuation of $5 trillion, an elevenfold increase over three years, making it the first company to hit this milestone [2].
- The company has transitioned from a graphics chip manufacturer to a leading AI infrastructure provider, with significant advances across AI domains including autonomous driving and embodied intelligence [2].

**Group 1: Technological Developments**
- The Cosmos series, initiated in January, has produced foundational models such as Cosmos-Transfer1, Cosmos-Reason1, and Cosmos-Predict2.5, which support downstream applications in autonomous driving and embodied intelligence [5].
- The Nemotron series aims to create a "digital brain" for the agent-based AI era, providing efficient models and tools for enterprises to build specialized AI systems [5].
- The Isaac Lab project offers a GPU-accelerated simulation framework for multi-modal robot learning, addressing data scarcity and the simulation-to-reality gap [6].

**Group 2: Key Projects and Papers**
- The Nemotron Nano V2 VL model, a 12-billion-parameter vision-language model, achieves state-of-the-art performance in document understanding and long-video reasoning while maintaining text-reasoning capability [12].
- The Alpamayo-R1 project introduces a vision-language-action model that integrates causal reasoning and trajectory planning to enhance decision-making in complex driving scenarios [13].
- The Cosmos-Predict2.5 model unifies text, image, and video generation, significantly improving video quality and consistency for physical-AI tasks [17].

**Group 3: Performance Metrics**
- Nemotron Nano V2 VL shows superior performance across 45 multi-modal benchmarks, particularly document understanding and long-video question answering [12].
- Alpamayo-R1 demonstrated a 12% increase in planning accuracy and a 35% reduction in derailment rates in challenging scenarios compared to baseline models [16].
- Cosmos-Reason1 achieved over a 10% performance improvement on physical-reasoning tasks after fine-tuning, showcasing its capability in understanding physical laws [33].
A former employee of a leading intelligent-driving company ordered to pay substantial non-compete damages...
自动驾驶之心· 2025-12-24 03:29
The following article is from 蚀刻AiTech, by the 蚀刻 team. This article is shared for academic purposes only; in case of infringement, contact us for removal.

According to 蚀刻AiTech, a leading intelligent-driving company recently disclosed, via an internal all-hands notice, the outcome of judicial proceedings against a former employee for violating non-compete obligations. The notice states that after leaving, the former employee concealed their identity and joined a competitor. The company initiated legal proceedings and pursued the matter to the end. A court recently issued a final judgment finding the employee in breach of the non-compete obligation and ordering them to pay substantial damages. The company stressed that the judgment means the breach "will remain permanently on their professional record through this ruling."

The notice was sternly worded: the company declared "zero tolerance" for any violation of non-compete restrictions, and said that regardless of an employee's level, timing, or destination, it will exhaust all legal means to pursue accountability. It also reminded all employees that breaching a non-compete means not only large financial damages but also long-term consequences for one's career.

From an industry perspective, the event marks a clear escalation in competition among the leading players in China's intelligent-driving sector. In recent years, competition has centered on technical approaches, speed to mass production, and fundraising scale — a classic commercial and technological contest. By successfully using legal means, with court backing, to hold accountable a former employee who jumped to a direct competitor, this company signals that the rivalry among top players is rapidly extending beyond commercial and technical dimensions to talent retention, ...
Double SOTA! GenieDrive: a physically consistent world model for autonomous driving (HKU & Huawei Noah's Ark)
自动驾驶之心· 2025-12-24 00:58
**Core Insights**
- The article presents GenieDrive, a new framework for autonomous driving that uses 4D occupancy as an intermediate representation, charting a novel research path of "first generate 4D occupancy, then generate video" [2][25].

**Project Overview**
- GenieDrive is a novel world-modeling framework for autonomous driving that achieves highly controllable, multi-view consistent, and physically accurate video generation [7].
- It operates with only 3.47 million parameters at an inference speed of 41 FPS, with a 7.2% mIoU improvement on 4D occupancy prediction tasks [5][7].

**Research Background and Challenges**
- Current autonomous driving world models face two main challenges: insufficient physical consistency and the difficulty of modeling high-dimensional representations [8].
- Existing methods rely on a single video diffusion model, which complicates learning and can produce results inconsistent with real physical laws [4][8].

**Innovations of GenieDrive**
- A two-stage world-modeling and generation framework that uses 4D occupancy as an intermediate state to inject explicit physical information into video generation [10][11].
- A Tri-plane VAE for efficient compression, using only 58% of the latent representations of existing methods while achieving state-of-the-art occupancy reconstruction [11].
- A Mutual Control Attention mechanism that explicitly models the impact of driving control on occupancy evolution, with end-to-end joint training improving prediction accuracy [11].

**Experimental Results and Analysis**
- 4D occupancy prediction improves significantly, with a 7.2% mIoU gain over the latest methods [13].
- Inference runs at 41 FPS with a total parameter count of 3.47 million [13].
- In video generation, GenieDrive reduces the FVD metric by 20.7%, outperforming existing occupancy-based methods [15].

**Future Outlook**
- By introducing 4D occupancy as an intermediate representation, GenieDrive aims to advance closed-loop evaluation and simulation technologies, potentially opening new research directions and applications in autonomous driving [23].
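The Mutual Control Attention mechanism is described above as explicitly modeling how driving control influences occupancy evolution. Since the module's implementation is not given here, the following is only a minimal single-head cross-attention sketch of a *mutual* control/occupancy exchange; all shapes, parameter names, and the residual wiring are assumptions, not GenieDrive's actual architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized softmax
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q, kv, wq, wk, wv):
    """Single-head scaled dot-product cross-attention: q attends over kv."""
    Q, K, V = q @ wq, kv @ wk, kv @ wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

def mutual_control_attention(ctrl, occ, params):
    """Sketch of a mutual exchange: control tokens attend to occupancy
    latents and occupancy latents attend back to control tokens, each with
    a residual connection. Hypothetical wiring, for illustration only."""
    ctrl_out = ctrl + cross_attention(ctrl, occ, *params["c2o"])
    occ_out = occ + cross_attention(occ, ctrl, *params["o2c"])
    return ctrl_out, occ_out
```

The point of such bidirectional conditioning is that occupancy predictions become sensitive to the planned control inputs, which is what makes the world model usable for closed-loop "what if I steer here" rollouts.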