End-to-End Autonomous Driving
A deep dive into Tesla's ICCV talk: we found several possible industry solutions......
自动驾驶之心· 2025-12-23 00:53
Core Insights
- The article discusses Tesla's end-to-end autonomous driving solution, highlighting the challenges it faces and possible industry solutions that address them [3]
Group 1: Challenges and Solutions
- Challenge 1: The curse of dimensionality, which requires breakthroughs at both the input and output layers to improve computational efficiency and decision accuracy [4]
- Solution: UniLION, a unified autonomous driving framework based on linear group RNNs, efficiently processes multi-modal data and eliminates the need for intermediate perception and prediction results [4][7]
- UniLION's key features include a unified 3D backbone network and the ability to handle multiple tasks simultaneously; it reaches 75.4% NDS and 73.2% mAP on detection tasks [11]
Group 2: Interpretability and Safety
- Challenge 2: The need for interpretability and safety guarantees in autonomous driving systems, which traditional models struggle to provide [12]
- Solution: DrivePI, a unified spatial-aware 4D multi-modal large language model (MLLM) framework that integrates visual and language inputs to improve system interpretability and safety [13][14]
- DrivePI demonstrates superior performance in 3D occupancy prediction and trajectory planning, significantly reducing collision rates compared with existing models [13][17]
Group 3: Evaluation
- Challenge 3: The complexity of evaluating autonomous driving systems, caused by the unpredictability of human driving behavior and the diversity of interaction scenarios [18]
- Solution: GenieDrive, a world model framework that uses a 4D occupancy representation to generate physically consistent multi-view video sequences, improving the evaluation environment for autonomous systems [21][22]
- GenieDrive achieves a 7.2% improvement in mIoU for 4D occupancy prediction and reduces FVD by 20.7%, establishing new performance benchmarks [21][27]
Group 4: Integrated Ecosystem
- The three innovations (UniLION, DrivePI, and GenieDrive) form a synergistic ecosystem that improves perception, decision-making, and evaluation in autonomous driving [30][31]
- This integrated approach addresses key challenges in the industry, paving the way for safer, more reliable, and more efficient autonomous driving systems, and ultimately accelerating the transition to L4/L5 autonomy [31]
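For readers unfamiliar with the mIoU metric quoted for occupancy prediction above, the following is a minimal sketch of how mean intersection-over-union is commonly computed over a labelled voxel grid. The function name `mean_iou`, the three toy class labels, and the choice to skip classes absent from both grids are illustrative assumptions; the summary does not say how GenieDrive's evaluation handles these details.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that appear in pred or target (assumed convention)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    ious: List[float] = []
    for c in range(num_classes):
        # Voxels where both grids agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels where either grid assigns class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both grids
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy flattened occupancy grid; labels are hypothetical (0 = free, 1 = vehicle, 2 = road).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)  # per-class IoUs: 1/3, 2/3, 1/2
```

On this toy grid the three per-class IoUs average to 0.5. A real 4D occupancy benchmark would typically evaluate this per predicted time step over a much larger grid.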
World model research is growing explosively
自动驾驶之心· 2025-12-20 02:16
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, emphasizing that world models are a means to achieve end-to-end autonomous driving rather than a specific technology [2].
Group 1: World Model Overview
- The article highlights the recent surge in publications related to world models, particularly in the context of closed-loop simulation, which is becoming a trend in the industry due to the high costs associated with corner cases [2].
- It introduces a new course focused on world models, covering various algorithms such as general world models, video generation, and OCC generation, with applications in Tesla's world model and the Marble project by Fei-Fei Li's team [2][5].
Group 2: Course Structure
- The course consists of six chapters, starting with an introduction to world models and their relationship with end-to-end autonomous driving, followed by a discussion on the historical development and current applications of world models [5][6].
- The second chapter covers foundational knowledge related to world models, including scene representation and technologies like Transformer and BEV perception, which are crucial for understanding subsequent chapters [5][6].
Group 3: Advanced Topics
- The third chapter focuses on general world models, discussing notable models such as Marble, Genie 3 from DeepMind, and the latest developments from Meta, including the VLA+ world model algorithm [6][7].
- The fourth chapter delves into video generation-based world models, presenting classic works and recent advancements in the field, including projects like GAIA-1 & GAIA-2 and OpenDWM [7][8].
- The fifth chapter addresses OCC generation methods, explaining their potential for trajectory planning and end-to-end implementation [8].
Group 4: Industry Application and Career Preparation
- The sixth chapter provides insights into the practical applications of world models in the industry, discussing pain points and how to prepare for job interviews in this field [9].
- The course aims to equip participants with the skills to understand and implement world model technologies, preparing them for roles as world model algorithm engineers [10][13].
The head of intelligent driving at a new EV maker was pushed out and resigned......
自动驾驶之心· 2025-12-19 09:25
Group 1
- The core issue for autonomous driving company A is internal management problems leading to its operational halt, rather than just technical shortcomings [4][5]
- Company A's decline was evident as early as April last year, triggered by a whistleblower report regarding a high-salaried employee's resume fraud, which uncovered significant financial issues [4]
- Following the loss of trust from the parent group B, all operational permissions of company A were revoked, leading to its eventual takeover by group B [4]
Group 2
- New energy vehicle company C's supply chain head was dismissed due to failure to stockpile memory chips during a price surge, which angered the CEO [6]
- This incident was not isolated, as company C had previously faced similar issues with core component shortages, indicating a pattern of mismanagement [6]
Group 3
- The departure of the autonomous driving head from new energy vehicle company D was triggered by plans to eliminate the existing mapping team, which led to internal conflicts and ultimately his resignation [7]
- Despite D's significant investment in high-end technology and a large team, the challenges of developing a mapped route have hindered progress, leaving the company under pressure to deliver results [7]
Group 4
- Logistics company E is investing 150 million in developing an L4 autonomous driving demo, but its internal team structure is causing inefficiencies due to merging teams with fundamentally different architectures [8]
- The success of this demo is critical for attracting investment, but failure could lead to significant layoffs within the company [8]
Group 5
- Autonomous vehicle company F's plan to split and seek independent financing failed due to difficulties in securing investment, despite its low valuation of approximately 1 billion RMB [9]
- The company previously operated a fleet of nearly 1,000 autonomous vehicles but has since faced significant team instability and leadership changes, leading to a decline in operational effectiveness [9]
Group 6
- A well-known automotive manufacturer G has adopted a "performance theater" culture, leading to ineffective innovation practices and minimal output from its large engineering team [10]
- The company has only managed to successfully run two demo routes, despite having a substantial number of engineers, indicating a disconnect between innovation goals and actual productivity [10]
Group 7
- Company H's management style is characterized by a lack of accountability and engagement among executives, leading to a disorganized and ineffective workforce [12]
- The company has seen a decline in morale and productivity, with many core talents feeling undervalued and overworked [12]
Group 8
- In a new energy vehicle company I, internal conflicts led by executive A have resulted in significant inefficiencies and a failure to adapt to industry trends, particularly in autonomous driving technology [13][14]
- The CEO's decision to ignore advice to follow industry leaders in end-to-end technology has caused the company to fall behind competitors [14]
Group 9
- Autonomous trucking company J is facing financial losses due to its L2 driving assistance model, which has not effectively reduced operational costs and has led to inventory issues [15]
- The company's strategy of incentivizing usage through subsidies initially worked but has since resulted in customer dissatisfaction and vehicle returns due to supply chain issues [15]
World models are one route to end-to-end autonomous driving......
自动驾驶之心· 2025-12-18 03:18
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that a world model is not itself end-to-end but serves as a pathway to end-to-end autonomous driving [2][3][4].
Group 1: Definitions and Concepts
- End-to-end autonomous driving is defined as a model that takes information in at one end and outputs decisions at the other, without explicit intermediate information processing or decision logic [3].
- World models are defined as models that accept information input and internally build a complete understanding of the environment, so that they can reconstruct it and predict how it will change [4].
Group 2: Course Introduction
- A new course on world models has been launched, focusing on general world models, video generation, and OCC generation algorithms, including applications from Tesla and Fei-Fei Li's team [5].
- The course aims to deepen understanding of end-to-end autonomous driving and is designed for people looking to enter the autonomous driving industry [15].
Group 3: Course Structure
- Chapter 1 introduces world models and their relationship with end-to-end autonomous driving, covering historical development and current applications [10].
- Chapter 2 provides foundational knowledge on world models, including scene representation and relevant technologies like Transformer and BEV perception [10][16].
- Chapter 3 discusses general world models and popular algorithms such as Marble and Genie 3, explaining their core technologies and design philosophies [11].
- Chapter 4 focuses on video generation world models, detailing significant works and advancements in this area [12].
- Chapter 5 covers OCC generation models, discussing their applications and potential for trajectory planning [13].
- Chapter 6 shares industry insights and interview preparation tips for roles related to world models [14].
Group 4: Learning Outcomes
- The course aims to bring participants to the level of a world model autonomous driving algorithm engineer with approximately one year of experience, covering key technologies and enabling practical application in projects [18].
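The two definitions above can be contrasted with a small sketch: an end-to-end policy maps an observation straight to an action, while a world model predicts how the environment will change, which a planner can then query. Everything here is a hypothetical toy (the `State` fields, the hand-written policy standing in for a learned network, the stationary lead vehicle, and the "largest minimum gap" selection rule); it is not taken from the article or from any production system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class State:
    gap_m: float      # distance to the lead vehicle, in metres
    speed_mps: float  # ego speed, in metres per second


def end_to_end_policy(obs: State) -> float:
    """Observation in, acceleration out; stand-in for a learned network (assumption)."""
    return -2.0 if obs.gap_m < 10.0 else 1.0


def world_model_step(state: State, accel: float, dt: float = 1.0) -> State:
    """Toy dynamics: predict the next state, assuming a stationary lead vehicle."""
    new_speed = max(0.0, state.speed_mps + accel * dt)
    return State(state.gap_m - new_speed * dt, new_speed)


def plan_with_world_model(state: State, candidates: Sequence[float], horizon: int = 3) -> float:
    """Roll each candidate forward and keep the one with the largest minimum gap."""
    def min_gap(accel: float) -> float:
        s, worst = state, state.gap_m
        for _ in range(horizon):
            s = world_model_step(s, accel)
            worst = min(worst, s.gap_m)
        return worst
    return max(candidates, key=min_gap)


start = State(gap_m=20.0, speed_mps=5.0)
reactive_action = end_to_end_policy(start)                     # one-shot mapping
planned_action = plan_with_world_model(start, [-2.0, 0.0, 1.0])  # imagined rollouts
```

The point of the contrast is only structural: the policy never represents the future explicitly, whereas the planner chooses by imagining outcomes with the world model. A system that trains such a predictor and a policy jointly can still be end-to-end, which is consistent with the article's view of world models as a pathway.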
From getting started to advancing and job hunting in end-to-end and VLA: we have prepared a complete learning roadmap!
自动驾驶之心· 2025-12-18 00:06
Core Viewpoint
- The article emphasizes the growing demand for technical talent in the autonomous driving sector, particularly in end-to-end and VLA (Vision-Language-Action) technologies, with companies willing to pay heavily for experienced professionals and starting salaries reaching millions annually [2].
Course Offerings
- The article outlines several specialized courses aimed at building skills in autonomous driving, including "End-to-End Practical Class for Mass Production," "End-to-End and VLA Autonomous Driving Class," and "VLA and Large Model Practical Course," catering to levels from beginner to advanced professional [4][7][12].
End-to-End Mass Production Course
- This course focuses on the practical implementation of end-to-end autonomous driving, covering key modules such as the use of navigation information, reinforcement learning optimization, production experience with diffusion and autoregressive models, and joint spatiotemporal planning [4].
End-to-End and VLA Autonomous Driving Course
- This course addresses end-to-end autonomous driving at the macro level, detailing key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [7].
VLA and Large Model Practical Course
- This course requires participants to have a GPU (a 4090-class card or better is recommended), a foundational understanding of autonomous driving, and familiarity with concepts such as transformer models and reinforcement learning [11].
Instructor Profiles
- The courses are led by industry experts with strong academic backgrounds, including instructors with multiple papers at top conferences and extensive experience in algorithm development and mass production for autonomous driving [6][9][14][15].
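Beijing Jiaotong University & Horizon Robotics propose DIVER: a new multimodal planning framework combining diffusion and reinforcement learning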
自动驾驶之心· 2025-12-17 03:18
Core Viewpoint
- The article discusses the advancement of end-to-end autonomous driving systems, highlighting the introduction of the DIVER framework, which combines diffusion models and reinforcement learning to enhance trajectory diversity and safety in complex driving scenarios [3][33].
Group 1: Current Challenges in Autonomous Driving
- Current end-to-end autonomous driving methods primarily rely on imitation learning from a single expert demonstration, leading to a lack of behavioral diversity and overly conservative planning in complex traffic situations [5][6].
- The existing models tend to converge around a single ground truth trajectory, resulting in limited exploration of diverse and safe decision-making options [7][8].
Group 2: Introduction of DIVER Framework
- The DIVER framework integrates the multimodal generation capabilities of diffusion models with the goal-oriented constraints of reinforcement learning, transforming trajectory generation into a strategy generation problem under safety and diversity constraints [9][33].
- DIVER aims to produce multiple feasible and semantically valid candidate trajectories, addressing the limitations of traditional imitation learning approaches [9][33].
Group 3: Technical Innovations of DIVER
- DIVER employs a Policy-Aware Diffusion Generator (PADG) that incorporates contextual information such as maps and dynamic agents, ensuring that generated trajectories are both semantically clear and feasible [16][20].
- The framework utilizes multiple reference ground truths to align each predicted trajectory with a specific driving intention, thereby preventing mode collapse and enhancing diversity [20][21].
Group 4: Performance Metrics and Results
- In various benchmark evaluations, DIVER significantly outperformed existing methods in terms of trajectory diversity and safety, achieving lower collision rates while expanding the range of behaviors covered [28][30].
- The DIVER framework demonstrated superior performance in long-term planning tasks, maintaining the lowest collision rates while achieving higher diversity metrics compared to competitors [32][36].
Group 5: Conclusion and Implications
- DIVER represents a significant step towards more human-like decision-making in autonomous driving by addressing the long-standing issues associated with imitation learning [33][34].
- The integration of generative models with reinforcement learning is positioned as a crucial advancement for the future of realistic autonomous driving applications [34].
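To make "trajectory diversity" and "mode collapse" concrete, here is a minimal sketch of one simple way to score the spread of a set of candidate trajectories: the average pairwise distance between them. This is an illustrative assumption, not the diversity metric used in the DIVER paper, which the summary does not specify; the names `trajectory_distance` and `pairwise_diversity` are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def trajectory_distance(a: Trajectory, b: Trajectory) -> float:
    """Mean Euclidean distance between corresponding waypoints."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of waypoints")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def pairwise_diversity(trajs: Sequence[Trajectory]) -> float:
    """Average distance over all unordered pairs; 0.0 when every candidate coincides."""
    pairs = list(combinations(trajs, 2))
    if not pairs:
        return 0.0
    return sum(trajectory_distance(a, b) for a, b in pairs) / len(pairs)


straight = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
left = [(0.0, 0.0), (-1.0, 1.0), (-2.0, 2.0)]
right = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

collapsed_score = pairwise_diversity([straight, straight, straight])  # mode collapse
diverse_score = pairwise_diversity([straight, left, right])           # three intentions
```

A planner that imitates a single expert trajectory tends toward the collapsed case, where the score is zero. A spread score like this says nothing about safety, which is why the summary pairs diversity with collision rate.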
XPeng's latest work FutureX, built on a latent chain-of-thought world model, offers ideas for on-vehicle deployment...
自动驾驶之心· 2025-12-15 06:00
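Paper authors | Hongbin Lin et al.  Editor | 自动驾驶之心
The latest work from the Chinese University of Hong Kong and XPeng is an interesting one. It uses a latent chain-of-thought world model to strengthen end-to-end capability, and it contains several improvements worth trying in industry:
1. Background Review
End-to-end (E2E) autonomous driving refers to a technical pipeline that converts multimodal raw sensor data streams directly into motion plans or low-level actuation commands through a fully differentiable mapping. The field has advanced rapidly in both algorithms and benchmarks. Despite inherent challenges, existing methods have made notable progress.
Behind these successes, existing end-to-end driving systems map sensor inputs directly to control outputs with a single neural network, performing an efficient one-shot forward prediction without any further "thinking". As a result, they lack adaptability and interpretability in complex environments (Figure 1, second row). In human cognition, a driver mentally simulates possible future scenarios before taking any action: predicting how surrounding vehicles will move, how the scene will evolve, and the likely outcome of each possible behavior (Figure 1, first row). This internal reasoning lets humans make decisions that are safe and appropriate to the scene. Therefore, for an end-to-end system in a highly dynamic traffic environment, inferring the future scene ...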
Nanyang Technological University & Harvard propose OpenREAD: end-to-end RL unifying cognition and trajectory planning
自动驾驶之心· 2025-12-13 02:04
Core Viewpoint
- The article discusses the introduction of OpenREAD, a new framework developed by Nanyang Technological University and Harvard University, which uses reinforcement learning (RL) to strengthen the reasoning capabilities of vision-language models (VLMs) for autonomous driving [4][28].
Group 1: Methodology
- OpenREAD incorporates Qwen3-LLM as an "evaluation expert," extending RL from traditional verifiable downstream tasks to open-ended tasks such as "driving suggestions" and "scene analysis," and achieving end-to-end reinforcement fine-tuning from high-level semantic reasoning down to low-level trajectory planning [6][28].
- The framework addresses the difficulty of designing reward functions for open-ended driving knowledge learning, where many different wordings can express the same reference answer, which complicates the RL process [7].
- Two preparatory steps were taken: (1) constructing knowledge data with explicit chains of thought (CoT), using GPT-4 to annotate driving knowledge data covering perception and decision-making tasks [8]; (2) converting the OmniDrive dataset into a format suitable for RL training, structured as "thinking + answering" [9].
Group 2: Experimental Results
- OpenREAD was evaluated on the LingoQA and NuScenes datasets, outperforming traditional supervised fine-tuning (SFT) methods on trajectory error, collision rate, and knowledge evaluation metrics [19][20].
- The results indicate that introducing driving knowledge significantly improves the effectiveness of RL fine-tuning, as shown by lower trajectory error and collision rates [19][20].
- Compared with existing methods, OpenREAD showed better collision control, giving safer driving outcomes [20].
Group 3: Conclusion
- OpenREAD successfully implements collaborative reinforcement learning fine-tuning for driving knowledge and trajectory planning, expanding the boundaries of RL applications in end-to-end autonomous driving [28].
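The reward-design problem described in the methodology can be sketched as two reward paths: a verifiable one for trajectories, scored directly against a reference, and an open-ended one for free-text answers, scored by a judge. The sketch below is a hedged illustration only. The exponential shaping, the `scale_m` parameter, and the word-overlap `toy_judge` are assumptions made so the example runs offline; in OpenREAD the judging role is played by an LLM evaluator, whose prompt and scoring rule the summary does not give.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def trajectory_reward(pred: Sequence[Point], ref: Sequence[Point], scale_m: float = 1.0) -> float:
    """Verifiable reward in (0, 1]: exp(-mean L2 error / scale). Shaping is an assumption."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and equally long")
    err = sum(math.dist(p, r) for p, r in zip(pred, ref)) / len(ref)
    return math.exp(-err / scale_m)


def open_ended_reward(answer: str, reference: str, judge: Callable[[str, str], float]) -> float:
    """Open-ended reward: ask a judge for a score and clip it to [0, 1]."""
    return min(1.0, max(0.0, judge(answer, reference)))


def toy_judge(answer: str, reference: str) -> float:
    """Offline placeholder for an LLM evaluator: Jaccard overlap of lowercase word sets."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    return len(a & r) / len(a | r) if (a | r) else 0.0


ref_traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
exact_traj_reward = trajectory_reward(ref_traj, ref_traj)
same_answer_reward = open_ended_reward(
    "slow down for the pedestrian", "slow down for the pedestrian", toy_judge
)
paraphrase_reward = open_ended_reward(
    "reduce speed because someone is crossing", "slow down for the pedestrian", toy_judge
)
```

The paraphrase case shows why a surface-overlap judge is inadequate: an answer with the same meaning but different wording scores zero here. That gap is the motivation the summary gives for using an LLM as the evaluator.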
One year on, DiffusionDrive is upgraded to v2 and sets a new record!
自动驾驶之心· 2025-12-11 03:35
Core Insights
- The article discusses the upgrade of DiffusionDrive to version 2, highlighting its advancements in end-to-end autonomous driving trajectory planning through the integration of reinforcement learning to address the challenges of diversity and sustained high quality in trajectory generation [1][3][10].
Background Review
- The shift towards end-to-end autonomous driving (E2E-AD) has emerged as traditional tasks like 3D object detection and motion prediction have matured. Early methods faced limitations in modeling, often generating single trajectories without alternatives in complex driving scenarios [5][10].
- Previous diffusion models applied to trajectory generation struggled with mode collapse, leading to a lack of diversity in generated behaviors. DiffusionDrive introduced a Gaussian Mixture Model (GMM) to define prior distributions for initial noise, promoting diverse behavior generation [5][13].
Methodology
- DiffusionDriveV2 introduces a novel framework that utilizes reinforcement learning to overcome the limitations of imitation learning, which previously led to a trade-off between diversity and sustained high quality in trajectory generation [10][12].
- The framework incorporates intra-anchor GRPO and inter-anchor truncated GRPO to manage advantage estimation within specific driving intentions, preventing mode collapse by avoiding inappropriate comparisons between different intentions [9][12][28].
- The method employs scale-adaptive multiplicative noise to enhance exploration while maintaining trajectory smoothness, addressing the inherent scale inconsistency between proximal and distal segments of trajectories [24][39].
Experimental Results
- Evaluations on the NAVSIM v1 and NAVSIM v2 datasets demonstrated that DiffusionDriveV2 achieved state-of-the-art performance, with a PDMS score of 91.2 on NAVSIM v1 and 85.5 on NAVSIM v2, significantly outperforming previous models [10][33].
- The results indicate that DiffusionDriveV2 effectively balances trajectory diversity and sustained quality, achieving optimal performance in closed-loop evaluations [38][39].
Conclusion
- The article concludes that DiffusionDriveV2 successfully addresses the inherent challenges of imitation learning in trajectory generation, achieving an optimal trade-off between planning quality and diversity through innovative reinforcement learning techniques [47].
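Two ideas from the methodology can be sketched in a few lines. The first is the group-relative advantage used in GRPO-style training, computed separately within each anchor (driving intention) so that, for example, a left-turn sample is only compared with other left-turn samples. The second is multiplicative exploration noise, which perturbs far waypoints more than near ones while keeping the trajectory's shape. This is a minimal sketch under stated assumptions: the population standard deviation, the `eps` constant, and drawing a single noise factor per axis are illustrative choices, and the inter-anchor truncation step is omitted because the summary does not describe it.

```python
import random
import statistics
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, float]


def intra_anchor_advantages(
    anchor_ids: Sequence[Hashable], rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Normalise each reward against the mean and std of its own anchor group only."""
    groups: Dict[Hashable, List[float]] = defaultdict(list)
    for a, r in zip(anchor_ids, rewards):
        groups[a].append(r)
    stats = {a: (statistics.fmean(rs), statistics.pstdev(rs)) for a, rs in groups.items()}
    return [(r - stats[a][0]) / (stats[a][1] + eps) for a, r in zip(anchor_ids, rewards)]


def multiplicative_noise(traj: Sequence[Point], sigma: float, rng: random.Random) -> List[Point]:
    """Scale all waypoints by (1 + eps_x, 1 + eps_y); one draw per axis (assumption)."""
    ex, ey = rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)
    return [(x * (1.0 + ex), y * (1.0 + ey)) for x, y in traj]


anchors = ["left", "left", "straight", "straight"]
rewards = [0.2, 0.4, 0.8, 1.0]
adv = intra_anchor_advantages(anchors, rewards)

base = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
noisy = multiplicative_noise(base, sigma=0.1, rng=random.Random(0))
```

In the toy rewards, the better left-turn sample (0.4) gets a positive advantage even though it scores below both straight samples, and the weaker straight sample (0.8) gets a negative one. Normalising across all four samples instead would push the policy away from the left-turn intention entirely, which is the mode collapse the summary says intra-anchor comparison avoids.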
Start anytime! The End-to-End and VLA Autonomous Driving small-group course has officially concluded
自动驾驶之心· 2025-12-09 19:00
Core Viewpoint
- 2023 marks the first year of end-to-end mass production, with 2024 expected to be a major year for it, as leading new EV makers and established manufacturers have already put end-to-end systems into production [1][3].
Group 1: End-to-End Production Development
- End-to-end autonomous driving has two main paradigms, single-stage and two-stage; UniAD is a representative single-stage approach that models the vehicle trajectory directly from sensor inputs [1].
- Since last year, single-stage end-to-end development has advanced rapidly, producing several variants: perception-based, world model-based, diffusion model-based, and VLA-based single-stage methods [3][5].
- Major players in the autonomous driving sector, both solution providers and car manufacturers, are focusing on in-house development and mass production of end-to-end autonomous driving technologies [3].
Group 2: Course Overview
- A course titled "End-to-End and VLA Autonomous Driving" has been launched to teach cutting-edge algorithms in both single-stage and two-stage end-to-end approaches, with a focus on the latest developments in industry and academia [5][14].
- The course is structured into several chapters, starting with an introduction to end-to-end algorithms, followed by background knowledge on technologies such as VLA, diffusion models, and reinforcement learning [8][9].
- The second chapter is highlighted as containing the technical keywords most likely to come up in job interviews over the next two years [9].
Group 3: Technical Focus Areas
- The course covers the subfields of single-stage end-to-end methods, including perception-based (UniAD), world model-based, diffusion model-based, and the currently popular VLA-based approaches [10][12].
- The curriculum includes practical assignments, such as RLHF fine-tuning, and aims to give students hands-on experience in building and experimenting with pre-training and reinforcement learning modules [11][12].
- The course emphasizes the importance of understanding BEV perception, multi-modal large models, and the latest advances in diffusion models, which are crucial for the future of autonomous driving [12][16].