自动驾驶之心
Bosch Lands a Ten-Billion-Yuan ADAS Order
自动驾驶之心· 2025-12-19 00:05
Core Insights
- The article highlights a landmark global order in the intelligent driving industry: Toyota has partnered with Bosch on a ten-billion-yuan-level ADAS project, marking a shift from regional competition to a global collaborative framework [5][6][10]
- The partnership is seen as a strategic move by Toyota to set a supplier benchmark for the global intelligent driving era, underscoring the importance of compliance and localization capabilities [6][7][10]

Group 1: Partnership Significance
- The Toyota-Bosch collaboration is the largest single project in the global intelligent driving sector, covering key markets such as North America, the EU, the UK, and Japan, and enabling L2-level intelligent driving features [5][6]
- The partnership signals a transition in the automotive industry, as competition evolves from regional technology showcases to global engineering implementation and ecosystem adaptation [6][9]

Group 2: Challenges and Opportunities for Chinese Automakers
- Chinese automakers face the challenge of adapting to global markets, where intelligent driving capability is no longer just an added advantage but a prerequisite for market entry [6][10]
- The article suggests that understanding local markets and building a robust compliance network are critical to winning the global intelligent driving competition [10]

Group 3: Bosch's Competitive Edge
- Bosch's extensive global service capabilities and deep penetration of multiple markets make it an ideal partner for Toyota, as it can provide a comprehensive compliance system for intelligent driving [7][9]
- Bosch's experience with both traditional and hybrid vehicle systems positions it uniquely to support Toyota's ambitions in intelligent driving, particularly for hybrid vehicles [8][9]
Seven Projects Worth Studying for End-to-End Deployment
自动驾驶之心· 2025-12-19 00:05
Core Viewpoint
- The article emphasizes the importance of end-to-end mass production in autonomous driving, highlighting the need for hands-on experience with a range of algorithms and applications to address real-world challenges in the industry [2][7].

Course Overview
- The course is designed to provide in-depth knowledge of end-to-end production techniques, focusing on key algorithms such as one-stage and two-stage frameworks, reinforcement learning, and trajectory optimization [2][4].
- It includes practical projects that cover the entire process from theory to application, ensuring participants gain hands-on experience [2][12].

Instructor Background
- The instructor, Wang Lu, is a top-tier algorithm expert with a strong academic background and extensive experience developing and deploying advanced algorithms for autonomous driving [3].

Course Structure
- The course consists of eight chapters, each focusing on a different aspect of end-to-end algorithms:
  1. Overview of end-to-end tasks and the integration of perception and control systems [7].
  2. Two-stage end-to-end algorithm frameworks and their advantages [8].
  3. One-stage end-to-end algorithms, with a focus on performance [9].
  4. Application of navigation information in autonomous driving [10].
  5. Introduction to reinforcement learning algorithms and training strategies [11].
  6. Optimization of trajectory outputs using various algorithms [12].
  7. Post-processing strategies for ensuring reliable outputs [13].
  8. Production experience sharing and strategies for real-world applications [14].

Target Audience
- The course is aimed at advanced learners with a foundational understanding of autonomous driving algorithms, including familiarity with reinforcement learning and diffusion models [15][17].
Tsinghua's UniMM-V2X: An MoE-Based Multi-Level Fusion End-to-End V2X Framework
自动驾驶之心· 2025-12-19 00:05
Core Insights
- The article discusses the limitations of traditional modular autonomous driving systems and introduces the UniMM-V2X framework, which enhances multi-agent end-to-end systems through multi-level collaboration in perception and prediction [1][3][25]
- UniMM-V2X uses a mixture-of-experts (MoE) architecture to improve the adaptability and specialization of the perception, prediction, and planning tasks, achieving state-of-the-art (SOTA) performance [1][7][25]

Group 1: UniMM-V2X Framework
- UniMM-V2X consists of three main components, an image encoder, a collaborative perception module, and a collaborative prediction and planning module, all integrated with the MoE architecture [8][24]
- The framework strengthens planning by integrating information from multiple agents at both the perception and prediction levels, significantly improving decision-making reliability in complex scenarios [6][7][8]

Group 2: Performance Metrics
- The framework demonstrated a 39.7% improvement in perception accuracy, a 7.2% reduction in prediction error, and a 33.2% improvement in planning performance, showcasing the effectiveness of the MoE-enhanced multi-level collaboration paradigm [7][25]
- On the DAIR-V2X benchmark, UniMM-V2X achieved the lowest average planning error of 1.49 meters and a collision rate of only 0.12% over 3 seconds, outperforming all baseline models [15][16][25]

Group 3: Comparative Analysis
- Compared with the leading single-agent driving solution SparseDrive, UniMM-V2X improved mean Average Precision (mAP) by 39.7% and Average Multi-Object Tracking Accuracy (AMOTA) by 77.2% without incurring additional communication costs [17][25]
- In motion prediction, UniMM-V2X achieved a minimum Average Displacement Error (minADE) of 0.64 meters and a minimum Final Displacement Error (minFDE) of 0.69 meters, contributing significantly to overall planning performance [19][20][25]

Group 4: Multi-Level Fusion and MoE Impact
- The multi-level fusion approach ensures that high-quality intermediate features are propagated throughout the framework, leading to performance improvements across all modules [22][23]
- Integrating MoE into both the encoder and decoder yields the best results, enhancing environmental understanding and capturing complex motion behaviors effectively [22][23]

Group 5: Practicality and Reliability
- UniMM-V2X reduced communication costs by a factor of 87.9 compared with traditional methods while maintaining planning quality, running at 5.4 FPS [24][25]
- The framework remains reliable and scalable under various bandwidth conditions, making it suitable for real-world autonomous driving applications [24][25]
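The sparse expert routing behind an MoE layer can be sketched minimally as follows. This is an illustrative toy only: the layer sizes, gating scheme, and all weights are invented for the example and are not taken from UniMM-V2X.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_layer(x, experts, gate_w, top_k=2):
    """Route input x to the top_k highest-scoring experts and mix their
    outputs by renormalized gate weights (a sparse MoE forward pass)."""
    scores = gate_w @ x                    # one gating logit per expert
    top = np.argsort(scores)[-top_k:]      # indices of the selected experts
    mix = softmax(scores[top])             # renormalize over selected experts
    out = np.zeros(experts[0].shape[0])
    for w, i in zip(mix, top):
        out += w * (experts[i] @ x)        # weighted sum of expert outputs
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=8)
experts = [rng.normal(size=(8, 8)) for _ in range(4)]  # 4 toy experts
gate_w = rng.normal(size=(4, 8))                       # toy gating network
y = moe_layer(x, experts, gate_w, top_k=2)
print(y.shape)  # (8,)
```

Because only `top_k` of the experts run per input, capacity grows with the expert count while per-token compute stays roughly constant, which is the specialization-without-extra-cost property the summary attributes to MoE.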
Tesla Once Again Anticipates Where the Tide Is Heading
自动驾驶之心· 2025-12-18 09:35
Core Viewpoint
- Tesla's AI lead Ashok Elluswamy revealed the technical methodology behind Tesla's Full Self-Driving (FSD) in a recent article, explaining the choice of an end-to-end neural network model and addressing the challenges encountered in practice [4][6].

Group 1: End-to-End Neural Network Model
- Tesla's decision to adopt an end-to-end neural network model is driven by the need to handle complex driving scenarios that cannot be pre-defined by rules, such as the "trolley problem" and second-order effects [6][10].
- The end-to-end model is described as a complete overhaul of previous architectures, fundamentally changing design, coding, and validation processes and yielding a more human-like driving experience [11][19].
- The model outputs driving instructions alongside interpretable "intermediate results," using technologies such as generative Gaussian splatting to build dynamic 3D models of the environment in real time [8][17].

Group 2: VLA and World Model Concepts
- VLA (Vision-Language-Action) is an extension of the end-to-end model that incorporates language information, allowing for a more visual representation of driving behavior [12][14].
- The world model aims to establish a high-bandwidth cognitive system built on video/image data, addressing the limitations of language models in understanding complex, dynamic environments [15][19].
- The relationship among end-to-end, VLA, and world models is clarified: end-to-end serves as the foundation, VLA as an upgrade, and the world model as the ultimate form of understanding spatial dynamics [12][19].

Group 3: Industry Perspectives and Trends
- The industry is split across three main technical routes: end-to-end, VLA, and world model, with companies like Horizon Robotics and Bosch primarily adopting end-to-end for its lower cost and higher stability [13][19].
- VLA has drawn criticism from industry leaders who argue that its reliance on language models may not be essential for effective autonomous driving, emphasizing the need for spatial understanding instead [16][19].
- Tesla's recent publication has reignited industry discussion, positioning the company at the forefront of current technical directions and providing a systematic analysis of practical applications [20].
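The idea of an end-to-end model that emits both driving commands and interpretable "intermediate results" can be sketched as a network with a shared trunk and two heads. This toy is invented for illustration only (random weights, made-up dimensions) and is in no way Tesla's actual FSD architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0)

class TinyEndToEnd:
    """Toy end-to-end policy: a shared trunk maps 'pixels' to driving
    commands plus an auxiliary, human-inspectable head (here a coarse
    occupancy grid). Random weights; illustrates the layout only."""

    def __init__(self, in_dim=64, hid=32, grid=16):
        self.w1 = rng.normal(scale=0.1, size=(hid, in_dim))
        self.w_ctrl = rng.normal(scale=0.1, size=(2, hid))    # steer, accel
        self.w_occ = rng.normal(scale=0.1, size=(grid, hid))  # occupancy head

    def forward(self, pixels):
        h = relu(self.w1 @ pixels)                         # shared features
        controls = np.tanh(self.w_ctrl @ h)                # bounded commands
        occupancy = 1.0 / (1.0 + np.exp(-self.w_occ @ h))  # interpretable map
        return controls, occupancy

model = TinyEndToEnd()
controls, occ = model.forward(rng.normal(size=64))
print(controls.shape, occ.shape)  # (2,) (16,)
```

The auxiliary head is what makes an otherwise opaque pixels-to-controls mapping inspectable: the same trunk features must also explain a human-readable view of the scene.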
Open Source Catches Up with GPT-5 for the First Time! DeepSeek-V3.2: Reasoning and Efficiency Combined
自动驾驶之心· 2025-12-18 09:35
Core Insights
- The article discusses the advances of the open-source large language model (LLM) DeepSeek-V3.2, which has made significant strides in performance, particularly in complex reasoning and tool use, challenging the dominance of closed-source models like those from OpenAI [2][43].
- DeepSeek-V3.2 has achieved competitive results on a range of authoritative benchmarks, equaling or surpassing closed-source models in several key areas, including mathematics and coding competitions [2][39][40].

Summary by Sections

Current Challenges of Open-Source Models
- Open-source models face three main challenges: reliance on standard attention mechanisms that are inefficient on long sequences, insufficient computational resources for post-training, and a lack of systematic training for agent capabilities [6][7].
- The traditional attention mechanism's computational cost grows quadratically with sequence length, limiting deployment and optimization [7].
- Closed-source models invest heavily in post-training, while open-source models often lack the budget for such enhancements, hurting performance on critical tasks [7].

Solutions Proposed by DeepSeek-V3.2
- DeepSeek-V3.2 addresses these challenges through three core innovations: a new attention mechanism (DeepSeek Sparse Attention), increased computational resources for post-training, and a large-scale agent task synthesis pipeline [8][21].
- The DeepSeek Sparse Attention (DSA) mechanism reduces computational complexity from O(L²) to O(Lk), significantly improving efficiency while maintaining performance [11][20].

Technical Innovations
- DSA employs a "lightning indexer" and fine-grained token selection to optimize attention computation, allowing faster processing of long sequences without sacrificing accuracy [11][15].
- The model's training consists of two phases: a dense warm-up phase to train the indexer and a sparse training phase to adapt the entire model to the new attention mechanism [19][20].

Performance and Benchmarking
- DeepSeek-V3.2 has shown strong performance across benchmarks, achieving scores comparable to leading closed-source models in general reasoning, mathematics, and coding tasks [39][40].
- The model's results in the AIME 2025 and HMMT competitions indicate its capability in high-stakes settings, with pass rates of 93.1% and 92.5%, respectively [40].

Cost Efficiency and Deployment
- The DSA mechanism enables significant cost reductions in inference, making DeepSeek-V3.2 a viable option for large-scale deployment compared to previous models [41].
- The model's ability to maintain high performance while remaining cost-effective positions it as a strong alternative to closed-source solutions in real-world applications [41].

Conclusion
- The release of DeepSeek-V3.2 marks a significant milestone for the open-source LLM landscape, demonstrating that open-source models can compete with closed-source counterparts through innovative architecture, greater computational investment, and robust data engineering [43].
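The O(L²) to O(Lk) idea behind top-k sparse attention can be sketched as follows. This is a simplified illustration, not DeepSeek's implementation: the "indexer" here is just a pair of made-up low-rank projections, and causal masking is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, p_q, p_k, top_k):
    """For each query, a cheap low-rank 'indexer' (projections p_q, p_k)
    scores all keys; full attention is then computed over only the top_k
    selected keys, so the expensive part costs O(L*k) instead of O(L^2)."""
    L, d = q.shape
    index_scores = (q @ p_q) @ (k @ p_k).T        # cheap rank-r relevance scores
    out = np.empty_like(v)
    for i in range(L):
        sel = np.argsort(index_scores[i])[-top_k:]  # keys kept for query i
        att = softmax(q[i] @ k[sel].T / np.sqrt(d)) # exact attention, k keys
        out[i] = att @ v[sel]
    return out

rng = np.random.default_rng(0)
L, d, r = 16, 8, 2
q, k, v = (rng.normal(size=(L, d)) for _ in range(3))
p_q, p_k = rng.normal(size=(d, r)), rng.normal(size=(d, r))
sparse = topk_sparse_attention(q, k, v, p_q, p_k, top_k=4)
print(sparse.shape)  # (16, 8)
```

A useful sanity check is that with `top_k = L` the routine reduces exactly to dense attention, since the softmax over a permutation of all keys yields the same weighted sum.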
World Models Are One Path to End-to-End Autonomous Driving...
自动驾驶之心· 2025-12-18 03:18
Core Viewpoint
- The article discusses the distinction between world models and end-to-end models in autonomous driving, clarifying that world models are not themselves end-to-end but serve as a pathway toward end-to-end autonomous driving [2][3][4].

Group 1: Definitions and Concepts
- End-to-end autonomous driving is defined as a model that takes information in at one end and outputs decision results at the other, without explicit intermediate information processing or decision logic [3].
- World models are defined as models that accept information input and internally build a complete understanding of the environment, capable of reconstructing it and predicting future changes [4].

Group 2: Course Introduction
- A new course on world models has been launched, focusing on general world models, video generation, and OCC generation algorithms, including applications from Tesla and Fei-Fei Li's team [5].
- The course aims to deepen understanding of end-to-end autonomous driving and is designed for people looking to enter the autonomous driving industry [15].

Group 3: Course Structure
- Chapter 1 introduces world models and their relationship to end-to-end autonomous driving, covering historical development and current applications [10].
- Chapter 2 provides foundational knowledge of world models, including scene representation and relevant technologies such as Transformers and BEV perception [10][16].
- Chapter 3 discusses general world models and popular algorithms such as Marble and Genie 3, explaining their core technologies and design philosophies [11].
- Chapter 4 focuses on video generation world models, detailing significant works and advances in this area [12].
- Chapter 5 covers OCC generation models, discussing their applications and potential for trajectory planning [13].
- Chapter 6 shares industry insights and interview preparation tips for world-model-related roles [14].

Group 4: Learning Outcomes
- The course aims to bring participants to the level of a world-model autonomous driving algorithm engineer within roughly one year, covering the key technologies and enabling practical application in projects [18].
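The "understand, reconstruct, and predict" definition of a world model boils down to an encode-then-roll-forward loop, sketched below. Everything here (dimensions, linear dynamics, random weights) is invented for illustration; real world models use learned, far richer components.

```python
import numpy as np

rng = np.random.default_rng(1)

class ToyWorldModel:
    """Toy latent world model: compress an observation into an internal
    state, then roll that state forward under actions to 'imagine' future
    frames. Linear everywhere; illustrates only the encode/predict/decode
    split, not any production system."""

    def __init__(self, obs_dim=8, latent_dim=4, act_dim=2):
        self.enc = rng.normal(scale=0.3, size=(latent_dim, obs_dim))
        self.dyn_z = rng.normal(scale=0.3, size=(latent_dim, latent_dim))
        self.dyn_a = rng.normal(scale=0.3, size=(latent_dim, act_dim))
        self.dec = rng.normal(scale=0.3, size=(obs_dim, latent_dim))

    def imagine(self, obs, actions):
        z = self.enc @ obs                  # internal state of the world
        frames = []
        for a in actions:
            z = np.tanh(self.dyn_z @ z + self.dyn_a @ a)  # predict next state
            frames.append(self.dec @ z)     # reconstruct the imagined frame
        return frames

wm = ToyWorldModel()
obs = rng.normal(size=8)
actions = [rng.normal(size=2) for _ in range(3)]
frames = wm.imagine(obs, actions)
print(len(frames), frames[0].shape)  # 3 (8,)
```

The key contrast with a pure end-to-end policy is visible in the return type: the world model emits predicted future observations for a planner to evaluate, not driving commands.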
The Era of Image-Only Understanding Should Be Over! MMDrive: Giving Autonomous Driving a "Multimodal Brain"
自动驾驶之心· 2025-12-18 03:18
Paper authors | Minghui Hou et al. Editor | 自动驾驶之心

Are autonomous driving vision models that "only look at images and talk" good enough for real road conditions? Occlusion, severe weather, complex spatial relationships... these challenges leave traditional models struggling. The work introduced today builds a vision-language model for autonomous driving that better understands the scene and is better at "thinking": MMDrive.

I. Why are traditional methods no longer enough?

Paper title: MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion
Paper link: https://arxiv.org/abs/2512.13177
Affiliations: Jilin University, The Hong Kong University of Science and Technology (Guangzhou), Georgia Institute of Technology, University of Michigan, Ann Arbor

1. Lack of 3D perception: 2D images struggle to convey depth, spatial layout, and other key information;
2. Limited semantic fusion: different modalities are often "hard-concatenated" rather than semantically aligned;
3. Inefficient extraction of key information: in complex dynamic environments, the model ...
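The "hard concatenation" problem noted above, versus semantically aligned fusion, can be illustrated with a toy cross-attention fuser. This is a generic sketch of the two fusion styles, not MMDrive's actual module; all names and dimensions are invented.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hard_concat(img_feat, other_feat):
    """'Hard' fusion: stack modality features with no semantic alignment."""
    return np.concatenate([img_feat, other_feat])

def cross_attention_fuse(img_tokens, other_tokens):
    """Aligned fusion: each image token queries the other modality, pools
    the semantically relevant tokens, and adds them back residually."""
    d = img_tokens.shape[-1]
    att = softmax(img_tokens @ other_tokens.T / np.sqrt(d), axis=-1)
    return img_tokens + att @ other_tokens

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 16))     # 6 toy image tokens
other = rng.normal(size=(9, 16))   # 9 tokens from another representation
fused = cross_attention_fuse(img, other)
print(fused.shape)  # (6, 16)
```

Note the difference in shape behavior: hard concatenation grows the feature vector without relating the modalities, while cross-attention keeps the image token layout and injects only the content each token found relevant.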
From Embodied AI to Autonomous Driving, the Fusion of VLA and World Models Is Already Taking Shape...
自动驾驶之心· 2025-12-18 00:06
Core Insights
- The article discusses the convergence of two leading directions in autonomous driving technology, Vision-Language-Action (VLA) and the World Model, highlighting their distinct functions and their potential for integration [1][2].

Summary of VLA
- VLA, or Vision-Language-Action, is a multimodal model that integrates visual input, language commands, and action decisions, enabling vehicles to understand and execute driving instructions while providing explanations [4][5].
- The VLA architecture consists of three layers: input (multimodal perception), middle (unified reasoning and decision-making), and output (vehicle control commands) [5][6].
- VLA aims to create seamless interaction between human commands and driving actions, improving the interpretability and responsiveness of autonomous systems [6][11].

Summary of the World Model
- The World Model is a generative spatiotemporal neural network that compresses high-dimensional sensor data into a compact internal state, enabling future-scenario prediction through internal simulation [8][9].
- Its architecture also follows a three-layer structure: input (multimodal temporal observations), core (state encoding and generative prediction), and output (future state representations) [9][10].
- The World Model's primary goal is to let vehicles simulate potential future scenarios, improving decision-making and safety in complex driving environments [10][12].

Comparison of VLA and the World Model
- VLA focuses on human-vehicle interaction and interpretable end-to-end driving, while the World Model emphasizes building a predictive, simulation-based system for future-scenario analysis [11].
- VLA's input includes sensor data and explicit language commands, whereas the World Model relies on temporal sensor data and vehicle state assumptions [11].
- VLA outputs direct action control signals, while the World Model provides future state representations rather than immediate driving actions [11].

Integration Potential
- VLA and the World Model share a common technical origin, both aiming to address the fragmentation of traditional autonomous driving stacks and to enhance reasoning capability [12][16].
- The ultimate goal of both technologies is to equip autonomous systems with human-like cognitive and decision-making abilities [12][16].
- They face similar challenges in handling corner cases and improving robustness, albeit through different methodologies [14][16].

Future Directions
- The article suggests that the future of autonomous driving may lie in the deep integration of VLA and the World Model, creating a comprehensive system that combines perception, reasoning, simulation, decision-making, and explanation [16][47].
- Companies like Huawei and XPeng are already exploring these integration paths, pointing to a competitive landscape in advanced autonomous driving development [47].
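The interface contrast described above (direct controls versus future states) can be made concrete with two hypothetical Python interfaces. The class names and the placeholder logic are entirely invented; only the shape of each interface reflects the comparison in the text.

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Control:
    steer: float
    accel: float

class VLAPolicy:
    """VLA-style interface: (sensor frame, language command) -> one control
    action now. Placeholder logic only: brake on a 'stop' command."""

    def act(self, frame: np.ndarray, command: str) -> Control:
        stop = "stop" in command.lower()
        return Control(steer=0.0, accel=-1.0 if stop else 0.3)

class WorldModelPlanner:
    """World-model-style interface: (observation history, candidate actions)
    -> imagined future states for a planner to score; no direct controls."""

    def predict(self, history: List[np.ndarray],
                actions: List[np.ndarray]) -> List[np.ndarray]:
        state = history[-1].copy()
        futures = []
        for a in actions:
            state = state + 0.1 * a        # stand-in dynamics
            futures.append(state.copy())
        return futures

frame = np.zeros(4)
print(VLAPolicy().act(frame, "Stop at the crosswalk"))
```

A deeply integrated system in the sense of the article would combine both signatures: roll candidate actions through the predictive model, then let the language-conditioned policy choose and explain the one it executes.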
From Getting Started with End-to-End VLA to Job Hunting, We Have a Complete Learning Roadmap!
自动驾驶之心· 2025-12-18 00:06
Core Viewpoint
- The article emphasizes the growing demand for technical talent in the autonomous driving sector, particularly in end-to-end and VLA (Vision-Language-Action) technologies, with companies willing to invest heavily in experienced professionals and starting salaries reaching millions of yuan annually [2].

Course Offerings
- The article outlines several specialized courses aimed at building skills in autonomous driving, including an "End-to-End Practical Class for Mass Production," an "End-to-End and VLA Autonomous Driving Class," and a "VLA and Large Model Practical Course," catering to levels from beginner to advanced professional [4][7][12].

End-to-End Mass Production Course
- This course focuses on the practical deployment of end-to-end autonomous driving, covering key modules such as applying navigation information, reinforcement learning optimization, production experience with diffusion and autoregressive models, and spatiotemporal joint planning [4].

End-to-End and VLA Autonomous Driving Course
- This course addresses the macro picture of end-to-end autonomous driving, detailing key algorithms and theoretical foundations, including BEV perception, large language models, diffusion models, and reinforcement learning [7].

VLA and Large Model Practical Course
- This course requires participants to have a GPU (a 4090 or better is recommended), a foundational understanding of autonomous driving, and familiarity with concepts such as transformer models and reinforcement learning [11].

Instructor Profiles
- The courses are led by industry experts with strong academic backgrounds, including multiple publications at top conferences and extensive experience in algorithm development and mass production for autonomous driving [6][9][14][15].
Tsinghua & Xiaomi's DGGT: 4D Gaussian Reconstruction in 0.4 Seconds, with a 50% Performance Gain!
自动驾驶之心· 2025-12-18 00:06
Paper authors | Xiaoxue Chen et al. Editor | 自动驾驶之心

Tsinghua University and Xiaomi EV jointly present DGGT (Driving Gaussian Grounded Transformer): a pose-free, feed-forward framework for reconstructing 4D dynamic driving scenes.

DGGT needs only uncalibrated sparse images: in a single forward pass it simultaneously outputs camera poses, depth, dynamic instances, and an editable scene representation based on 3D Gaussians. Although trained on Waymo, the model generalizes strongly zero-shot to nuScenes and Argoverse2, improving key perception metrics by more than 50% over STORM. In addition, the system models the scene's appearance evolution over time with a lifespan head and applies single-step diffusion refinement, effectively suppressing motion-interpolation artifacts and improving spatiotemporal consistency and rendering naturalness.

Figure 1. Left: a dynamic scene reconstructed from uncalibrated sparse images within 0.4 s, with editable assets such as camera poses, depth, dynamic maps, and 3D Gaussian tracking; right: DGGT sits in a better position than feed-forward/optimization methods on both speed and accuracy.

Highlights at a glance
DGG ...