自动驾驶之心
A new NAVSIM SOTA! Masked Diffusion, a new framework for end-to-end autonomous driving
自动驾驶之心· 2025-12-26 03:32
Source | 机器之心. Original article: "刷新NAVSIM SOTA,复旦引望提出Masked Diffusion端到端自动驾驶新框架". With the rise of VLA (Vision-Language-Action) models, end-to-end autonomous driving is undergoing a paradigm shift from "modular" to "unified". However, once perception, reasoning, and planning are compressed into a single model, the mainstream auto-regressive generation paradigm starts to show its limitations. Existing auto-regressive models are forced into a left-to-right temporal generation order, which differs fundamentally from how human drivers actually think: when handling complex traffic, experienced drivers often work "backwards from the goal", first establishing a long-horizon driving intent (e.g., merging onto a ramp, yielding to a pedestrian, pulling over) and then deriving the immediate short-term control actions. In addition, imitation-learning-based models easily fall into the "average driver" trap: they tend to fit the mean of the data distribution, yielding mediocre policies that cannot switch flexibly between assertive maneuvering and conservative avoidance. To address these pain points, Fudan University and Yinwang Intelligent jointly propose the WAM-Diff framework. The work innovates ...
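Although the article is truncated, the contrast it draws — left-to-right autoregression versus goal-first generation — is the essence of masked any-order decoding, and can be illustrated with a toy decoder that commits to the most confident waypoints first, regardless of position. Everything below (the stub scorer, the confidence schedule, all names) is a hypothetical sketch, not WAM-Diff's actual model:

```python
import numpy as np

MASK = -1  # sentinel for an undecided waypoint token

def toy_scores(tokens, rng):
    """Stand-in for the denoiser: returns (predicted token, confidence)
    per position. Confidence is highest at the horizon end, mimicking a
    policy that commits to long-term intent before short-term actions."""
    n = len(tokens)
    preds = np.arange(n)               # pretend the "right" token at pos i is i
    conf = np.linspace(0.1, 1.0, n) + 0.01 * rng.random(n)
    return preds, conf

def masked_decode(horizon=8, tokens_per_step=2, seed=0):
    """Iteratively unmask the most confident positions, in any order —
    unlike left-to-right autoregressive decoding."""
    rng = np.random.default_rng(seed)
    tokens = np.full(horizon, MASK)
    order = []                         # record the order positions were fixed
    while (tokens == MASK).any():
        preds, conf = toy_scores(tokens, rng)
        conf[tokens != MASK] = -np.inf     # only compete among masked slots
        pick = np.argsort(conf)[-tokens_per_step:]
        tokens[pick] = preds[pick]
        order.extend(sorted(pick.tolist(), reverse=True))
    return tokens, order

tokens, order = masked_decode()
print(order[:2])   # → [7, 6]: the horizon end (the "goal") is committed first
```

With this confidence schedule the decoder fixes waypoints from the horizon backwards, the "establish intent first, derive near-term actions later" order the article contrasts with autoregression.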
The second half of end-to-end: how to build high-fidelity virtual datasets and use them for perception?
自动驾驶之心· 2025-12-26 03:32
Core Viewpoint
- The article discusses the transformative impact of high-fidelity virtual datasets, specifically SimData, on the development of autonomous driving algorithms, emphasizing the need for high-quality data to overcome the limitations of traditional real-world testing [2][4][29].

Group 1: SimData Dataset Overview
- SimData addresses the high demand for quality data in autonomous driving, highlighting the challenges of traditional real-world testing, including high operational costs, subjective bias in manual labeling, and legal constraints [4][5].
- The dataset includes 880 instances, 215,472 keyframes, and 64,190 annotations, showcasing its extensive scale and diversity [6][7].
- SimData covers critical operational design domains (ODD) such as highways, urban canyons, and parking lots, with a focus on hard-to-capture scenarios like construction zones and extreme lighting conditions [7].

Group 2: Automation Toolchain: aiSim2nuScenes
- The aiSim2nuScenes toolchain facilitates the efficient conversion of virtual simulation data into high-value data assets for algorithms, creating a standardized bridge between virtual environments and algorithm applications [11][12].
- It automates the generation of multi-modal sensor data and ensures strict temporal alignment of sensor data, achieving microsecond-level synchronization [13][15].
- The toolchain supports the nuScenes standard format, enhancing compatibility and reducing the engineering team's migration costs [13].

Group 3: Algorithm Empirical Evidence
- Training experiments on the pure virtual dataset demonstrated rapid convergence, achieving a mean Average Precision (mAP) of 0.446 and a nuScenes Detection Score (NDS) of 0.428 within 30 epochs [19].
- The consistency between models trained on SimData and those trained on real-world data was validated through AP correlation analysis and attention heatmap analysis, indicating high fidelity in feature extraction [20][22].
- Domain adaptation experiments showed that combining real-world data with virtual data significantly improved model performance across various categories, proving that virtual data complements rather than replaces real data [23][26].

Group 4: Conclusion and Future Outlook
- The article concludes that high-fidelity virtual data is essential for training algorithms capable of generalizing to real-world scenarios, emphasizing the importance of accurate modeling of physical processes [29].
- As the demand for high-quality synthetic data grows, the integration of virtual data into the training process is positioned as a key strategy for enhancing the robustness and performance of autonomous driving systems [29].
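The aiSim2nuScenes internals are not public, but the temporal-alignment step it automates can be illustrated. A minimal sketch, assuming nuScenes-style microsecond timestamps and a nearest-keyframe matching rule (the function name and the tolerance value are assumptions, not the toolchain's API):

```python
from bisect import bisect_left

def align_to_keyframes(cam_stamps_us, lidar_stamps_us, tol_us=50_000):
    """Match each camera frame to the nearest lidar keyframe timestamp
    (all times in microseconds, as in the nuScenes schema). Frames with
    no keyframe within `tol_us` are dropped."""
    pairs = []
    for t in cam_stamps_us:
        i = bisect_left(lidar_stamps_us, t)          # lidar stamps sorted
        cands = [j for j in (i - 1, i) if 0 <= j < len(lidar_stamps_us)]
        j = min(cands, key=lambda j: abs(lidar_stamps_us[j] - t))
        if abs(lidar_stamps_us[j] - t) <= tol_us:
            pairs.append((t, lidar_stamps_us[j]))
    return pairs

# 2 Hz lidar keyframes vs a 12 Hz camera: only the camera frames that
# land near a keyframe survive as nuScenes "sample" candidates.
lidar = [0, 500_000, 1_000_000]
cam = [k * 83_333 for k in range(13)]
matched = align_to_keyframes(cam, lidar)
print(len(matched))   # → 3
```

In a real converter the surviving pairs would become nuScenes `sample` / `sample_data` records; the point of the sketch is that strict temporal alignment reduces to a sorted nearest-neighbor search over microsecond timestamps.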
What are the hard parts of deploying feed-forward GS in autonomous driving scenarios?
自动驾驶之心· 2025-12-26 03:32
Core Viewpoint
- The article discusses the challenges and advancements in the field of 3D Gaussian Splatting (3DGS) for autonomous driving, emphasizing the importance of a structured learning path for newcomers in the industry [2][6].

Group 1: Course Overview
- The course titled "3DGS Theory and Algorithm Practical Tutorial" aims to provide a comprehensive learning roadmap for 3DGS, covering both theoretical foundations and practical applications [2][6].
- The course is designed in collaboration with industry algorithm experts and spans over two and a half months, starting from December 1 [13].

Group 2: Course Structure
- Chapter 1 introduces the background knowledge of 3DGS, including basic concepts of computer graphics, implicit and explicit representations of 3D space, and common development tools like SuperSplat and COLMAP [6][7].
- Chapter 2 delves into the principles and algorithms of 3DGS, covering dynamic reconstruction, surface reconstruction, and ray tracing, with practical exercises using the NVIDIA open-source 3DGRUT framework [7][8].
- Chapter 3 focuses on the application of 3DGS in autonomous driving simulation, highlighting key works and practical tools like DriveStudio for further learning [8][9].
- Chapter 4 discusses important research directions in 3DGS, including extensions of COLMAP and depth estimation, and their relevance to both industry and academia [9].
- Chapter 5 covers Feed-Forward 3DGS, detailing its development history and algorithmic principles, along with discussions on recent algorithms like AnySplat and WorldSplat [10].

Group 3: Interaction and Support
- Chapter 6 is dedicated to online discussions and Q&A sessions, allowing participants to engage with instructors on industry pain points and job market demands [11].
- The course encourages continuous interaction between students and professionals from both academia and industry, enhancing networking opportunities [15].
An element whose importance is easily overlooked in mass production: SD navigation information
自动驾驶之心· 2025-12-26 01:56
Core Viewpoint
- The article discusses the application of navigation information in autonomous driving, emphasizing its importance in providing lane guidance, waypoint information, and reference lines to enhance vehicle path planning and control [2][4][32].

Group 1: Navigation Information Application
- Navigation information SD/SD Pro is currently utilized in many production solutions, offering lane and waypoint data to provide a comprehensive view for drivers [2].
- The core responsibilities of the navigation module include providing reference lines, which significantly reduce planning pressure by offering a predefined driving path [4].
- Additional functionalities include providing planning constraints and priorities, as well as path monitoring and replanning [5].

Group 2: Path Planning and Behavior Guidance
- Global path planning at the lane level involves searching for the optimal lane sequence to reach a target lane [6].
- The navigation information aids behavior planning by providing clear semantic guidance, allowing vehicles to prepare for lane changes, deceleration, and yielding in advance [6].

Group 3: Course Overview
- The article outlines a course focused on practical applications in autonomous driving, covering topics such as end-to-end algorithms, navigation applications, and trajectory optimization [24][29].
- The course is designed for advanced learners and aims to provide insights into integrating perception tasks and designing learning-based control algorithms [29][37].
- It includes practical sessions on various algorithm frameworks, including one-stage and two-stage models, and emphasizes the importance of navigation information in production applications [30][31][32].
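The lane-level global search described above is, at its core, a shortest-path query over a lane graph in which lane changes are just weighted edges. A minimal sketch with a hand-built toy graph (all lane identifiers and costs are illustrative, not any production map format):

```python
import heapq

def lane_route(graph, start, goal):
    """Dijkstra over a lane graph. `graph` maps lane_id -> list of
    (next_lane_id, cost); successors include both longitudinal
    continuations and lateral lane changes."""
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

# Toy two-lane road: L0/L1 run in parallel; the exit ramp connects only
# from L1, so the route must include a lane change (cost 2) early enough.
g = {
    "L0a": [("L0b", 1.0), ("L1a", 2.0)],
    "L1a": [("L1b", 1.0)],
    "L0b": [("L0c", 1.0)],
    "L1b": [("ramp", 1.0)],
}
print(lane_route(g, "L0a", "ramp"))   # → ['L0a', 'L1a', 'L1b', 'ramp']
```

The returned lane sequence is exactly the kind of semantic guidance the article describes: it tells behavior planning, in advance, where a lane change must happen for the ramp to remain reachable.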
Some takeaways from 一见Auto's interview with Xiaomi's Chen Guang......
自动驾驶之心· 2025-12-26 01:56
Core Viewpoint
- The article discusses the competitive landscape of autonomous driving technology, highlighting the different methodologies and ambitions of various companies, particularly focusing on Xiaomi's approach to end-to-end algorithms and the integration of world models and reinforcement learning [4][5][6].

Group 1: Xiaomi's Strategy and Development
- Xiaomi's autonomous driving team is focusing on end-to-end development, having established a dedicated department for algorithm and function development in 2024, which is relatively late compared to competitors like Li Auto and NIO [5][6].
- The company has rapidly advanced its technology, releasing an end-to-end system (HAD) trained on 3 million Clips in February 2025 and on 10 million Clips in July 2025, with the enhanced version of Xiaomi HAD officially launched at the Guangzhou Auto Show in November 2025 [5][15].
- The enhanced version incorporates a world model and reinforcement learning, allowing the model to simulate experienced drivers and understand the reasoning behind driving actions, thus enhancing its cognitive capabilities [5][6][19].

Group 2: Technical Approaches and Challenges
- Xiaomi's approach emphasizes maximizing the "intelligence density" of models, regardless of whether they use VA, WA, or VLA methodologies, indicating a focus on cognitive-driven solutions rather than purely data-driven ones [5][18].
- The integration of world models and reinforcement learning presents challenges, such as ensuring the fidelity of the world model and managing computational efficiency during parallel exploration [6][59].
- Xiaomi's autonomous driving team is structured into three groups, exploring various methodologies, including VLA, WA, and VA, while maintaining a focus on end-to-end solutions [10][30].

Group 3: Industry Context and Competition
- The autonomous driving industry is experiencing a "nomenclature overload," with various factions emerging around different technical approaches, leading to ongoing debates about the best methodologies [7][26].
- Xiaomi's rapid growth in its autonomous driving team, which has expanded to over 1,800 members in four years, contrasts with competitors who took longer to build their teams [13][46].
- The company has invested 23.5 billion yuan in R&D by the third quarter of 2025, with a quarter of that allocated to AI development, showcasing its commitment to advancing its autonomous driving capabilities [13][46].

Group 4: User Experience and Market Perception
- Xiaomi emphasizes that the ultimate measure of technology is user experience, arguing that advanced technology does not guarantee better user perception or trust [12][24].
- The company acknowledges the pressures and criticisms it faces as a latecomer in the autonomous driving space, asserting the importance of resilience and long-term thinking in overcoming challenges [15][48].
- Xiaomi's strategy includes leveraging its existing infrastructure and data resources from other business units to enhance its autonomous driving capabilities, allowing for rapid development and deployment [44][46].
Jiushi quietly fires the opening shot of year-end L4 commercialization......
自动驾驶之心· 2025-12-25 09:33
Core Viewpoint
- The article discusses the recent strategic partnership between Jiushi Intelligent and Dongfeng, highlighting the growing maturity of L4 autonomous driving technology and its commercialization potential in various vehicle segments [3][4][20].

Financing Situation
- The article mentions that several L4 companies are currently undergoing financing, with Jiushi and Xinshi performing well in the low-speed logistics sector [2][3].

Technological Maturity
- The partnership emphasizes the importance of engineering maturity for L4 systems, which must perform reliably in complex and extreme environments [7][8].
- Jiushi Intelligent has developed a unique OCC temporal model that enhances perception capabilities by utilizing multi-frame temporal data to understand object movement and potential risks [9].

Cost Structure
- Jiushi's focus on cost reduction from the outset has allowed it to create a viable business model for L4 technology, addressing early concerns about the commercial feasibility of autonomous driving [15][18].
- The company employs a highly self-developed architecture for hardware, which optimizes sensor selection and computational power, leading to better cost-effectiveness [17].

Ecosystem Development
- Jiushi does not position itself as a single vehicle supplier but aims to build an expandable product and ecosystem around L4 capabilities, collaborating with various partners to create diverse autonomous vehicle forms [19].
- The company has successfully deployed nearly 16,000 autonomous vehicles across over 300 cities, accumulating more than 70 million kilometers of safe operational mileage [19].

Strategic Collaboration
- The collaboration between Dongfeng and Jiushi represents a "capability-level cooperation," where Dongfeng provides manufacturing advantages while Jiushi offers mature L4 technology and commercialization experience [20].
- As the industry shifts towards profitability, Dongfeng's choice of Jiushi reflects a recognition of its technological approach and cost control capabilities [21].
HUST & HKU propose UniLION: a unified autonomous driving model based on linear group RNN
自动驾驶之心· 2025-12-25 09:33
Core Viewpoint
- UniLION is a groundbreaking unified autonomous driving framework developed by the University of Hong Kong, Huazhong University of Science and Technology, and Baidu, which effectively addresses computational efficiency issues in processing large-scale point cloud data and multi-view images using linear group RNN technology [2][3].

Group 1: Project Overview
- UniLION is designed to efficiently handle large-scale LiDAR point clouds, high-resolution multi-view images, and temporal data without the need for explicit temporal or multi-modal fusion modules, supporting various configurations seamlessly [4][5].
- The framework aims to simplify the design of multi-modal and multi-task autonomous driving systems while maintaining superior performance across core tasks such as 3D perception, prediction, and planning [3][44].

Group 2: Research Background and Challenges
- Current autonomous driving systems face challenges in computational efficiency, multi-modal fusion complexity, temporal information processing, and multi-task learning difficulties [5].
- Traditional Transformer models introduce significant computational overhead due to their quadratic complexity in attention mechanisms when processing long sequences [5].

Group 3: Innovations of UniLION
- UniLION features a unified 3D backbone network based on linear group RNN, allowing seamless processing of different modalities and temporal information without explicit fusion modules [8].
- The framework utilizes linear computational complexity to convert multi-view images, LiDAR point clouds, and temporal information into tokens for unified integration in 3D space [8].
- UniLION generates a compact unified bird's-eye view (BEV) representation of heterogeneous multi-modal information and time series, serving as shared features for various downstream tasks [8].

Group 4: Performance Results
- UniLION demonstrated competitive and state-of-the-art performance on the nuScenes dataset, achieving 74.9% NDS and 72.2% mAP in 3D object detection, 76.2% AMOTA in multi-object tracking, and 72.3% mIoU in BEV map segmentation [20].
- The strongest temporal multi-modal version of UniLION achieved 75.4% NDS and 73.2% mAP in detection tasks, showcasing its advanced capabilities across multiple evaluation tasks [20].

Group 5: Efficiency and Robustness
- UniLION significantly reduces computational resource requirements and inference time through its linear computational complexity, making it suitable for deployment in real-world autonomous driving systems [35].
- The framework exhibits strong robustness against sensor misalignment, maintaining performance even under high misalignment levels [32].

Group 6: Future Prospects
- Future work includes expanding UniLION to support additional sensor modalities, applying it in real-world autonomous driving systems, and exploring large-scale pre-training to enhance its generalization capabilities [45].
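UniLION's exact kernel is not reproduced here, but the efficiency argument — a recurrent scan whose cost grows linearly in token count, versus the quadratic cost of full self-attention — can be sketched. A minimal illustration of a linear recurrent scan over a token sequence (a toy decay recurrence, not the paper's actual linear group RNN):

```python
import numpy as np

def linear_rnn_scan(x, decay=0.9):
    """O(N) recurrent scan over a token sequence:
        h_t = decay * h_{t-1} + x_t.
    Each token is visited once, so cost is linear in sequence length —
    the property that lets a UniLION-style backbone ingest long
    point-cloud and multi-view token sequences that would be
    prohibitively expensive under O(N^2) self-attention."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + x[t]        # state carries history forward
        out[t] = h
    return out

tokens = np.ones((4, 2))            # 4 tokens, feature dim 2
y = linear_rnn_scan(tokens)
print(y[:, 0])                      # geometric accumulation of history
```

The same recurrence applied over tokens from different sensors and timesteps is why, in this family of models, temporal and multi-modal "fusion" can fall out of the backbone itself rather than requiring explicit fusion modules.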
An insider at Physical Intelligence shares (from data collection to VLA to RL)
自动驾驶之心· 2025-12-25 09:33
Core Insights
- The article discusses the current state of robot learning as of December 2025, emphasizing that most systems rely on behavior cloning (BC) and the challenges associated with it [8][41].
- It highlights the importance of human demonstrations in training robot learning systems and the need for innovative solutions to improve robustness and efficiency [74].

Group 1: Behavior Cloning and Its Challenges
- Behavior cloning systems require high-quality data from human demonstrations, which are often slow to collect and expensive to scale [12][22].
- The primary issues with behavior cloning include the inability to generalize beyond the training data, leading to performance degradation in out-of-distribution (OOD) states [20][26].
- The article outlines the necessity of developing models that can recover from failure states and adapt to new situations, suggesting a DAgger-style approach to training [30][36].

Group 2: Future Directions in Robot Learning
- The article predicts that human demonstrations will remain crucial for the foreseeable future, with a call for the development of integrated hardware and software systems to streamline the training process [74].
- It anticipates that within two years, video model backbones will replace current VLA systems, and within ten years, world models will effectively simulate general open-world interaction strategies [75].
- The need for real robot rollouts is emphasized as essential for achieving superhuman performance, indicating that traditional simulation methods may not suffice [75].

Group 3: Industry Implications
- The article suggests that companies focusing on creating effective human demonstration systems will become attractive partners or acquisition targets in the robotics industry [74].
- It warns that data labeling and pre-training data sales are highly commoditized and require operational excellence to succeed [75].
- The importance of internal evaluation processes is highlighted, as they are critical for model improvement and cannot be outsourced [75].
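The DAgger-style training the article alludes to can be sketched in a few lines: roll out the *learner* (so it visits its own OOD states), label every visited state with the *expert*, aggregate, and refit. A toy 1-D illustration — the memorizing `policy_fit` and the scalar environment are stand-ins for a real robot stack, not anyone's production system:

```python
def dagger(expert, policy_fit, env_step, init_state, rounds=3, horizon=5):
    """DAgger sketch: act with the current policy, but label every
    visited state with the expert's action, so the dataset covers the
    failure/OOD states plain behavior cloning never sees."""
    data = []
    policy = policy_fit(data)                  # naive initial policy
    for _ in range(rounds):
        s = init_state
        for _ in range(horizon):
            a = policy(s)                      # learner chooses the action
            data.append((s, expert(s)))        # expert supplies the label
            s = env_step(s, a)
        policy = policy_fit(data)              # refit on aggregated data
    return policy, data

# Toy task: drive a scalar state toward 0; the expert knows the rule.
expert = lambda s: -1 if s > 0 else 1
env_step = lambda s, a: s + a

def policy_fit(data):
    """'Training' = memorize expert labels, nearest-state lookup."""
    if not data:
        return lambda s: 1                     # untrained: always +1
    table = dict(data)
    return lambda s: table.get(
        s, table[min(table, key=lambda k: abs(k - s))])

policy, data = dagger(expert, policy_fit, env_step, init_state=3)
print(len(data))   # → 15 labeled states (rounds * horizon)
```

Note the key design choice: because round 1's naive policy drifts *away* from the goal, the aggregated dataset ends up containing exactly the recovery states a pure-BC dataset would lack.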
A former employee of a leading intelligent-driving company ordered to pay substantial non-compete damages......
自动驾驶之心· 2025-12-25 06:42
Source | 蚀刻AiTech (author: the 蚀刻 team), a ten-year veteran of intelligent driving who has worked at four companies, on chips and on mass production, and writes about industry news. From an industry perspective, this case marks a significant escalation in competition among China's leading intelligent-driving players. Over the past few years, competition centered on technical roadmaps, mass-production speed, and fundraising scale — a classic business-and-technology contest. By pursuing legal accountability against a former employee who "jumped ship" to a direct competitor, and winning the court's support, the company shows that the rivalry among head players is rapidly extending beyond business and technology into an all-round, multi-layered contest over talent retention, trade-secret protection, and legal compliance. According to 蚀刻AiTech, a leading intelligent-driving company recently disclosed, via an internal all-hands notice, the judicial outcome of a case against a former employee who violated non-compete obligations. The notice states that after leaving, the former employee concealed their identity and joined a competitor; the company initiated legal proceedings and traced ...
A condensed version of Li Auto's MindGPT-4o-Vision technical report
自动驾驶之心· 2025-12-25 03:24
Core Insights
- The article discusses the release of the MindGPT-4ov technology report by Li Auto, highlighting the trade-offs between general capabilities and vertical domain adaptation in multi-modal large models [1]

Group 1: Challenges in Multi-Modal Model Training
- Three key inefficiencies and biases in current multi-modal model training are identified:
  1. Resource allocation is inefficient, treating all data equally and neglecting high-value data, leading to wasted computational resources [2]
  2. A reward mechanism that causes diversity collapse, where models converge to a few safe response patterns, sacrificing output diversity and generalization ability [2]
  3. Unimodal spurious correlations, where models overly rely on prior knowledge from language models rather than visual evidence, leading to factual errors in industrial applications [2]

Group 2: MindGPT-4ov Training Paradigm
- The MindGPT-4ov post-training paradigm consists of four core modules:
  1. Data construction based on Information Density Score (IDS) and a dual-label system [3]
  2. Supervised fine-tuning (SFT) through collaborative curriculum SFT [3]
  3. Reinforcement learning (RL) with a hybrid reward mechanism [3]
  4. Infrastructure improvements for parallel training and inference optimization [3]

Group 3: Information Density Score (IDS) and Data Synthesis
- IDS evaluates image data across four dimensions: subject diversity, spatial relationships, OCR text richness, and world knowledge relevance [3]
- A dynamic synthesis strategy adjusts the number of generated question-answer pairs based on IDS scores, optimizing resource allocation [3]

Group 4: Supervised Fine-Tuning (SFT) Mechanism
- The SFT mechanism employs a three-stage collaborative curriculum learning approach to address the conflict between knowledge injection and capability retention:
  1. Cross-domain knowledge learning focuses on injecting vertical domain knowledge [5]
  2. Capability restoration uses general datasets to recover potential declines in general capabilities [5]
  3. Preference alignment optimizes response formats and reduces hallucinations using high-quality preference data [5]

Group 5: Reinforcement Learning with Hybrid Rewards
- The RL phase introduces multiple reward signals to balance accuracy, diversity, and conciseness:
  1. Pass@k rewards encourage exploration of different reasoning paths by rewarding any correct answer among the top k responses [6]
  2. Diversity rewards penalize semantically similar responses, promoting varied outputs [6]
  3. Length rewards impose penalties for overly long responses, ensuring concise outputs [6]

Group 6: Label Construction and Data Admission
- A hierarchical labeling system is established, with experts defining primary labels and MLLM generating secondary and tertiary labels to form a comprehensive knowledge tree [7]
- Data synthesis involves matching images with coarse and fine-grained topics, generating QA pairs based on IDS scores, and filtering low-quality data through a multi-model voting mechanism [7]

Group 7: Performance Metrics
- MindGPT-4ov demonstrates significantly shorter average response lengths compared to competing models while maintaining higher accuracy (83.3% vs 80.1%), validating the effectiveness of the length reward mechanism [8]
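The three RL reward signals described in Group 5 can be combined into a single scalar. The sketch below is illustrative only — the weights, the word-overlap similarity, and the exact penalty forms are assumptions for demonstration, not Li Auto's actual implementation:

```python
def hybrid_reward(responses, correct, sim, k=4,
                  w_pass=1.0, w_div=0.5, w_len=0.01, max_len=64):
    """Combine three signals into one scalar (illustrative weights):
    - pass@k: 1 if any of the top-k sampled responses is correct;
    - diversity: penalize mean pairwise similarity among responses;
    - length: penalize tokens beyond a per-response budget."""
    r_pass = float(any(correct(r) for r in responses[:k]))
    n = len(responses)
    pair_sims = [sim(responses[i], responses[j])
                 for i in range(n) for j in range(i + 1, n)]
    r_div = -sum(pair_sims) / max(len(pair_sims), 1)
    r_len = -sum(max(len(r.split()) - max_len, 0) for r in responses) / n
    return w_pass * r_pass + w_div * r_div + w_len * r_len

# Toy check: substring-match correctness, Jaccard word-overlap similarity.
correct = lambda r: "42" in r
def sim(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

resp = ["the answer is 42", "maybe 41", "it could be 42 indeed"]
r = hybrid_reward(resp, correct, sim)
print(round(r, 3))   # → 0.979
```

The balance the report describes falls out of the structure: a correct answer anywhere in the top k earns the pass reward, near-duplicate responses erode it via the diversity term, and verbosity beyond the budget erodes it via the length term.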