自动驾驶之心
Where Is Intelligent Driving Headed? Notes from the First Autonomous Driving Roundtable
自动驾驶之心· 2025-11-06 00:04
Core Insights
- The article reviews the evolution and current state of the autonomous driving industry, drawing on the experiences and lessons of industry experts [4][7][11]
- It stresses that companies need strong strategic execution and no weak links in their operations to succeed in the competitive autonomous driving landscape [7][11]

Group 1: Industry Evolution
- The autonomous driving industry has changed significantly over the past decade, with early optimism giving way to more realistic approaches focused on Level 2 (L2) automation and safety [5][6]
- Experts reflect on the initial hype around RoboTaxi and the later shift toward practical applications and L2 mass production, a more commercially viable direction for the industry [6][7]

Group 2: Key Challenges and Lessons
- The industry has faced three major challenges: the abandonment of RoboTaxi, ensuring the safety of L2 systems, and transitioning to mass production [7]
- Successful companies in the sector must combine strong strategic execution with an absence of operational weaknesses, because the delivery chain for autonomous driving products is complex and lengthy [7][11]

Group 3: Technological Perspectives
- The discussion covers VLA (Vision-Language-Action) models and world models, highlighting how the two complement each other in addressing autonomous driving challenges [8][10]
- Experts agree that advances in AI and the integration of new technologies will continue to shape the future of autonomous driving, with a focus on balancing innovation and safety [10][11]

Group 4: Future Opportunities
- Experts broadly agree that the autonomous driving industry still has significant growth potential, particularly in urban navigation and in turning academic research into practical applications [11]
- The ongoing development of AI coding is seen as a tool that sharpens focus on core algorithmic challenges rather than eroding the industry's competitive edge [11]
Xpeng Just Released VLA 2.0, but Dropped the Language Translation Step...
自动驾驶之心· 2025-11-06 00:04
Core Viewpoint
- Xpeng Motors has recently released VLA 2.0, which represents a significant advance in autonomous driving technology, particularly in the context of competing with Tesla's innovations [2][10].

Summary by Sections

VLA Development
- Xpeng's VLA is being developed along two parallel paths: V/L→A and V→L→A. The former aligns more closely with Tesla's recent ICCV talk, where L is not an intermediate layer but an input parallel to V [3][6].
- The V/L→A model eliminates the language translation step while keeping the focus on visual inputs [6].

Technical Specifications
- The first mass-produced physical world model has a maximum effective computing power of 2250 TOPS [6].
- Future plans include entering the robotaxi market with four Turing AI chips totaling 3000 TOPS of computing power [8].

Industry Context
- Competition in L3 technology is intensifying, with various companies analyzing and following Xpeng's VLA developments [10].
- The debate between the world-model and VLA pathways remains unresolved, which calls for continued exploration in both academia and industry [10].

Community and Learning Resources
- The "Autonomous Driving Heart Knowledge Planet" community has been established as a comprehensive platform for learning and sharing knowledge in the autonomous driving field, with over 4000 members and plans to expand to nearly 10,000 [14][31].
- The community offers video tutorials, technical discussions, and job placement mechanisms, aimed at both beginners and advanced practitioners in the field [17][29][95].
Kimi Linear First Author Zhang Yu: Some Reflections on Model Training
自动驾驶之心· 2025-11-06 00:04
Core Insights
- The article discusses the development and features of the Kimi Linear model, emphasizing its architecture and training process [4][5][10].

Model Architecture
- Kimi Linear adopts a hybrid design that interleaves KDA (Kimi Delta Attention, a linear attention layer) with MLA (Multi-head Latent Attention) at a KDA:MLA ratio of 3:1, which was found to best balance efficiency and performance [5].
- The architecture builds on the design principles of Moonlight, with the sparsity of the MoE increased significantly from 8 to 32 [4].

Training Process
- The model was trained on 5.7 trillion tokens, a significant scale-up from previous models, with a focus on overcoming challenges in distributed training [10][12].
- Training involved rigorous monitoring and adjustment, including switching key parameters from bf16 to fp32 to ensure stability and performance [12][13].

Performance and Benchmarking
- Despite being a smaller model, Kimi Linear showed substantial improvements in benchmark comparisons, often outperforming larger models on specific tasks [7][14].
- Decoding efficiency improved, with a speedup of approximately 6 times attributed to the reduced KV cache usage of KDA [8].

Future Directions
- The article indicates that Kimi aims to establish itself as a flagship model, with ongoing efforts to refine its architecture and performance metrics [17][19].
- Hybrid models and efficient attention mechanisms are highlighted as a key area for future research and development within the industry [19].
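
The 3:1 KDA:MLA interleaving described above can be illustrated with a small sketch. This is not Kimi Linear's implementation: the function names, the repeating layout (three KDA layers followed by one MLA layer), and the assumption that only MLA layers hold a KV cache that grows with sequence length are simplifications introduced here for illustration.

```python
def hybrid_layer_schedule(num_layers: int, kda_per_mla: int = 3) -> list[str]:
    """Return layer types that repeat `kda_per_mla` KDA layers, then one MLA layer.

    The repeating layout is an assumption made for this sketch.
    """
    period = kda_per_mla + 1
    return ["MLA" if (i + 1) % period == 0 else "KDA" for i in range(num_layers)]


def kv_cache_layer_fraction(schedule: list[str]) -> float:
    """Fraction of layers holding a growing KV cache (assumed: MLA layers only)."""
    return schedule.count("MLA") / len(schedule)


schedule = hybrid_layer_schedule(8)
print(schedule)
print(kv_cache_layer_fraction(schedule))
```

Under these assumptions, one layer in four carries a growing KV cache. The source attributes the roughly 6x decoding speedup to reduced KV cache usage; the sketch counts layers only and does not model speed.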
Seeking Autonomous Driving Enthusiasts Around the World (Product / 4D Annotation / World Models, etc.)
自动驾驶之心· 2025-11-06 00:04
Group 1
- The article points to growing demand for corporate training and job counseling in the autonomous driving sector, and the need for training programs that range from technology updates to summaries of industry development [2][4].
- Many individuals are seeking guidance, particularly those struggling to strengthen their resumes and project experience [3].
- The company is inviting professionals in the autonomous driving field to collaborate on technical services, training, course development, and research guidance [4][5].

Group 2
- The main areas for collaboration include autonomous driving product management, 4D annotation and data closed loops, world models, VLA, large models for autonomous driving, reinforcement learning, and end-to-end solutions [5].
- The job description indicates that the training collaboration targets both B-end audiences (enterprises, universities, research institutes) and C-end audiences (students, job seekers) [6].
- Interested parties are encouraged to reach out via WeChat for further consultation [7].
AI Day Livestream | "Pixel-Perfect" Depth Perception: Decoding a High-Scoring NeurIPS Paper
自动驾驶之心· 2025-11-05 00:04
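Depth estimation is central to robot perception, 3D reconstruction, AR/VR, and related applications. However, existing depth estimation methods commonly suffer from flying pixels at object edges, which can trigger erroneous actions when a robot executes decisions and cause heavy ghosting around object contours in 3D reconstruction. Existing methods produce edge flying pixels mainly for the following reasons:

This paper proposes Pixel-Perfect Depth (PPD), a monocular depth estimation model that performs diffusion-based generation directly in pixel space, avoiding at the root the artifacts caused by VAE compression. Diffusion modeling in high-resolution pixel space is very challenging, however: the model must balance global semantic consistency with precise local detail, or it easily produces structural distortion or abrupt depth jumps. To address this, the paper designs a semantics-guided diffusion Transformer (SP-DiT), which introduces high-level semantic features from vision foundation models as prompts during the diffusion process, effectively strengthening the model's grasp of global structure and its recovery of detail. The paper also proposes a ...

Discriminative models (such as Depth Anything v2 and Depth Pro): because of the smoothing tendency of regression losses, they tend, in depth ...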
The Story of Li Auto's Intelligent Driving Comeback: The Hundred-Day End-to-End Sprint
自动驾驶之心· 2025-11-05 00:04
Core Viewpoint
- The article discusses the significant advancements made by Li Auto in the field of autonomous driving, particularly the introduction of the "end-to-end + VLM" system, which marks a turning point for the company in achieving industry leadership [5][7][40].

Group 1: Development of Autonomous Driving Technology
- In March 2024, Li Auto's CEO expressed dissatisfaction with the company's autonomous driving progress, emphasizing the need for a shift to an end-to-end approach [4][9].
- The launch of the "end-to-end + VLM" system in July 2024 allowed Li Auto to finally experience true leadership in autonomous driving technology after years of following competitors [5][6].
- By October 2024, test drives of the new system accounted for 65% of user experiences in stores, indicating strong market enthusiasm [6][7].

Group 2: Market Performance and User Adoption
- In 2024, the delivery share of models equipped with the AD Max system (featuring the new technology) reached 75.4% in the 300,000+ RMB segment and 84.6% in the 400,000+ RMB segment, a significant increase from just 20% earlier in the year [7][51].
- The rapid adoption of the end-to-end system led to a dramatic increase in user interest and sales, with the proportion of users experiencing the system rising to over 70% by the end of the year [51][52].

Group 3: Strategic Shifts and Team Expansion
- In 2023, Li Auto began to learn from Huawei's approach to autonomous driving, significantly expanding its engineering team from around 600 to over 1,000 by the end of the year [10][11].
- Despite the team expansion, initial results did not meet expectations, prompting a strategic pivot towards the end-to-end model [11][24].
- The end-to-end project was initiated with a small, dedicated team, emphasizing the importance of voluntary participation and commitment to the project's success [27][28].

Group 4: Technical Innovations and Efficiency
- The end-to-end project was completed in approximately 100 days, showcasing an unprecedented speed of development in the industry, with no significant errors reported during the process [46][56].
- The project utilized a one-stage end-to-end technology, integrating various functions into a single network, which allowed for more efficient processing and reduced complexity compared to traditional modular approaches [58][59].
- The success of the project was attributed to effective collaboration among team members and a strong focus on data-driven methodologies, which allowed for high-quality outcomes with a relatively small team [57][64].

Group 5: Data-Driven Approach
- The foundation of Li Auto's success in autonomous driving is rooted in a robust data collection and processing system established by the team, which has been in development since 2018 [72][73].
- The company has emphasized the importance of high-quality data over sheer model complexity, leading to significant improvements in performance metrics [70][72].
Does Autonomous Driving Necessarily Need Language Models?
自动驾驶之心· 2025-11-05 00:04
Core Viewpoint
- The article discusses the technological competition between two architectures for autonomous driving: WEWA (World Engine + World Action Model) represented by Huawei and VLA (Vision-Language-Action) pursued by companies like Li Auto and Xpeng. It highlights the debate on whether large language models (LLMs) are essential for autonomous driving, emphasizing the trade-offs between efficiency and cognitive depth in technology choices [2][4].

Summary by Sections

1. Technological Divergence: WEWA vs. VLA
- The year 2025 is identified as a critical turning point for autonomous driving technology, with WEWA and VLA architectures representing opposing approaches. WEWA aims for efficient implementation through "de-linguistic" methods, while VLA focuses on cognitive intelligence via language models [2][4].

2. Fundamental Differences Between WEWA and VLA
- The two architectures differ fundamentally in their information processing logic, core components, and technical goals, particularly regarding the role of language as an intermediary. WEWA emphasizes direct mapping from visual data to actions, while VLA incorporates a three-tiered process involving visual features, language semantics, and control instructions [5][6].

3. Cost of Language Models
- VLA's reliance on large language models incurs significant computational costs, presenting a core bottleneck for mass production. The hardware costs escalate dramatically due to the need for high-performance GPU clusters during training and advanced chips for real-time inference [7][8][9].

4. Advantages of Language Models
- Despite high computational costs, VLA's rise is attributed to the abstracting capabilities and cognitive intelligence provided by language models. These models can compress numerous similar scenarios into concise language, enhancing decision-making in complex situations [10][12][13][14].

5. Core Trade-offs: Efficiency vs. Intelligence
- The necessity of language models in autonomous driving is debated, with no definitive conclusion. In short-term production scenarios (L2-L3), WEWA's efficiency and low latency are more valuable, while in long-term high-level scenarios (L4-L5), VLA's cognitive advantages become essential. The future may see a hybrid approach combining both architectures to balance efficiency and intelligence [15][16][17][18].
NVIDIA's 41-Page Autonomous Driving VLA Framework! Alpamayo-R1: Causal-Chain Reasoning in a Vehicle-Deployable Algorithm
自动驾驶之心· 2025-11-05 00:04
Core Insights
- The article discusses NVIDIA's introduction of the Alpamayo-R1 (AR1) framework, which aims to improve decision-making in complex driving scenarios through causal reasoning and trajectory planning [1][2].

Group 1: Background and Framework
- The development of autonomous driving systems has shifted from traditional modular architectures to end-to-end frameworks, which are now widely recognized in the industry [3].
- Current end-to-end methods struggle with long-tail scenarios because of sparse supervisory signals and the need for high-order reasoning, leaving a significant gap between existing models and the requirements of robust Level 4 (L4) autonomous driving [3][4].

Group 2: Innovations in AR1
- AR1 integrates causal chain reasoning with trajectory planning, resulting in a 12% increase in planning accuracy in high-difficulty scenarios compared to trajectory-based baseline models [2][8].
- The model shows a 35% reduction in lane deviation rate and a 25% decrease in near-collision rate in closed-loop simulation [2].
- After reinforcement learning post-training, the model's reasoning quality improved by 45% and its reasoning-action consistency increased by 37% [2].

Group 3: Causal Chain Dataset
- The article introduces a structured causal chain (Chain of Causation, CoC) annotation framework that generates reasoning traces aligned with driving behavior, ensuring that each trace is decision-centric and causally linked [5][29].
- The CoC dataset is designed to provide clear supervision for learning decision causality, enabling the reasoning model to efficiently infer the reasons behind specific driving actions [31][42].

Group 4: Training Strategies
- A multi-stage training strategy uses supervised fine-tuning and reinforcement learning to strengthen reasoning and ensure consistency between reasoning and actions [8][12].
- The AR1 model is built on the Cosmos-Reason backbone, which is designed specifically for physical intelligence applications and improves its deployability in autonomous driving scenarios [16][17].

Group 5: Vision-Language-Action (VLA) Architecture
- The AR1 architecture emphasizes modularity and flexibility, allowing it to integrate existing vision-language models while adding specialized components for efficient visual encoding and real-time action decoding [12][19].
- The design addresses the challenges of processing multi-camera inputs and generating the precise multi-modal trajectory predictions needed for safe vehicle control [11][12].

Group 6: Data Annotation and Quality Assurance
- A hybrid annotation process combining human and automated labeling ensures high-quality training data while maintaining efficiency [48][49].
- The quality assurance process includes multiple checks to ensure causal correctness and minimal decision-making ambiguity in the annotated data [52][53].
Lessons from Switching Careers into a Major Autonomous Driving Company
自动驾驶之心· 2025-11-04 00:03
Core Insights
- The article emphasizes the importance of seizing opportunities and continuous learning in the rapidly evolving field of autonomous driving [1][4]
- It highlights the creation of a comprehensive community platform, "Autonomous Driving Heart Knowledge Planet," aimed at facilitating knowledge sharing and career development in the autonomous driving sector [4][16]

Group 1: Career Development
- Transitioning to the autonomous driving industry can be successful through dedication and preparation, as illustrated by the experience of a professional who switched careers and excelled in various roles [1]
- Continuous learning and adapting to industry trends are crucial for career advancement, as demonstrated by the professional's progression from algorithm evaluation to advanced safety algorithms [1]

Group 2: Community and Resources
- The "Autonomous Driving Heart Knowledge Planet" has over 4,000 members and aims to grow to nearly 10,000 in two years, providing a platform for discussion, technical sharing, and job opportunities [4][16]
- The community offers a variety of resources, including video content, learning pathways, and Q&A sessions, to support both beginners and advanced learners in the autonomous driving field [7][10]

Group 3: Technical Learning and Networking
- The community organizes discussions with industry experts on various topics, including entry points for end-to-end autonomous driving and the integration of multi-sensor fusion [8][20]
- Members have access to a wealth of technical routes and resources, including over 40 technical pathways and numerous datasets relevant to autonomous driving [10][36]

Group 4: Job Opportunities
- The community facilitates job referrals and connections with leading companies in the autonomous driving sector, enhancing members' chances of securing positions in the industry [11][12]
- Regular updates on job openings and industry trends are provided, helping members stay informed about potential career advancements [21][93]
Starting from DriveVLA-W0: How World Models Amplify VLA Scaling Laws (Chinese Academy of Sciences)
自动驾驶之心· 2025-11-04 00:03
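In autonomous driving, scaling vision-language-action (VLA) models with large-scale data offers a promising path toward more general driving intelligence. However, VLA models have long faced a "supervision deficit": their large model capacity is supervised only by sparse, low-dimensional action signals, leaving much of their representational potential unused.

To address this, a team from the Chinese Academy of Sciences and Huawei Yinwang proposed DriveVLA-W0, a training paradigm that uses world models to predict future images. To validate the generality of DriveVLA-W0, the paper evaluates it on two mainstream VLA architectures: for VLA models that use discrete visual tokens, it designs an autoregressive world model; for VLA models based on continuous visual features, it designs a diffusion world model. Building on the rich representations learned through world modeling, the paper further introduces a lightweight action expert to address inference latency in real-time deployment.

DriveVLA-W0: Using World Models to Amplify VLA Scaling Laws
Time: 11.4 / 19:30-20:30
Livestream overview: VLA models are a promising path toward general autonomous driving, yet they are limited by a "supervision deficit": ...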
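
The training idea summarized above, adding dense supervision from future-image prediction to the sparse action signal, can be sketched as a combined loss for the discrete-token case. This is only an illustration under stated assumptions, not the DriveVLA-W0 implementation: the L1 action loss, the cross-entropy loss over future visual token ids, the function names, and the `world_weight` parameter are all hypothetical choices that the source does not specify.

```python
import math


def action_l1_loss(pred_traj, target_traj):
    """Mean absolute error over (x, y) waypoints: the sparse action signal."""
    diffs = [
        abs(p - t)
        for pred_wp, target_wp in zip(pred_traj, target_traj)
        for p, t in zip(pred_wp, target_wp)
    ]
    return sum(diffs) / len(diffs)


def token_cross_entropy(logits, target_ids):
    """Mean cross-entropy for predicting future-image token ids from logits."""
    total = 0.0
    for row, target in zip(logits, target_ids):
        peak = max(row)
        # Numerically stable log-sum-exp over the token vocabulary.
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[target]
    return total / len(target_ids)


def joint_loss(pred_traj, target_traj, logits, target_ids, world_weight=1.0):
    """Action loss plus a weighted world-model loss (the weight is hypothetical)."""
    return action_l1_loss(pred_traj, target_traj) + world_weight * token_cross_entropy(
        logits, target_ids
    )


# Toy inputs: two waypoints, two future visual tokens over a vocabulary of size 2.
loss = joint_loss(
    pred_traj=[(0.0, 1.0), (0.0, 2.0)],
    target_traj=[(0.0, 1.0), (0.5, 2.0)],
    logits=[[0.0, 0.0], [0.0, 0.0]],
    target_ids=[0, 1],
)
print(loss)
```

In this toy setup the action term sees only four numbers per sample, while the world-model term adds one supervised prediction per future visual token, which is the sense in which future-image prediction densifies supervision.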