AdaThinkDrive
Search documents
小米智驾正在迎头赶上......
自动驾驶之心· 2025-11-03 00:04
点击下方 卡片 ,关注" 自动驾驶之心 "公众号 戳我-> 领取 自动驾驶近30个 方向 学习 路线 >>自动驾驶前沿信息获取 → 自动驾驶之心知识星球 21年9月小米汽车成立,24年3月小米SU7发布,25年6月YU7发布。短短四年时间,小米汽车已经在新能源的红海赛道中杀出了自己的路。25年下半年,各家新势 力都在卷智驾、卷性价比、卷冰箱彩电大沙发的时候, 小米智驾也在悄悄迎头赶上,据说新的版本也快和大家见面了。 一个非常明显的信号便是今年小米汽车团队的论文工作颇丰,涉及VLA、世界模型、端到端等多个方面。像ORION、WorldSplat、EvaDrive、Dream4Drive等等工作 业内关注都很多,小米汽车也一直在探索怎样的生成模型能在自动驾驶里面真正的落地应用。一个合理的猜测,小米新版本的量产方案会和最前沿的技术结合的比 较紧密。 PS.也推荐下我们前面总结的地平线和理想智驾的工作汇总。 2025年的理想还在不断突破,年度成果一览 从地平线自动驾驶2025年的工作,我们看到了HSD的野心 VLM&VLA AdaThinkDrive AdaThinkDrive: Adaptive Thinking ...
纯视觉最新SOTA!AdaThinkDrive:更灵活的自动驾驶VLA思维链(清华&小米)
自动驾驶之心· 2025-09-18 23:33
Core Viewpoint - The article discusses the limitations of existing Chain-of-Thought (CoT) reasoning methods in Vision-Language-Action (VLA) models for autonomous driving, particularly in simple scenarios where they do not improve decision quality and introduce unnecessary computational overhead. It introduces AdaThinkDrive, a new VLA framework that employs a dual-mode reasoning mechanism inspired by the "fast and slow thinking" theory, allowing the model to adaptively choose when to reason based on scene complexity [3][4][10]. Group 1: Introduction and Background - The shift from traditional modular approaches to end-to-end architectures in autonomous driving systems is highlighted, noting that while modular methods offer flexibility, they suffer from information loss between components, leading to cumulative errors in complex scenarios. End-to-end methods mitigate this issue but are still limited by their reliance on supervised data [7]. - The article categorizes current VLA methods into two paradigms: meta-action methods focusing on high-level guidance and planning-based methods that predict trajectories directly from raw inputs. The application of CoT techniques is becoming more prevalent, particularly in complex scenarios, but their effectiveness in simple scenarios is questioned [14][15]. Group 2: AdaThinkDrive Framework - AdaThinkDrive is proposed as an end-to-end VLA framework that incorporates a "fast answer/slow thinking" mechanism, allowing the model to switch adaptively between direct prediction and explicit reasoning based on scene complexity. This is achieved through a three-stage adaptive reasoning strategy [11][18]. - The framework's performance is validated through extensive experiments on the Navsim benchmark, achieving a Predictive Driver Model Score (PDMS) of 90.3, which is 1.7 points higher than the best pure visual baseline model. The model demonstrates superior adaptive reasoning capabilities, selectively enabling CoT in 96% of complex scenarios and defaulting to direct trajectory prediction in 84% of simple scenarios [4][18][50]. Group 3: Experimental Results and Analysis - The article presents a comprehensive evaluation of AdaThinkDrive against existing models, showing that it outperforms both "always think" and "never think" baseline models, with PDMS improvements of 2.0 and 1.4 points, respectively. Additionally, the reasoning time is reduced by 14% compared to the "always think" baseline, indicating a balance between accuracy and efficiency [4][18][58]. - The results indicate that the optimal reasoning strategy is not universal but depends on scene complexity, emphasizing the need for models to adaptively enable reasoning based on the context [10][18]. Group 4: Conclusion - The article concludes that reasoning in simple scenarios often increases computational costs without enhancing decision quality. AdaThinkDrive addresses this by allowing agents to learn when to think, guided by an adaptive thinking reward mechanism. The experimental results on the NAVSIM benchmark demonstrate that AdaThinkDrive achieves state-of-the-art performance, underscoring the importance of adaptive thinking for accurate and efficient decision-making in autonomous driving systems [66].