AdaThinkDrive
The industry's first RL+VLA roundup: how is reinforcement learning pushing VLA toward the real world?
自动驾驶之心· 2025-12-24 09:22
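MindDrive WAM-Diff
Paper title: MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning
Paper link: https://arxiv.org/abs/2512.13636
Project page: https://xiaomi-mlab.github.io/MindDrive/
Institutions: Huazhong University of Science and Technology, Xiaomi EV
One-sentence summary: To address inefficient exploration of the continuous action space when training VLA models with online reinforcement learning, the authors propose MindDrive, a framework whose dual-expert architecture (decision expert + action expert) converts the action space into a discrete language decision space, enabling efficient online RL training.
Core contributions:
- Designs a dual-LoRA adapter architecture: the decision expert handles scene reasoning and language decisions, and the action expert maps those decisions to feasible trajectories, establishing a dynamic language-to-action mapping.
- Builds an online closed-loop RL framework on the CARLA simulator, using sparse rewards and the PPO algorithm, with KL regularization to avoid catastrophic forgetting.
- On the Bench2Drive benchmark, reaches a driving score of 78.04 and a success rate of 55.09% with the lightweight Qwen-0.5B model, surpassing SOTA models of the same scale.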
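The training recipe summarized above (PPO over a discrete decision space, with KL regularization against forgetting) can be sketched roughly as follows. This is a minimal illustration, not MindDrive's implementation: the function name `ppo_kl_loss`, the clip range of 0.2, the KL coefficient of 0.1, and the crude per-sample log-ratio used as the KL term are all assumptions made for the example.

```python
import math


def ppo_kl_loss(new_logp, old_logp, ref_logp, advantages,
                clip_eps=0.2, kl_coef=0.1):
    """Mean PPO clipped-surrogate loss plus a KL-style penalty (hypothetical helper).

    Each argument is a list with one entry per sampled discrete decision:
    log-probabilities of that decision under the current policy, the rollout
    (old) policy, and a frozen reference policy, plus its advantage estimate.
    """
    if not new_logp:
        raise ValueError("need at least one sample")
    total = 0.0
    for lp, lp_old, lp_ref, adv in zip(new_logp, old_logp, ref_logp, advantages):
        # Importance ratio between the current and the rollout policy.
        ratio = math.exp(lp - lp_old)
        # Clip the ratio so a single update cannot move the policy too far.
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (minimum) surrogate objective, as in standard PPO.
        surrogate = min(ratio * adv, clipped * adv)
        # Assumed KL term: per-sample log-ratio against the reference policy.
        kl_penalty = lp - lp_ref
        # Minimize the negative surrogate plus the weighted penalty.
        total += -surrogate + kl_coef * kl_penalty
    return total / len(new_logp)
```

With a sparse reward, the advantages would typically be nonzero only for decisions tied to an episode outcome; because the policy chooses among a finite set of language decisions, each log-probability is a single categorical term rather than a density over continuous trajectories.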
Xiaomi's intelligent driving is catching up...
自动驾驶之心· 2025-11-03 00:04
Core Insights
- Xiaomi has moved quickly in autonomous driving since establishing its automotive division in September 2021, launching the Xiaomi SU7 in March 2024 and the YU7 in June 2025 [2]
- The company is actively pursuing advanced research aimed at bringing cutting-edge techniques into its autonomous driving solutions, as shown by the large number of papers published by its automotive team [2]

Research Developments
- The AdaThinkDrive framework introduces a dual-mode reasoning mechanism for end-to-end autonomous driving, achieving a PDMS of 90.3 on the NAVSIM benchmark, 1.7 points above the best pure vision baseline [6]
- EvaDrive presents an evolutionary adversarial policy optimization framework that addresses trajectory generation and evaluation, reporting top performance on both the NAVSIM and Bench2Drive benchmarks [9]
- MTRDrive enhances visual-language models for motion risk prediction by introducing a memory-tool synergistic reasoning framework, significantly improving generalization in autonomous driving tasks [13][14]

Performance Metrics
- AdaThinkDrive reduces reasoning time by 14% compared with an "always think" baseline while distinguishing when reasoning is worth applying across driving scenarios [6]
- EvaDrive achieved a PDMS of 94.9 on NAVSIM v1, outperforming methods such as DiffusionDrive and DriveSuprim [9]
- The DriveMRP-Agent reached a zero-shot evaluation accuracy of 68.50% on real-world high-risk datasets, up from a baseline of 29.42% [15]

Framework Innovations
- ReCogDrive combines cognitive reasoning with reinforcement learning to improve decision-making in autonomous driving, achieving a PDMS of 90.8 on NAVSIM [18]
- The AgentThink framework integrates dynamic tool invocation with chain-of-thought reasoning, improving reasoning scores by 53.91% and answer accuracy by 33.54% in benchmark tests [22]
- The ORION framework aligns semantic reasoning with action generation, achieving a driving score of 77.74 and a success rate of 54.62% in Bench2Drive evaluations [23]

Data Generation Techniques
- Dream4Drive introduces a 3D perception-guided synthetic data generation framework that significantly improves perception-task performance while using few synthetic samples [26]
- The Genesis framework jointly generates multi-view driving videos and LiDAR point cloud sequences, improving the realism and utility of autonomous driving simulation data [41]
- The Uni-Gaussians method unifies camera and LiDAR simulation, demonstrating superior simulation quality in dynamic driving scenarios [42]
New vision-only SOTA! AdaThinkDrive: a more flexible VLA chain-of-thought for autonomous driving (Tsinghua & Xiaomi)
自动驾驶之心· 2025-09-18 23:33
Core Viewpoint
- The article discusses the limitations of existing Chain-of-Thought (CoT) reasoning methods in Vision-Language-Action (VLA) models for autonomous driving: in simple scenarios they do not improve decision quality and they add unnecessary computational overhead. It introduces AdaThinkDrive, a new VLA framework with a dual-mode reasoning mechanism inspired by the "fast and slow thinking" theory, which lets the model adaptively choose when to reason based on scene complexity [3][4][10].

Group 1: Introduction and Background
- Autonomous driving systems are shifting from traditional modular approaches to end-to-end architectures. Modular methods offer flexibility but lose information between components, so errors accumulate in complex scenarios. End-to-end methods mitigate this issue but remain limited by their reliance on supervised data [7].
- Current VLA methods fall into two paradigms: meta-action methods, which focus on high-level guidance, and planning-based methods, which predict trajectories directly from raw inputs. CoT techniques are increasingly applied, particularly in complex scenarios, but their effectiveness in simple scenarios is questioned [14][15].

Group 2: AdaThinkDrive Framework
- AdaThinkDrive is proposed as an end-to-end VLA framework with a "fast answer/slow thinking" mechanism that lets the model switch adaptively between direct prediction and explicit reasoning based on scene complexity. This is achieved through a three-stage adaptive reasoning strategy [11][18].
- Extensive experiments on the NAVSIM benchmark validate the framework: it achieves a Predictive Driver Model Score (PDMS) of 90.3, 1.7 points higher than the best pure vision baseline. The model selectively enables CoT in 96% of complex scenarios and defaults to direct trajectory prediction in 84% of simple scenarios [4][18][50].

Group 3: Experimental Results and Analysis
- In a comprehensive evaluation against existing models, AdaThinkDrive outperforms both the "never think" and "always think" baselines, with PDMS improvements of 2.0 and 1.4 points, respectively. Reasoning time is also 14% lower than the "always think" baseline, indicating a balance between accuracy and efficiency [4][18][58].
- The results indicate that no single reasoning strategy is optimal everywhere: the best choice depends on scene complexity, so models need to enable reasoning adaptively based on context [10][18].

Group 4: Conclusion
- Reasoning in simple scenarios often increases computational cost without improving decision quality. AdaThinkDrive addresses this by letting the agent learn when to think, guided by an adaptive thinking reward mechanism. Results on the NAVSIM benchmark show state-of-the-art performance, underscoring the importance of adaptive thinking for accurate and efficient decision-making in autonomous driving systems [66].
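The mode-selection figures quoted above (CoT enabled in 96% of complex scenarios, direct prediction in 84% of simple ones) are per-category rates. A minimal sketch of how such rates could be computed from evaluation logs is shown below; the record layout `(is_complex, used_cot)` and the function name `mode_selection_rates` are hypothetical, and the article does not describe the authors' actual evaluation code.

```python
import math


def mode_selection_rates(records):
    """Return (think_rate_on_complex, direct_rate_on_simple).

    `records` is an iterable of (is_complex, used_cot) boolean pairs, one per
    evaluated scene (an assumed log format). A rate is NaN when its category
    has no scenes.
    """
    complex_total = complex_think = 0
    simple_total = simple_direct = 0
    for is_complex, used_cot in records:
        if is_complex:
            complex_total += 1
            complex_think += 1 if used_cot else 0
        else:
            simple_total += 1
            simple_direct += 0 if used_cot else 1
    think_rate = complex_think / complex_total if complex_total else math.nan
    direct_rate = simple_direct / simple_total if simple_total else math.nan
    return think_rate, direct_rate


# Toy log, made up purely for illustration (not data from the paper):
# 4 complex scenes (3 with CoT) and 5 simple scenes (4 answered directly).
toy_log = (
    [(True, True)] * 3 + [(True, False)]
    + [(False, False)] * 4 + [(False, True)]
)
think_rate, direct_rate = mode_selection_rates(toy_log)
```

Reporting the two rates separately matters because an "always think" policy would score 100% on the first and 0% on the second; an adaptive policy is one that scores high on both.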