MindDrive
The Industry's First RL+VLA Roundup: How Is Reinforcement Learning Pushing VLA Toward the Real World?
自动驾驶之心· 2025-12-24 09:22
Core Insights
- The article reviews recent advances in Vision-Language-Action (VLA) models for autonomous driving, highlighting a shift from traditional supervised learning toward reinforcement learning (RL) to improve generalization and reasoning [2]

Summary by Sections

VLA + RL Research Overview
- The article surveys recent work in the VLA + RL domain, indicating a trend toward using RL to address limitations of earlier models, particularly hallucination and the inefficiency of exploring continuous action spaces [2]

Key Papers and Contributions
- **MindDrive**: Transforms the continuous action space into a discrete language decision space (see the sketch after this list), achieving a driving score of 78.04 and a success rate of 55.09% on the Bench2Drive benchmark with a lightweight model [6]
- **WAM-Diff**: Proposes an end-to-end VLA framework that uses masked diffusion for trajectory optimization, achieving superior performance on the NAVSIM benchmark [7]
- **LCDrive**: Addresses the temporal-expression and latency issues of textual chain-of-thought reasoning with a latent chain-of-thought mechanism, improving reasoning efficiency and trajectory quality [12]
- **Reasoning-VLA**: Enhances parallel trajectory generation through learnable action queries, achieving high performance across multiple datasets [13]
- **Alpamayo-R1**: Bridges reasoning and action prediction through a modular architecture and multi-stage training, improving generalization in long-tail scenarios [18]
- **AdaThinkDrive**: Introduces a dual-mode mechanism to balance decision accuracy against reasoning efficiency, achieving a PDMS of 90.3 on the NAVSIM benchmark [20]
- **AutoDrive-R²**: Combines supervised fine-tuning and RL to improve trajectory-planning accuracy, achieving state-of-the-art performance with a significant reduction in error rates [25]
- **IRL-VLA**: Avoids reliance on simulators by learning a reward world model, achieving state-of-the-art performance on the NAVSIM v2 benchmark [31]
- **DriveAgent-R1**: Integrates active perception with hybrid thinking, delivering significant improvements in decision reliability and efficiency [32]
- **Drive-R1**: Connects reasoning and planning in VLMs, providing effective methods for integrating reasoning with motion planning [37]
- **ReCogDrive**: Merges cognitive reasoning with a diffusion planner, achieving state-of-the-art performance while addressing the limitations of imitation learning [38]
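To make the "discrete language decision space" idea concrete, here is a minimal sketch of how a continuous (speed, steering) action might be binned into language tokens that a VLA model can emit. The bin edges, labels, and function name are illustrative assumptions, not values from any of the papers above:

```python
# Hypothetical discretization of a continuous driving action into language
# tokens, in the spirit of MindDrive's discrete language decision space.
# All bin labels and thresholds below are assumptions for illustration.
SPEED_BINS = ["stop", "creep", "cruise", "fast"]
STEER_BINS = ["hard_left", "left", "straight", "right", "hard_right"]

def to_decision_tokens(speed_mps: float, steer_rad: float) -> tuple[str, str]:
    """Map a continuous (speed, steering) pair to two discrete decision tokens."""
    speed_idx = min(int(speed_mps // 3), len(SPEED_BINS) - 1)  # 3 m/s per bin (assumed)
    steer = max(-0.5, min(0.5, steer_rad))                     # clamp to +/-0.5 rad
    steer_idx = round((steer + 0.5) * (len(STEER_BINS) - 1))   # spread over 5 bins
    return SPEED_BINS[speed_idx], STEER_BINS[steer_idx]

print(to_decision_tokens(6.2, -0.31))  # ('cruise', 'left')
```

The appeal of this framing is that RL exploration happens over a small token vocabulary instead of a continuous trajectory space, which is the exploration-efficiency problem the roundup repeatedly flags.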
HUST & Xiaomi Jointly Propose MindDrive: The First VLA Framework to Validate the Effectiveness of Online Reinforcement Learning...
自动驾驶之心· 2025-12-17 00:03
Core Insights
- The article introduces MindDrive, a novel framework for autonomous driving that uses online reinforcement learning (RL) to improve the performance of vision-language-action (VLA) models [2][4][44]
- MindDrive delivers significant gains in driving score and success rate over traditional end-to-end paradigms and state-of-the-art (SOTA) models, reaching a driving score (DS) of 78.04 and a success rate (SR) of 55.09% [9][38]

Background Review
- Autonomous driving relies on models that can perceive, decide, and act in dynamic environments; traditional frameworks often lack common-sense and causal-reasoning capabilities [4]
- Current VLA models are trained primarily with imitation learning (IL), which can cause causal confusion and distribution shift, leading to irreversible errors in closed-loop driving scenarios [4][5]

MindDrive Framework
- MindDrive consists of two main components, a decision expert and an action expert, which share a vision encoder and text tokenizer but carry separate low-rank adaptation (LoRA) parameters (a minimal sketch of this dual-LoRA design follows this summary) [11][18]
- The decision expert generates abstract driving decisions from navigation commands and visual inputs, while the action expert translates those decisions into concrete action trajectories [11][18]

Online Reinforcement Learning Approach
- MindDrive uses online RL to optimize decision-making: it samples alternative trajectories and receives feedback from the environment, strengthening the model's grasp of causal relationships (see the RL-loop sketch below) [22][30]
- The framework operates in a closed-loop simulation environment, the CARLA simulator, which enables efficient data collection and training [8][24]

Experimental Results
- MindDrive outperforms traditional end-to-end methods and other VLA models, with a driving score 10.12 points above the best imitation-learning model and 6.68 points above the best offline RL method [38][40]
- Performance in complex driving scenarios such as overtaking and yielding improves markedly, indicating stronger causal reasoning and more robust decisions [38][40]

Conclusion
- MindDrive represents a significant advance in applying online RL to autonomous driving, providing a framework that maps language instructions to actions while keeping exploration efficient [44]
- The results suggest MindDrive could inspire further work on strengthening VLA models across the autonomous driving sector [44]
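A minimal PyTorch sketch of the dual-expert design described above: one frozen shared layer, with the decision expert and action expert realized as two independent LoRA adapter sets over the same base weights. This is an assumed reconstruction from the summary, not the authors' code; class and expert names are placeholders:

```python
# Sketch (assumed): a shared frozen linear layer with per-expert LoRA adapters,
# mirroring MindDrive's reported decision/action experts over a shared backbone.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus one low-rank adapter per named expert."""
    def __init__(self, dim: int, rank: int = 8, experts=("decision", "action")):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        for p in self.base.parameters():
            p.requires_grad_(False)  # shared backbone weights stay frozen
        self.A = nn.ParameterDict(
            {e: nn.Parameter(torch.randn(rank, dim) * 0.01) for e in experts})
        self.B = nn.ParameterDict(
            {e: nn.Parameter(torch.zeros(dim, rank)) for e in experts})

    def forward(self, x: torch.Tensor, expert: str) -> torch.Tensor:
        # y = W x + B_e A_e x: only the selected expert's low-rank path is added
        return self.base(x) + x @ self.A[expert].T @ self.B[expert].T

layer = LoRALinear(dim=64)
tokens = torch.randn(2, 10, 64)
decision_h = layer(tokens, expert="decision")  # abstract driving-decision path
action_h = layer(tokens, expert="action")      # trajectory-generation path
```

The design choice the summary implies: both experts see the same visual and text features, so only the small LoRA matrices differ, which keeps the two roles cheap to train and swap.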
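And a hedged sketch of the online RL loop the article describes: sample candidate decisions from the policy, roll each out in the closed-loop simulator (CARLA in the article), and update on the returned rewards. The REINFORCE-style update with a mean-reward baseline and the `policy.sample` / `env.reset` / `env.rollout` interfaces are illustrative assumptions, not the paper's specified algorithm:

```python
# Sketch (assumed interfaces): one online RL step that samples several
# decisions, scores each in closed-loop simulation, and applies a
# reward-baselined policy-gradient update.
import torch

def online_rl_step(policy, env, optimizer, num_samples: int = 4):
    log_probs, rewards = [], []
    for _ in range(num_samples):
        obs = env.reset()                        # restart the scenario (assumed API)
        decision, log_prob = policy.sample(obs)  # assumed: decision + its log-prob tensor
        reward = env.rollout(decision)           # assumed: closed-loop driving score
        log_probs.append(log_prob)
        rewards.append(reward)
    rewards_t = torch.tensor(rewards)
    advantages = rewards_t - rewards_t.mean()    # baseline: mean reward over samples
    loss = -(torch.stack(log_probs) * advantages).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards_t.mean().item()
```

Comparing sampled decisions against their own group mean matches the summary's emphasis on learning causal relationships from environment feedback rather than from imitation targets.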