HUST & Xiaomi jointly propose MindDrive: the first VLA framework to demonstrate the effectiveness of online reinforcement learning......
自动驾驶之心 (Heart of Autonomous Driving) · 2025-12-17 00:03

Core Insights
- The article introduces MindDrive, a novel framework for autonomous driving that utilizes online reinforcement learning (RL) to enhance the performance of vision-language-action (VLA) models [2][4][44]
- MindDrive demonstrates significant improvements in driving scores and success rates compared to traditional end-to-end paradigms and state-of-the-art (SOTA) models, achieving a driving score (DS) of 78.04 and a success rate (SR) of 55.09% [9][38]

Background Review
- Autonomous driving relies on models that can perceive, decide, and execute actions in dynamic environments. Traditional frameworks often lack common sense and causal reasoning capabilities [4]
- Current VLA models primarily use imitation learning (IL), which can lead to causal confusion and distribution shifts, resulting in irreversible errors in closed-loop driving scenarios [4][5]

MindDrive Framework
- MindDrive consists of two main components: a decision expert and an action expert, both utilizing a shared vision encoder and text tokenizer, but differing in their low-rank adaptation (LoRA) parameters [11][18]
- The decision expert generates abstract driving decisions based on navigation commands and visual inputs, while the action expert translates these decisions into specific action trajectories [11][18]

Online Reinforcement Learning Approach
- MindDrive employs online RL to optimize the decision-making process by sampling different trajectories and receiving feedback from the environment, thus enhancing the model's understanding of causal relationships [22][30]
- The framework is designed to operate within a closed-loop simulation environment, specifically using the CARLA simulator, which allows for efficient data collection and training [8][24]

Experimental Results
- MindDrive outperforms traditional end-to-end methods and other VLA models, achieving a driving score that is 10.12 points higher than the best imitation learning model and 6.68 points higher than the best offline RL method [38][40]
- The model's performance in complex driving scenarios, such as overtaking and yielding, shows significant improvements, indicating enhanced causal reasoning and decision robustness [38][40]

Conclusion
- MindDrive represents a significant advancement in the application of online RL to autonomous driving, providing a framework that effectively maps language instructions to actions while optimizing exploration efficiency [44]
- The results suggest that MindDrive could inspire further developments in the autonomous driving sector, particularly in enhancing the capabilities of VLA models [44]
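
The split between a decision expert and an action expert, and the idea of improving decisions from closed-loop feedback, can be illustrated with a toy sketch. This is not MindDrive's implementation: the decision vocabulary, the speed table, the reward function, and the use of plain REINFORCE with a running baseline are all assumptions made for illustration, and a two-scenario toy environment stands in for CARLA.

```python
import math
import random

# Hypothetical discrete decision vocabulary; the summary only says the decision
# expert emits "abstract driving decisions".
DECISIONS = ["keep_speed", "yield", "accelerate"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_expert(decision):
    """Stand-in for the action expert: map a decision to a target speed (m/s)."""
    return {"keep_speed": 6.0, "yield": 2.0, "accelerate": 12.0}[decision]


def toy_environment(target_speed, pedestrian_ahead):
    """Toy closed-loop feedback: reward progress, penalize a collision."""
    if pedestrian_ahead and target_speed > 5.0:
        return -1.0  # failed to yield
    return target_speed / 12.0  # normalized progress in [0, 1]


def train(episodes=4000, lr=0.2, seed=0):
    """REINFORCE on per-scenario decision logits (the toy 'decision expert')."""
    rng = random.Random(seed)
    # One row of logits per scenario: index 0 = clear road, 1 = pedestrian ahead.
    logits = [[0.0] * len(DECISIONS) for _ in range(2)]
    baselines = [0.0, 0.0]  # running mean reward per scenario (variance reduction)
    for _ in range(episodes):
        scenario = int(rng.random() < 0.5)
        row = logits[scenario]
        probs = softmax(row)
        # Sample a decision online, act on it, and observe the reward.
        idx = rng.choices(range(len(DECISIONS)), weights=probs)[0]
        reward = toy_environment(action_expert(DECISIONS[idx]), bool(scenario))
        advantage = reward - baselines[scenario]
        baselines[scenario] += 0.05 * (reward - baselines[scenario])
        # d log pi(idx) / d logit_k = one_hot(idx)_k - probs_k
        for k in range(len(DECISIONS)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            row[k] += lr * advantage * grad
    return logits


if __name__ == "__main__":
    learned = train()
    for name, row in zip(["clear road", "pedestrian ahead"], learned):
        probs = softmax(row)
        print(name, {d: round(p, 3) for d, p in zip(DECISIONS, probs)})
```

Only the decision logits are updated here, while the decision-to-trajectory mapping stays fixed. That mirrors the summary's description of online RL optimizing the decision-making process, but whether and how the real action expert is trained during RL is not stated in this summary.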
