VLM or VLA? Reading the Development Trends of Multi-modal Large Models for Autonomous Driving from Existing Work
自动驾驶之心· 2025-08-20 23:33
Core Insights
- The article emphasizes the growing importance of foundation models such as LLMs (Large Language Models), VLMs (Vision-Language Models), and VLAs (Vision-Language-Action models) in autonomous driving decision-making, a direction attracting significant attention from both academia and industry [2].

Summary by Categories

LLM-Based Approaches
- LLM-based methods leverage the reasoning capabilities of large models to describe driving scenes and decisions, marking the early stage of integration between autonomous driving and large models [4]. (A generic chain-of-thought prompting sketch appears after this summary.)
- Notable research includes:
  - "Distilling Multi-modal Large Language Models for Autonomous Driving"
  - "LearningFlow: Automated Policy Learning Workflow for Urban Driving with Large Language Models"
  - "CoT-Drive: Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting"
  - "PADriver: Towards Personalized Autonomous Driving" [4][5]

VLM-Based Approaches
- Because autonomous driving relies heavily on visual sensors, VLM and VLA algorithms are currently the mainstream; the article surveys the latest work in this area for reference and study [8].
- Key studies include:
  - "Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning"
  - "FutureSightDrive: Visualizing Trajectory Planning with Spatio-Temporal CoT for Autonomous Driving" [8][9]

VLA-Based Approaches
- VLA methods integrate vision, language, and action for end-to-end autonomous driving, with an emphasis on adaptive reasoning and reinforcement fine-tuning [17]. (A toy vision-language-action forward pass is sketched at the end of this summary.)
- Significant contributions include:
  - "AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning"
  - "DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving" [17][21]
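
As a loose illustration of the chain-of-thought prompting idea behind works like CoT-Drive, the sketch below composes a step-by-step driving prompt for a generic LLM. The `query_llm` stub, the maneuver vocabulary, and the prompt wording are hypothetical placeholders for illustration, not the paper's actual method.

```python
# Hypothetical stub: stands in for any chat-completion client you use.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def build_cot_prompt(scene_description: str) -> str:
    """Compose a chain-of-thought prompt that asks the model to reason
    step by step about the scene before committing to a maneuver."""
    return (
        "You are a driving assistant. Analyze the scene step by step:\n"
        "1. List the relevant agents and their likely intentions.\n"
        "2. Identify conflicts with the ego vehicle's current plan.\n"
        "3. Conclude with ONE maneuver: keep_lane, slow_down, or change_lane.\n\n"
        f"Scene: {scene_description}\n"
        "Reasoning:"
    )

prompt = build_cot_prompt(
    "Ego at 60 km/h in the middle lane; a truck ahead brakes; "
    "left lane is clear for 80 m."
)
# answer = query_llm(prompt)  # the final line of the answer carries the maneuver
```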
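The vision-language-action recipe shared by the VLA works above can be summarized as: encode camera frames and a language instruction into a shared token space, fuse them, and regress future waypoints. The toy PyTorch module below sketches that flow under stated assumptions; the layer sizes, the conv-stem vision encoder, and the mean-pooled action head are invented for illustration and do not correspond to AutoVLA or DriveMoE.

```python
import torch
import torch.nn as nn

class ToyVLA(nn.Module):
    """Minimal vision-language-action sketch: camera patch tokens and
    instruction tokens are fused by a transformer encoder, and an MLP
    action head regresses future trajectory waypoints."""

    def __init__(self, d_model: int = 256, n_waypoints: int = 6, vocab_size: int = 1000):
        super().__init__()
        # Stand-in vision encoder: a conv stem that turns a 224x224 image
        # into 14x14 = 196 patch tokens of width d_model.
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, d_model, kernel_size=16, stride=16),
            nn.Flatten(2),  # (B, d_model, 196)
        )
        self.text_embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        # Action head: (x, y) offset for each future waypoint.
        self.action_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, n_waypoints * 2),
        )
        self.n_waypoints = n_waypoints

    def forward(self, image: torch.Tensor, instruction_ids: torch.Tensor) -> torch.Tensor:
        vis = self.vision_encoder(image).transpose(1, 2)   # (B, 196, d_model)
        txt = self.text_embed(instruction_ids)             # (B, T, d_model)
        fused = self.fusion(torch.cat([vis, txt], dim=1))  # joint tokens
        pooled = fused.mean(dim=1)                         # simple pooling
        return self.action_head(pooled).view(-1, self.n_waypoints, 2)

model = ToyVLA()
waypoints = model(torch.randn(1, 3, 224, 224), torch.randint(0, 1000, (1, 8)))
print(waypoints.shape)  # torch.Size([1, 6, 2])
```

Real systems replace the conv stem with a pretrained vision backbone and the fusion stack with a full language model; the point here is only the vision + language in, action out interface that distinguishes VLA from VLM approaches.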