Li Auto's Latest DriveAction: A Benchmark for Exploring Human-like Driving Decisions in VLA Models

Core Insights
- The article discusses the introduction of the DriveAction benchmark, specifically designed for Vision-Language-Action (VLA) models in autonomous driving, addressing existing limitations in current datasets and evaluation frameworks [2][3][20].
Group 1: Research Background and Issues
- The development of VLA models presents new opportunities for autonomous driving systems, but current benchmark datasets lack diversity in scenarios, reliable action-level annotations, and evaluation protocols aligned with human preferences [2].
- Existing benchmarks primarily rely on open-source data, which limits their ability to cover complex real-world driving scenarios, leading to a disconnect between evaluation results and actual deployment risks [3].
Group 2: DriveAction Benchmark Innovations
- DriveAction is the first action-driven benchmark specifically designed for VLA models, featuring three core innovations:
  1. Comprehensive coverage of diverse driving scenarios sourced from real-world data collected by production autonomous vehicles across 148 cities in China [5].
  2. Realistic action annotations derived from users' real-time driving operations, ensuring accurate capture of driver intentions [6].
  3. A tree-structured evaluation framework based on action-driven dynamics, integrating visual and language tasks to assess model decision-making in realistic contexts [7].
Group 3: Evaluation Results
- Experimental results indicate that models perform best in the full process mode (V-L-A) and worst in the no-information mode (A), with average accuracy dropping by 3.3% without visual input and 4.1% without language input [14].
- Specific task evaluations reveal that models excel in dynamic and static obstacle tasks but struggle with navigation and traffic light tasks, highlighting areas for improvement [16][17].
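The mode comparison in the evaluation results can be illustrated with a minimal sketch. It assumes each evaluation record is a (mode, predicted action, ground-truth action) triple and that accuracy is plain exact-match, which the article does not spell out. Only the mode names V-L-A and A appear in the text; "V-A" (no language input) and "L-A" (no visual input) are assumed labels, and the records and action names below are made-up toy data, not benchmark results. The drop is computed as an absolute difference from the full-process mode.

```python
from collections import defaultdict

# Assumed mode labels: only "V-L-A" and "A" are named in the article;
# "V-A" (no language input) and "L-A" (no visual input) are hypothetical names.
FULL_MODE = "V-L-A"


def accuracy_by_mode(records):
    """Exact-match accuracy per mode from (mode, predicted, truth) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for mode, predicted, truth in records:
        total[mode] += 1
        correct[mode] += int(predicted == truth)
    return {mode: correct[mode] / total[mode] for mode in total}


def drop_vs_full(acc, full_mode=FULL_MODE):
    """Absolute accuracy drop of every other mode relative to the full mode."""
    return {m: acc[full_mode] - a for m, a in acc.items() if m != full_mode}


# Toy records for illustration only (not data from the benchmark).
records = [
    ("V-L-A", "turn_left", "turn_left"),
    ("V-L-A", "stop", "stop"),
    ("V-A", "turn_left", "turn_left"),
    ("V-A", "go", "stop"),
    ("L-A", "turn_left", "turn_left"),
    ("L-A", "go", "stop"),
    ("A", "go", "turn_left"),
    ("A", "go", "stop"),
]

acc = accuracy_by_mode(records)
drops = drop_vs_full(acc)
for mode in sorted(drops):
    print(f"{mode}: accuracy={acc[mode]:.2f}, drop vs {FULL_MODE}={drops[mode]:.2f}")
```

Reporting the drop relative to the full-process mode keeps the contribution of each input (vision or language) visible as a single number per ablation, which is the comparison the reported 3.3% and 4.1% figures describe.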
Group 4: Significance and Value of DriveAction
- The introduction of the DriveAction benchmark marks a significant advancement in the evaluation of autonomous driving systems, providing a more comprehensive and realistic assessment tool that can help identify model bottlenecks and guide system optimization [20].