Autonomous Driving Paper Digest | DriveQA, Closed-Loop Simulation, AIGC, World Models, and More
自动驾驶之心·2025-09-03 03:19

Core Insights

- The article discusses the development of the DriveQA dataset, which integrates driving manuals from various U.S. states with visual scenarios from the CARLA simulation environment, creating a comprehensive driving-rules question-answering benchmark with 474K samples [2][3]
- It highlights the advantages of DriveQA over existing multimodal datasets in covering traffic rules and in improving model generalization and reasoning capabilities [2][3]

Contribution Summary

DriveQA Multimodal Driving Knowledge Benchmark

- DriveQA consists of two components: DriveQA-T, with 26K QA pairs covering 51 U.S. states and 19 question categories, and DriveQA-V, with 68K images and 448K QA pairs based on CARLA simulations, supporting various evaluation tasks [3]

Systematic Evaluation of SOTA Models

- Testing on mainstream LLMs (e.g., GPT-4o, Llama-3.1) and MLLMs (e.g., LLaVA-1.5) revealed good performance on basic traffic rules but significant deficiencies in numerical reasoning, complex right-of-way scenarios, and understanding of traffic-sign variants [3]

Model Optimization Value of DriveQA

- Fine-tuning with LoRA on DriveQA significantly improved accuracy in recognizing regulatory signs and making intersection decisions, demonstrating effective generalization to downstream driving tasks [3]

Analysis of Model Sensitivity and Generalization Limitations

- Controlled variables in DriveQA-V revealed model sensitivity to environmental factors, and negative sampling exposed weaknesses in understanding complex rules, providing insights for optimizing rule reasoning in autonomous-driving AI [3]

Generative AI in Autonomous Driving Systems Testing

- The article surveys the application of generative AI to testing autonomous driving systems (ADS), categorizing existing research into six core tasks related to scenario-based testing [9][11]
- It reviews the generative AI models used in testing, including LLMs, VLMs, diffusion models, GANs, and VAEs, detailing their mechanisms in different testing tasks [11][14]

Evaluation Resources and Benchmark Integration

- A comprehensive reference framework is provided for datasets, simulators, ADS systems, evaluation metrics, and benchmark methods in the field of ADS testing [14]

Limitations and Future Directions

- The article identifies 27 core limitations of generative AI in ADS testing, such as hallucination issues in LLMs and the computational overhead of diffusion models, and suggests targeted improvement directions [14]
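To make the benchmark format concrete: the digest does not show DriveQA's actual sample schema, but a multiple-choice driving-rules QA record and a simple exact-match accuracy metric might be sketched as follows. The field names, example questions, and category labels below are illustrative assumptions, not taken from the released dataset.

```python
from dataclasses import dataclass

# Hypothetical schema for a DriveQA-T-style multiple-choice sample;
# the actual field names in the released dataset may differ.
@dataclass
class QASample:
    question: str
    options: dict[str, str]   # e.g. {"A": "...", "B": "..."}
    answer: str               # gold option key
    category: str             # e.g. one of the 19 question categories

def accuracy(samples: list[QASample], predictions: list[str]) -> float:
    """Exact-match accuracy over predicted option keys."""
    correct = sum(s.answer == p for s, p in zip(samples, predictions))
    return correct / len(samples)

samples = [
    QASample("What does a solid red traffic light mean?",
             {"A": "Stop", "B": "Yield"}, "A", "traffic signals"),
    QASample("Who yields at an uncontrolled intersection?",
             {"A": "Vehicle on the right", "B": "Vehicle on the left"},
             "B", "right-of-way"),
]
print(accuracy(samples, ["A", "A"]))  # 1 of 2 correct -> 0.5
```

Evaluating a model then reduces to mapping each sample to a predicted option key and scoring; per-category accuracy (grouping by `category`) would surface the kind of weaknesses the article reports, such as right-of-way errors.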
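The LoRA fine-tuning mentioned above follows a standard low-rank adaptation scheme: the pretrained weight stays frozen and only two small matrices are trained. A minimal NumPy sketch of the adapted linear layer, with illustrative shapes and hyperparameters not taken from the paper:

```python
import numpy as np

# Minimal LoRA (Low-Rank Adaptation) sketch, assuming the standard
# formulation: the frozen weight W is adapted as
#     W' = W + (alpha / r) * B @ A
# where A (r x d_in) and B (d_out x r) are the only trainable matrices.
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 16, 8, 4, 8

W = rng.normal(size=(d_out, d_in))           # frozen pretrained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass through the LoRA-adapted linear layer."""
    return x @ (W + (alpha / r) * B @ A).T

x = rng.normal(size=(2, d_in))
# With B zero-initialized, adaptation starts as a no-op: the adapted
# layer matches the frozen pretrained layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only A and B (here 4x16 and 8x4) receive gradients, fine-tuning on a benchmark like DriveQA touches a tiny fraction of the model's parameters, which is what makes the reported adaptation cheap relative to full fine-tuning.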