Surpassing ORION! CoT4AD: A VLA Model with Explicit Chain-of-Thought Reasoning (Latest from Peking University)
自动驾驶之心·2025-12-02 00:03

Core Insights
- The article introduces CoT4AD, a new Vision-Language-Action (VLA) framework designed to enhance logical and causal reasoning capabilities in autonomous driving scenarios, addressing limitations in existing VLA models [1][3][10].

Background Review
- Autonomous driving is a key research area in AI and robotics, promising improvements in traffic safety and efficiency, and playing a crucial role in smart city and intelligent transportation system development [2].
- Traditional modular architectures in autonomous driving face challenges such as error accumulation and limited generalization, leading to the emergence of end-to-end paradigms that utilize unified learning frameworks [2][3].

CoT4AD Framework
- CoT4AD integrates chain-of-thought reasoning into end-to-end autonomous driving, allowing for explicit or implicit reasoning through a series of downstream tasks tailored for driving scenarios [3][10].
- The framework combines perception, language reasoning, future prediction, and trajectory planning, enabling the generation of explicit reasoning steps [6][10].

Experimental Results
- CoT4AD was evaluated on the nuScenes and Bench2Drive datasets, achieving state-of-the-art performance in both open-loop and closed-loop assessments, outperforming existing LLM-based and end-to-end methods [10][19].
- In the nuScenes dataset, CoT4AD achieved L2 distance errors of 0.12m, 0.24m, and 0.53m at 1s, 2s, and 3s respectively, with an average collision rate of 0.10% [17][18].

Contributions of CoT4AD
- The model's design allows for robust multi-task processing and future trajectory prediction, leveraging a diffusion model integrated with chain-of-thought reasoning [10][12].
- CoT4AD demonstrates superior performance in complex driving scenarios, enhancing decision-making consistency and reliability across diverse environments [19][23].
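
For readers unfamiliar with the open-loop metric quoted under Experimental Results, the following is a minimal Python sketch of how an L2 displacement error at 1 s, 2 s, and 3 s is typically computed. It is not CoT4AD's evaluation code: the function name `l2_errors_at_horizons`, the 2 Hz waypoint rate, and the toy trajectories are assumptions made for illustration. Published nuScenes planning results also differ in whether they report the error at each horizon or averaged up to it, and the summary does not say which convention applies here.

```python
import math


def l2_errors_at_horizons(pred, gt, hz=2, horizons_s=(1, 2, 3)):
    """Return {horizon in seconds: L2 distance in metres} between two trajectories.

    pred, gt: equal-length lists of (x, y) waypoints in metres, sampled at
    `hz` Hz, with the first waypoint 1/hz seconds into the future.
    The 2 Hz default is an assumption, not taken from the article.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same number of waypoints")
    errors = {}
    for t in horizons_s:
        idx = t * hz - 1  # index of the waypoint that falls exactly at t seconds
        if idx >= len(pred):
            raise ValueError(f"trajectory too short for a {t} s horizon")
        (px, py), (gx, gy) = pred[idx], gt[idx]
        errors[t] = math.hypot(px - gx, py - gy)
    return errors


# Toy data (hypothetical): ground truth drives straight along x,
# while the prediction drifts sideways over time.
gt = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
pred = [(1.0, 0.1), (2.0, 0.1), (3.0, 0.2), (4.0, 0.3), (5.0, 0.4), (6.0, 0.5)]
errors = l2_errors_at_horizons(pred, gt)

# Arithmetic mean of the three per-horizon values quoted in the summary.
# This average is computed here and is not a figure stated by the source.
reported = {1: 0.12, 2: 0.24, 3: 0.53}
avg_reported = sum(reported.values()) / len(reported)
```

With the toy trajectories, the error grows with the horizon because the lateral drift accumulates, which is the same pattern as the reported 0.12 m, 0.24 m, and 0.53 m figures. Their simple mean works out to roughly 0.30 m.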

Ablation Studies
- The effectiveness of various components, such as perception tokenizers and the chain-of-thought design, was validated through ablation studies, showing significant performance improvements when these elements were included [26][28].
- The model's ability to predict future scenarios was found to be crucial, with optimal performance achieved when predicting four future scenarios [29].

Conclusion
- CoT4AD represents a significant advancement in autonomous driving technology, demonstrating enhanced reasoning capabilities and superior performance compared to existing methods, while also highlighting areas for future research to improve computational efficiency [30][32].