Nanyang Technological University & Harvard Propose OpenREAD: End-to-End RL Unifies Cognition and Trajectory Planning

Core Viewpoint
- The article introduces OpenREAD, a framework developed by Nanyang Technological University and Harvard University that uses reinforcement learning (RL) to strengthen the reasoning capabilities of vision-language models (VLMs) for autonomous driving [4][28].

Group 1: Methodology
- OpenREAD uses Qwen3-LLM as an "evaluation expert," extending RL from traditional verifiable downstream tasks to open-ended tasks such as "driving suggestions" and "scene analysis." This enables end-to-end reinforcement fine-tuning from high-level semantic reasoning down to low-level trajectory planning [6][28].
- The framework addresses the difficulty of designing reward functions for open-ended driving knowledge learning, where the same reference answer can be phrased in many different ways, which complicates reward computation in RL [7].
- Two preparatory steps were taken:
  (1) Using GPT-4 to annotate driving knowledge data, covering perception and decision-making tasks, with explicit chains of thought (CoT) [8].
  (2) Converting the OmniDrive dataset into a "thinking + answering" format suitable for RL training [9].

Group 2: Experimental Results
- OpenREAD was evaluated on the LingoQA and nuScenes datasets, where it outperformed traditional supervised fine-tuning (SFT) on trajectory error, collision rate, and knowledge evaluation metrics [19][20].
- The results indicate that adding driving knowledge significantly improves the effectiveness of RL fine-tuning, as shown by lower trajectory error and collision rate [19][20].
- Compared with existing methods, OpenREAD achieved better collision control, which points to safer driving outcomes [20].
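To make the reward design described under Methodology more concrete, here is a minimal sketch of how a "thinking + answering" response could be scored on an open-ended question. It is an illustration under stated assumptions, not the authors' implementation: the `<think>`/`<answer>` tag names, the function names, and the `format_weight` value are all hypothetical, and `token_overlap_judge` is only a toy stand-in for the LLM "evaluation expert" mentioned in the article.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical tag names: the article only says the data is structured
# as "thinking + answering", not how the two parts are delimited.
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*$", re.DOTALL
)


def parse_response(text: str) -> Optional[Tuple[str, str]]:
    """Split a response into (thinking, answer); return None if malformed."""
    match = RESPONSE_PATTERN.match(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


def open_ended_reward(
    response: str,
    reference: str,
    judge: Callable[[str, str], float],
    format_weight: float = 0.2,  # assumed value, not taken from the article
) -> float:
    """Combine a format bonus with a judge score in [0, 1].

    `judge(answer, reference)` stands in for the LLM evaluation expert:
    it should return a high score when the answer means the same thing
    as the reference, even if the wording differs.
    """
    parsed = parse_response(response)
    if parsed is None:
        return 0.0  # malformed responses earn no reward
    _, answer = parsed
    score = min(1.0, max(0.0, judge(answer, reference)))  # clamp to [0, 1]
    return format_weight + (1.0 - format_weight) * score


def token_overlap_judge(answer: str, reference: str) -> float:
    """Toy judge (Jaccard word overlap), used here only so the sketch runs."""
    a, r = set(answer.lower().split()), set(reference.lower().split())
    if not a or not r:
        return 0.0
    return len(a & r) / len(a | r)


if __name__ == "__main__":
    resp = "<think>Pedestrian ahead.</think><answer>slow down and yield</answer>"
    print(open_ended_reward(resp, "slow down and yield", token_overlap_judge))
```

A word-overlap judge penalizes paraphrases, which is the problem the article attributes to open-ended answers. In practice the `judge` callable would wrap a call to a language model that compares meaning rather than surface wording.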
Group 3: Conclusion
- OpenREAD performs joint reinforcement fine-tuning of driving knowledge and trajectory planning, extending the reach of RL in end-to-end autonomous driving [28].