NeurIPS 2025 | SURDS Dataset and GRPO Comprehensively Strengthen Spatial Reasoning for Autonomous Driving
自动驾驶之心 · 2025-09-27 23:33
Core Insights
- The article discusses the challenges of achieving accurate spatial reasoning in autonomous driving scenarios using Vision Language Models (VLMs), highlighting the lack of large-scale benchmarks in this area [2][20].
- A new benchmark called SURDS has been introduced to systematically evaluate the spatial reasoning capabilities of VLMs, revealing significant shortcomings in current models [4][20].

Benchmark Overview
- SURDS is a large-scale benchmark based on the nuScenes dataset, consisting of 41,080 visual-question training instances and 9,250 evaluation samples, covering six spatial categories: direction recognition, pixel-level localization, depth estimation, distance comparison, left-right ordering, and front-back relationships [4][20]. A hypothetical record layout for such instances is sketched at the end of this summary.
- The dataset includes diverse multimodal information collected from urban environments in Boston and Singapore, ensuring a realistic testing scenario [6][20].

Model Training and Evaluation
- The research emphasizes the importance of data generation and introduces a novel automated process for generating high-quality reasoning chains, which enhances the model's spatial reasoning capabilities [8][10].
- A reinforcement learning framework combining spatial localization rewards and logical consistency objectives was designed, leading to significant performance improvements across tasks [11][20]. A minimal sketch of such a combined reward appears after this summary.

Experimental Results
- The evaluation results show that different models exhibit notable differences in spatial reasoning tasks, with the proposed model achieving a nearly 60% improvement in depth estimation accuracy compared to the second-best model [14][20].
- The study reveals that most existing models struggle with single-object tasks, often performing close to random levels, indicating a need for better learning of absolute pose and metric information [16][20].

Training Strategy Insights
- Ablation studies indicate that combining localization and logical rewards significantly enhances model performance, underscoring the foundational role of localization ability in spatial reasoning [16][18].
- The research also highlights that the scale of model parameters does not directly correlate with spatial understanding capabilities, suggesting that simply increasing model size is insufficient [16][20].
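To make the benchmark description above concrete, the following is a minimal sketch of what a single visual-question instance could look like. The six category names follow the list in the Benchmark Overview; every field name (`sample_token`, `image_path`, `reasoning_chain`, and so on) and the example values are hypothetical illustrations, not the released SURDS schema.

```python
from dataclasses import dataclass
from enum import Enum


class SpatialCategory(str, Enum):
    """The six spatial categories reported for SURDS."""
    DIRECTION_RECOGNITION = "direction_recognition"
    PIXEL_LOCALIZATION = "pixel_localization"
    DEPTH_ESTIMATION = "depth_estimation"
    DISTANCE_COMPARISON = "distance_comparison"
    LEFT_RIGHT_ORDERING = "left_right_ordering"
    FRONT_BACK_RELATIONSHIP = "front_back_relationship"


@dataclass
class SpatialVQAInstance:
    """Hypothetical record layout for one visual-question instance.

    Field names are illustrative only; this is not the actual SURDS format.
    """
    sample_token: str          # nuScenes sample the camera frame comes from
    image_path: str            # image shown to the VLM
    category: SpatialCategory  # which of the six spatial tasks the question probes
    question: str              # natural-language spatial question
    answer: str                # ground-truth answer derived from nuScenes annotations
    reasoning_chain: str = ""  # optional auto-generated chain of thought (training split)


# Purely illustrative example instance:
example = SpatialVQAInstance(
    sample_token="0000-placeholder",
    image_path="samples/CAM_FRONT/placeholder.jpg",
    category=SpatialCategory.DEPTH_ESTIMATION,
    question="How far is the black sedan ahead of the ego vehicle, in meters?",
    answer="about 23 m",
)
print(example.category.value, "->", example.question)
```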
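The reward design mentioned under Model Training and Evaluation (and ablated under Training Strategy Insights) can be illustrated with an assumption-heavy sketch. The IoU-based localization term, the string-matching consistency check, and the 0.5/0.5 weighting below are stand-ins chosen for illustration only; the paper's actual reward functions and GRPO implementation are not reproduced here.

```python
import re


def localization_reward(pred_box, gt_box):
    """IoU-style localization reward in [0, 1].

    Boxes are (x1, y1, x2, y2) in pixels. Using IoU is an assumption; the
    paper's spatial localization reward may be defined differently.
    """
    ix1, iy1 = max(pred_box[0], gt_box[0]), max(pred_box[1], gt_box[1])
    ix2, iy2 = min(pred_box[2], gt_box[2]), min(pred_box[3], gt_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area(pred_box) + area(gt_box) - inter
    return inter / union if union > 0 else 0.0


def consistency_reward(reasoning: str, final_answer: str) -> float:
    """Crude logical-consistency check: the conclusion stated in the chain of
    thought should agree with the final answer. A placeholder for the idea,
    not the paper's consistency objective.
    """
    m = re.search(r"answer\s*[:=]\s*(.+)", reasoning, flags=re.IGNORECASE)
    return 1.0 if m and m.group(1).strip().lower() == final_answer.strip().lower() else 0.0


def combined_reward(pred_box, gt_box, reasoning, final_answer,
                    w_loc: float = 0.5, w_logic: float = 0.5) -> float:
    """Weighted sum of the two terms; the equal weights are arbitrary."""
    return (w_loc * localization_reward(pred_box, gt_box)
            + w_logic * consistency_reward(reasoning, final_answer))


# GRPO-style use: score a group of sampled completions for the same prompt,
# then normalize rewards within the group to obtain advantages.
rewards = [
    combined_reward((10, 10, 50, 50), (12, 11, 48, 52),
                    "The car is on the left. Answer: left", "left"),
    combined_reward((0, 0, 20, 20), (12, 11, 48, 52),
                    "Answer: right", "left"),
]
mean = sum(rewards) / len(rewards)
std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5 or 1.0
advantages = [(r - mean) / std for r in rewards]
print(advantages)
```

The within-group normalization at the end reflects the general GRPO recipe of ranking sampled completions against each other rather than against a learned value baseline; the specific grouping and clipping details used in the paper are not shown.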