XPeng's latest! NavigScene: Global Navigation Enables Beyond-Visual-Range Autonomous Driving VLA (ACMMM'25)

Core Insights
- The article discusses the development of NavigScene, a novel dataset that bridges the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities [2][12][14].

Group 1: Overview of NavigScene
- NavigScene integrates local sensor data with global navigation context, addressing a key limitation of existing autonomous driving models, which rely primarily on immediate visual information [5][9].
- The dataset comprises two subsets, NavigScene-nuScenes and NavigScene-NAVSIM, which provide paired data to support comprehensive scene understanding and decision-making [9][14]; a schematic of such a paired sample appears in the first sketch after this summary.

Group 2: Methodologies
- Three complementary paradigms are proposed to leverage NavigScene (illustrative sketches of each follow the summary):
1. Navigation-guided reasoning (NSFT) enhances vision-language models by incorporating navigation context [10][19].
2. Navigation-guided preference optimization (NPO) improves generalization to new scenarios through reinforcement learning [24][26].
3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with conventional driving models for better performance [27][28].

Group 3: Experimental Results
- Experiments demonstrate that integrating global navigation knowledge significantly improves the performance of autonomous driving systems on perception, prediction, and planning tasks [12][34][39].
- Models trained with NavigScene outperform baseline models across metrics including BLEU-4, METEOR, and CIDEr, indicating stronger reasoning capabilities [32][34]; a worked BLEU-4 example is sketched below.

Group 4: Practical Implications
- Integrating NavigScene allows autonomous systems to make more informed decisions in complex driving environments, improving safety and reliability [12][42].
- The findings underscore the importance of incorporating beyond-visual-range (BVR) knowledge for effective navigation and planning in autonomous driving applications [8][12].
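To make the paired-data idea in Group 1 concrete, here is a minimal sketch of what one NavigScene-style sample could look like. The field names (`camera_frames`, `navigation_text`, and so on) are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NavigSceneSample:
    """One hypothetical paired sample: local sensing plus global navigation context.

    All field names are assumptions for illustration, not the real schema.
    """
    camera_frames: List[str]  # paths to multi-view camera images (local perception)
    navigation_text: str      # beyond-visual-range route guidance, e.g. "in 300 m, turn left"
    question: str             # driving QA prompt posed to the model
    answer: str               # reference answer used for supervision


def build_prompt(sample: NavigSceneSample) -> str:
    """Prepend the global navigation context to the local driving question,
    mirroring the idea of conditioning local reasoning on global guidance."""
    return f"Navigation: {sample.navigation_text}\nQuestion: {sample.question}"
```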
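Navigation-guided reasoning (NSFT) is described as fine-tuning a vision-language model on navigation-augmented prompts. The sketch below shows the loss-masking pattern for one such supervised step, using a small text-only stand-in model (`gpt2`) rather than the paper's actual VLM; the checkpoint and prompt template are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Text-only stand-in; the paper fine-tunes a vision-language model instead.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def nsft_step(navigation: str, question: str, answer: str) -> torch.Tensor:
    """One supervised step: condition on navigation + question, learn the answer."""
    prompt = f"Navigation: {navigation}\nQuestion: {question}\nAnswer:"
    ids = tokenizer(prompt + " " + answer, return_tensors="pt").input_ids
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    labels = ids.clone()
    labels[:, :prompt_len] = -100  # mask the (approximate) prompt span; loss covers only the answer
    return model(input_ids=ids, labels=labels).loss

loss = nsft_step(
    navigation="In 300 m, turn left onto the highway on-ramp.",
    question="Should the ego vehicle change lanes now?",
    answer="Yes, move to the left lane to prepare for the upcoming on-ramp.",
)
loss.backward()  # then step an optimizer as in any fine-tuning loop
```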
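The summary frames navigation-guided preference optimization (NPO) as a reinforcement-learning-style objective. A common member of this family is a DPO-style pairwise loss over navigation-conditioned responses; the sketch below shows that generic form, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_chosen: torch.Tensor,
                    logp_rejected: torch.Tensor,
                    ref_logp_chosen: torch.Tensor,
                    ref_logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """DPO-style pairwise loss: push the policy to prefer the navigation-consistent
    response over the rejected one, relative to a frozen reference model."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Dummy log-probabilities for a batch of two preference pairs.
loss = preference_loss(torch.tensor([-5.0, -4.2]), torch.tensor([-6.1, -5.9]),
                       torch.tensor([-5.5, -4.8]), torch.tensor([-5.8, -5.6]))
```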
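For the NVLA paradigm, the summary only says that navigation guidance is integrated with a conventional driving model. One simple way such fusion is often realized is concatenating a navigation embedding with the driving features ahead of the planning head; the module below is a toy sketch under that assumption, with made-up dimensions.

```python
import torch
import torch.nn as nn

class NavigationFusionPlanner(nn.Module):
    """Toy planner head: fuse a navigation-text embedding with driving features,
    then regress future waypoints. Layout and sizes are illustrative assumptions."""

    def __init__(self, feat_dim: int = 256, nav_dim: int = 256, horizon: int = 6):
        super().__init__()
        self.horizon = horizon
        self.fuse = nn.Linear(feat_dim + nav_dim, feat_dim)
        self.head = nn.Linear(feat_dim, horizon * 2)  # (x, y) per future step

    def forward(self, driving_feat: torch.Tensor, nav_embed: torch.Tensor) -> torch.Tensor:
        fused = torch.relu(self.fuse(torch.cat([driving_feat, nav_embed], dim=-1)))
        return self.head(fused).view(-1, self.horizon, 2)

planner = NavigationFusionPlanner()
waypoints = planner(torch.randn(4, 256), torch.randn(4, 256))  # -> shape (4, 6, 2)
```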
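Group 3 cites BLEU-4, METEOR, and CIDEr for the reasoning benchmarks. As a worked example of the first metric, the snippet below computes smoothed sentence-level BLEU-4 with NLTK on a made-up answer pair; METEOR and CIDEr need extra resources (WordNet, corpus statistics) and are omitted here.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Made-up reference/candidate driving answers, pre-tokenized by whitespace.
reference = "the ego vehicle should slow down before the left turn".split()
candidate = "the vehicle should slow down before turning left".split()

# BLEU-4: geometric mean of 1- to 4-gram precisions, smoothed for short texts.
bleu4 = sentence_bleu([reference], candidate,
                      weights=(0.25, 0.25, 0.25, 0.25),
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU-4: {bleu4:.3f}")
```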