Core Viewpoint
- The article discusses the development of NavigScene, a novel dataset and methodology by Xiaopeng Motors and the University of Central Florida, aimed at bridging the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities in complex environments [3][9][10].

Group 1: Overview of NavigScene
- NavigScene is designed to integrate local sensor data with global navigation context, addressing the limitation of existing autonomous driving systems that rely primarily on immediate visual information [3][5].
- The dataset includes two subsets, NavigScene-nuScenes and NavigScene-NAVSIM, which provide paired data of multi-view sensor inputs and corresponding natural-language navigation instructions [9][14].

Group 2: Methodologies
- Three complementary methodologies are proposed to utilize NavigScene (illustrative sketches follow this summary):
1. Navigation-guided reasoning (NSFT) enhances vision-language models by incorporating navigation context into supervised fine-tuning [10][20].
2. Navigation-guided preference optimization (NPO) improves the generalization of vision-language models to new navigation scenarios [24][26].
3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with traditional driving models for better performance on perception, prediction, and planning tasks [27][29].

Group 3: Experimental Results
- Experiments demonstrate that integrating NavigScene significantly improves the performance of vision-language models on a range of driving-related tasks, including reasoning and planning [31][35].
- The results indicate that combining NSFT and NPO leads to notable gains in the models' ability to handle complex driving scenarios, reducing collision rates and improving trajectory accuracy [43][47].
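To make the NPO step concrete: the summary does not specify the exact objective, but "preference optimization" commonly refers to a DPO-style loss. Below is a minimal sketch under that assumption, where "chosen" responses are those consistent with the global navigation instruction and "rejected" responses ignore it; the function name, inputs, and the beta hyperparameter are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def npo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO-style preference loss (assumed form of NPO).

    Each input is the summed log-probability of an answer's tokens under
    the policy or frozen reference model. 'Chosen' answers agree with the
    navigation context; 'rejected' ones do not.
    """
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer navigation-consistent answers over
    # navigation-ignorant ones, relative to the reference model.
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Usage with dummy log-probabilities for a batch of 8 answer pairs:
loss = npo_loss(torch.randn(8), torch.randn(8), torch.randn(8), torch.randn(8))
```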
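For the NVLA model, the summary only says that navigation guidance is integrated with a traditional driving model. One plausible reading is late fusion: a language embedding of the navigation instruction is projected into the scene-feature space and concatenated before the planning head. The sketch below follows that assumption; all dimensions, names, and the concatenation design are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class NavigationFusionPlanner(nn.Module):
    """Toy planning head fusing a navigation-instruction embedding
    (e.g., produced by a vision-language model) with local scene features
    before decoding a short ego trajectory."""

    def __init__(self, scene_dim=256, nav_dim=512, hidden=256, horizon=6):
        super().__init__()
        self.nav_proj = nn.Linear(nav_dim, scene_dim)  # align language feature
        self.decoder = nn.Sequential(
            nn.Linear(2 * scene_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * 2),            # (x, y) waypoint per step
        )
        self.horizon = horizon

    def forward(self, scene_feat, nav_feat):
        nav = self.nav_proj(nav_feat)                  # (B, scene_dim)
        fused = torch.cat([scene_feat, nav], dim=-1)   # simple late fusion
        return self.decoder(fused).view(-1, self.horizon, 2)

# Usage: batch of 4 samples -> (4, 6, 2) predicted waypoints
planner = NavigationFusionPlanner()
trajectory = planner(torch.randn(4, 256), torch.randn(4, 512))
```

The design choice here, projecting the text embedding into the scene-feature space before concatenation, is just one common way to condition a planner on language; cross-attention would be an equally plausible alternative.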
How is XPeng's beyond-visual-range autonomous driving VLA implemented?