NavigScene
How can beyond-visual-range VLA for autonomous driving be achieved? Xiaopeng's NavigScene takes a different path!
自动驾驶之心· 2025-09-04 23:33
Core Viewpoint
- The article discusses the limitations of current autonomous driving systems in bridging the gap between local perception and global navigation, and introduces NavigScene as a solution for enhancing the navigation capabilities of autonomous vehicles [3][4].

Group 1: Research and Development
- Autonomous driving systems have made significant progress in processing local visual information, but they struggle to integrate the broader navigation context that human drivers rely on [4][9].
- NavigScene is introduced as a navigation-guided natural language dataset that simulates a human-like driving environment within autonomous systems [5][9].
- Three complementary paradigms built on NavigScene aim to improve reasoning, preference optimization, and the integration of vision-language-action models [5][9].

Group 2: Methodologies
- Navigation-guided reasoning enhances vision-language models by incorporating navigation context into the prompting method (a minimal prompting sketch follows this summary) [5].
- Navigation-guided preference optimization is a reinforcement learning approach that improves vision-language model responses by establishing preference relationships based on navigation-related information [5].
- The navigation-guided vision-language-action model integrates navigation guidance and vision-language models with traditional end-to-end driving models through feature fusion [5].

Group 3: Event and Engagement
- A live session is scheduled to discuss the advances and methodologies behind NavigScene, emphasizing its role in overcoming the limitations of current autonomous driving systems [4][9].
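The navigation-guided prompting idea above lends itself to a small illustration. Below is a minimal sketch, assuming a plain text template that prepends global navigation context to a driving question before it reaches the vision-language model; the function and template names are hypothetical and not taken from the NavigScene code.

```python
# Minimal sketch (not the paper's code): prepend beyond-visual-range
# navigation context to a driving QA prompt for a vision-language model.
# NAV_TEMPLATE and build_navigation_guided_prompt are illustrative names.

NAV_TEMPLATE = (
    "Navigation context: {nav}\n"
    "Question: {question}\n"
    "Answer with step-by-step driving reasoning."
)

def build_navigation_guided_prompt(nav_instruction: str, question: str) -> str:
    """Combine a global navigation instruction with a local driving question."""
    return NAV_TEMPLATE.format(nav=nav_instruction, question=question)

if __name__ == "__main__":
    prompt = build_navigation_guided_prompt(
        "In 300 m, turn left onto the highway on-ramp.",
        "Should the ego vehicle change to the left lane now?",
    )
    print(prompt)
```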
How is Xiaopeng's beyond-visual-range autonomous driving VLA achieved?
自动驾驶之心· 2025-08-25 23:34
Core Viewpoint
- The article discusses NavigScene, a novel dataset and methodology from Xiaopeng Motors and the University of Central Florida that bridges the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities in complex environments [3][9][10].

Group 1: Overview of NavigScene
- NavigScene integrates local sensor data with global navigation context, addressing the limitation of existing autonomous driving systems that rely primarily on immediate visual information [3][5].
- The dataset comprises two subsets, NavigScene-nuScenes and NavigScene-NAVSIM, which pair multi-view sensor inputs with corresponding natural language navigation instructions [9][14].

Group 2: Methodologies
- Three complementary methodologies are proposed for using NavigScene:
  1. Navigation-guided reasoning (NSFT) enhances vision-language models by incorporating navigation context [10][20].
  2. Navigation-guided preference optimization (NPO) improves the generalization of vision-language models to new navigation scenarios (a preference-loss sketch follows this summary) [24][26].
  3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with traditional driving models for better performance on perception, prediction, and planning tasks [27][29].

Group 3: Experimental Results
- Experiments demonstrate that integrating NavigScene significantly improves the performance of vision-language models across driving-related tasks, including reasoning and planning [31][35].
- Combining NSFT and NPO notably enhances the models' ability to handle complex driving scenarios, reducing collision rates and improving trajectory accuracy [43][47].
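The summary does not spell out the NPO objective, so the sketch below shows one plausible DPO-style instantiation in PyTorch, in which responses grounded in navigation information serve as the "chosen" samples and navigation-agnostic responses as the "rejected" ones; the loss form, the beta value, and all variable names are assumptions, not the paper's implementation.

```python
# Illustrative DPO-style preference loss as one plausible instantiation
# of navigation-guided preference optimization (NPO). The actual NPO
# objective may differ; beta and the tensor names are assumptions.
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps: torch.Tensor,
                    policy_rejected_logps: torch.Tensor,
                    ref_chosen_logps: torch.Tensor,
                    ref_rejected_logps: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """Prefer navigation-grounded responses (chosen) over
    navigation-agnostic ones (rejected), relative to a frozen
    reference model, as in direct preference optimization."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```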
All in one article! A roundup of outstanding autonomous driving VLA work from the past year
自动驾驶之心· 2025-07-15 12:30
Core Insights
- The article surveys advances in Vision-Language-Action (VLA) models for autonomous driving, highlighting how integrating navigation context and reinforcement learning extends reasoning beyond visual range [2][3][6].

Group 1: NavigScene
- NavigScene is introduced as a novel auxiliary dataset that pairs local multi-view sensor inputs with global natural language navigation guidance, addressing the critical gap between local perception and global navigation context in autonomous driving [6].
- Three complementary paradigms are implemented on NavigScene: navigation-guided reasoning, navigation-guided preference optimization, and navigation-guided VLA models, enhancing the reasoning and generalization capabilities of autonomous driving systems [6].
- Comprehensive experiments demonstrate significant performance improvements on perception, prediction, and planning tasks when global navigation knowledge is integrated into autonomous driving systems [6].

Group 2: AutoVLA
- AutoVLA is proposed as an end-to-end autonomous driving framework that couples physical action tokens with a pre-trained VLM backbone, enabling direct policy learning and semantic reasoning from raw visual observations and language instructions [12].
- A reinforcement learning post-training method based on Group Relative Policy Optimization (GRPO) enables adaptive reasoning and further improves performance on end-to-end driving tasks (a minimal GRPO sketch follows this summary) [12].
- AutoVLA achieves competitive performance across multiple autonomous driving benchmarks, in both open-loop and closed-loop tests [12].

Group 3: ReCogDrive
- ReCogDrive is an end-to-end autonomous driving system that integrates a VLM with a diffusion planner, employing a three-stage training paradigm to address performance drops in rare and long-tail scenarios [13][16].
- The first stage fine-tunes the VLM on a large-scale driving Q&A dataset to mitigate the domain gap between general content and real-world driving scenarios [16].
- The method achieves a state-of-the-art PDMS of 89.6 on the NAVSIM benchmark, demonstrating its effectiveness and practicality [16].

Group 4: Impromptu VLA
- Impromptu VLA introduces a large-scale, richly annotated dataset that addresses the limitations of existing benchmarks for autonomous driving VLA models [22].
- The dataset targets unstructured, extreme scenarios and demonstrably improves VLA model performance on established benchmarks [22].
- Experiments show that training with the Impromptu VLA dataset yields notable gains in closed-loop NeuroNCAP scores and collision rates [22].

Group 5: DriveMoE
- DriveMoE is a novel end-to-end autonomous driving framework built on a mixture-of-experts (MoE) architecture to handle multi-view sensor data and complex driving scenarios effectively [28].
- The framework features a scene-specific visual MoE and a skill-specific action MoE, addressing the challenges of multi-view redundancy and skill specialization [28].
- DriveMoE achieves state-of-the-art closed-loop performance on the Bench2Drive benchmark, demonstrating the effectiveness of combining visual and action MoE in autonomous driving tasks [28].
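GRPO's core step, computing each sample's advantage relative to a group of candidates drawn from the same prompt or scene rather than from a learned value function, can be sketched briefly; the normalization details and names below are generic GRPO conventions, not AutoVLA's actual code.

```python
# Minimal sketch of the group-relative advantage at the heart of GRPO.
# Rewards for several candidate trajectories sampled from the same scene
# are standardized within their group; the epsilon guard is an assumption.
import torch

def group_relative_advantages(rewards: torch.Tensor,
                              eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_groups, samples_per_group) scalar rewards.
    Returns per-sample advantages standardized within each group,
    so samples are compared only against siblings from the same scene."""
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)
```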
Latest from Xiaopeng! NavigScene: Global Navigation for Beyond-Visual-Range Autonomous Driving VLA (ACM MM'25)
自动驾驶之心· 2025-07-14 11:30
Core Insights
- The article presents NavigScene, a novel dataset that bridges the gap between local perception and global navigation in autonomous driving systems, enhancing their reasoning and planning capabilities [2][12][14].

Group 1: Overview of NavigScene
- NavigScene integrates local sensor data with global navigation context, addressing the limitation of existing autonomous driving models that rely primarily on immediate visual information [5][9].
- The dataset includes two subsets, NavigScene-nuScenes and NavigScene-NAVSIM, which provide paired data to support comprehensive scene understanding and decision-making [9][14].

Group 2: Methodologies
- Three complementary paradigms are proposed to leverage NavigScene:
  1. Navigation-guided reasoning (NSFT) enhances vision-language models by incorporating navigation context [10][19].
  2. Navigation-guided preference optimization (NPO) improves generalization to new scenarios through reinforcement learning [24][26].
  3. The navigation-guided vision-language-action (NVLA) model integrates navigation guidance with traditional driving models for better performance (a feature-fusion sketch follows this summary) [27][28].

Group 3: Experimental Results
- Experiments demonstrate that integrating global navigation knowledge significantly improves the performance of autonomous driving systems on perception, prediction, and planning tasks [12][34][39].
- Models trained with NavigScene outperform baselines on metrics such as BLEU-4, METEOR, and CIDEr, showing enhanced reasoning capabilities [32][34].

Group 4: Practical Implications
- NavigScene enables autonomous systems to make more informed decisions in complex driving environments, improving safety and reliability [12][42].
- The findings highlight the importance of incorporating beyond-visual-range (BVR) knowledge for effective navigation and planning in autonomous driving applications [8][12].
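For the NVLA feature fusion mentioned above, here is a minimal sketch assuming simple concatenation followed by an MLP projection; the paper's actual fusion design may differ, and all module and argument names are illustrative rather than taken from the NavigScene codebase.

```python
# Minimal sketch of fusing pooled navigation-text features from a VLM
# with an end-to-end driving model's scene features before planning.
# Concatenation + MLP is an assumption, not the paper's exact design.
import torch
import torch.nn as nn

class NavigationFeatureFusion(nn.Module):
    """Fuse navigation guidance features with sensor/BEV scene features."""
    def __init__(self, nav_dim: int, scene_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(nav_dim + scene_dim, out_dim),
            nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def forward(self, nav_feat: torch.Tensor,
                scene_feat: torch.Tensor) -> torch.Tensor:
        # nav_feat: (B, nav_dim) pooled navigation-text embedding
        # scene_feat: (B, scene_dim) pooled sensor/BEV embedding
        return self.proj(torch.cat([nav_feat, scene_feat], dim=-1))
```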