NavigScene

How is XPeng's beyond-visual-range autonomous driving VLA implemented?
自动驾驶之心· 2025-08-25 23:34
Today we share NavigScene, the latest work from the University of Central Florida and XPeng Motors, accepted at ACM MM'25: connecting local perception and global navigation to achieve beyond-visual-range autonomous driving! Paper authors | Qucheng Peng et al. Editor | 自动驾驶之心 Preface & the author's take: Autonomous driving systems have made remarkable progress in perception, prediction, and planning based on local visual information, but they struggle to integrate the broader navigational context that human drivers routinely use. To address this, the XPeng team proposes NavigScene, an auxiliary navigation-guided natural-language dataset that simulates a human-like driving environment within autonomous driving systems, aiming to bridge the critical gap between local sensor data and global navigation information. Three complementary methods are developed to leverage NavigScene: (1) navigation-guided reasoning, which enhances vision-language models by incorporating navigation context into prompting; (2) navigation-guided preference optimization, a reinforcement ...
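The first of the three methods, navigation-guided reasoning, works by injecting global navigation context into the vision-language model's prompt. A minimal sketch of that idea is below; the function name and prompt template are hypothetical, since the excerpt does not show NavigScene's actual prompt format.

```python
# Hypothetical sketch: prepend global navigation guidance to a VLM prompt.
# The concrete template used by NavigScene is not given in this excerpt.

def build_navigation_guided_prompt(nav_guidance: str, question: str) -> str:
    """Combine beyond-visual-range navigation text with a local driving query."""
    return (
        "Global navigation guidance: " + nav_guidance + "\n"
        "Based on the camera views and the guidance above, " + question
    )

prompt = build_navigation_guided_prompt(
    "In 300 m, turn right onto the highway on-ramp.",
    "plan the ego vehicle's next maneuver.",
)
print(prompt)
```

The point is simply that the same local multi-view inputs are now interpreted under route-level intent, so the model can reason about maneuvers beyond its visual range.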
All in one place: a roundup of outstanding autonomous driving VLA work from the past year
自动驾驶之心· 2025-07-15 12:30
Core Insights

- The article discusses the advancements in Vision-Language-Action (VLA) models for autonomous driving, highlighting the integration of navigation and reinforcement learning to enhance reasoning capabilities beyond visual range [2][3][6].

Group 1: NavigScene

- NavigScene is introduced as a novel auxiliary dataset that pairs local multi-view sensor inputs with global natural-language navigation guidance, addressing the critical gap between local perception and global navigation context in autonomous driving [6].
- Three complementary paradigms are implemented in NavigScene: navigation-guided reasoning, navigation-guided preference optimization, and navigation-guided VLA models, enhancing the reasoning and generalization capabilities of autonomous driving systems [6].
- Comprehensive experiments demonstrate significant performance improvements in perception, prediction, and planning tasks by integrating global navigation knowledge into autonomous driving systems [6].

Group 2: AutoVLA

- AutoVLA is proposed as an end-to-end autonomous driving framework that integrates physical action tokens with a pre-trained VLM backbone, enabling direct policy learning and semantic reasoning from raw visual observations and language instructions [12].
- A reinforcement-learning-based post-training method using Group Relative Policy Optimization (GRPO) is introduced to achieve adaptive reasoning and further enhance model performance in end-to-end driving tasks [12].
- AutoVLA achieves competitive performance across multiple autonomous driving benchmarks, including open-loop and closed-loop tests [12].

Group 3: ReCogDrive

- ReCogDrive is presented as an end-to-end autonomous driving system that integrates a VLM with a diffusion planner, employing a three-stage training paradigm to address performance drops in rare and long-tail scenarios [13][16].
- The first stage involves fine-tuning the VLM on a large-scale driving Q&A dataset to mitigate the domain gap between general content and real-world driving scenarios [16].
- The method achieves a state-of-the-art PDMS score of 89.6 on the NAVSIM benchmark, highlighting its effectiveness and feasibility [16].

Group 4: Impromptu VLA

- Impromptu VLA introduces a large-scale, richly annotated dataset aimed at addressing the limitations of existing benchmarks for autonomous driving VLA models [22].
- The dataset is designed to enhance the performance of VLA models in unstructured extreme scenarios, demonstrating significant improvements on established benchmarks [22].
- Experiments show that training with the Impromptu VLA dataset leads to notable improvements in closed-loop NeuroNCAP scores and collision rates [22].

Group 5: DriveMoE

- DriveMoE is a novel end-to-end autonomous driving framework that incorporates a mixture-of-experts (MoE) architecture to effectively handle multi-view sensor data and complex driving scenarios [28].
- The framework features a scene-specific visual MoE and a skill-specific action MoE, addressing the challenges of multi-view redundancy and skill specialization [28].
- DriveMoE achieves state-of-the-art performance in closed-loop evaluations on the Bench2Drive benchmark, demonstrating the effectiveness of combining visual and action MoE in autonomous driving tasks [28].
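Several of the works above (AutoVLA explicitly) post-train with Group Relative Policy Optimization. The core of GRPO is replacing a learned value baseline with group statistics: sample a group of rollouts per prompt and normalize each rollout's reward by the group mean and standard deviation. A minimal sketch of that advantage computation follows; the reward values are illustrative, not taken from any of the papers.

```python
# Hedged sketch of the group-relative advantage used in GRPO
# (Group Relative Policy Optimization). For each prompt, a group of
# rollouts is sampled and each rollout's reward is standardized
# against the group's mean and standard deviation, so no separate
# value network is needed as a baseline.

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Standardize rewards within one sampled group of rollouts."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled driving trajectories scored by some task reward:
advs = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
print(advs)  # above-average rollouts get positive advantages
```

These advantages then weight the policy-gradient update (with a clipped ratio and KL penalty in the full algorithm, which is omitted here).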
XPeng's latest! NavigScene: global navigation enables beyond-visual-range autonomous driving VLA (ACM MM'25)
自动驾驶之心· 2025-07-14 11:30