Workflow
OpenDriveVLA
icon
Search documents
后端到端时代:我们必须寻找新的道路吗?
自动驾驶之心· 2025-09-01 23:32
以下文章来源于自动驾驶下半场 ,作者厘米 本文只做学术分享,如有侵权,联系删文 2025年夏天,上一年端到端的热度还没有散去。 即使端到端的概念在各家各显神通的公关稿下已经面目全非;即使面对突然发布的 FSD 的正面竞争,我们还没有获得明显的优势;似乎一夜之间,VLA 的宣传攻 势像素级复制去年的端到端。 自动驾驶下半场 . 自动驾驶量产相关的一切 作者 | 厘米 来源 | 自动驾驶下半场 点击下方 卡片 ,关注" 自动驾驶之心 "公众号 戳我-> 领取 自动驾驶近30个 方向 学习 路线 >>自动驾驶前沿信息获取 → 自动驾驶之心知识星球 不过,这次似乎和端到端浪潮不太一样。相较于之前行业普遍达成研发共识的局面,这一次不少团队选择了刻意回避。 毕竟,技术切换期是最好的占领用户心智的机会,也是证明团队研发优势的最佳时机。 被公众认为长期处于第一梯队的华为 ADS 明确表示,WA(World Model + Action)才是实现自动驾驶的终极方案;蔚来在低速场景用世界模型展示了一些类似 VLA 的体验,但在对外宣传时却讳莫如深;最近开始用户体验的地平线,虽然表现惊艳,却只强调自己在认真做端到端,对于 VLA ...
自动驾驶VLA:OpenDriveVLA、AutoVLA
自动驾驶之心· 2025-08-18 01:32
Core Insights - The article discusses two significant papers, OpenDriveVLA and AutoVLA, which focus on applying large visual-language models (VLM) to end-to-end autonomous driving, highlighting their distinct technical paths and philosophies [22]. Group 1: OpenDriveVLA - OpenDriveVLA aims to address the "modal gap" in traditional VLMs when dealing with dynamic 3D driving environments, emphasizing the need for structured understanding of the 3D world [23]. - The methodology includes several key steps: 3D visual environment perception, visual-language hierarchical alignment, and a multi-stage training paradigm [24][25]. - The model utilizes structured, layered tokens (Agent, Map, Scene) to enhance the VLM's understanding of the environment, which helps mitigate spatial hallucination risks [6][9]. - OpenDriveVLA achieved state-of-the-art performance in the nuScenes open-loop planning benchmark, demonstrating its effective perception-based anchoring strategy [10][20]. Group 2: AutoVLA - AutoVLA focuses on integrating driving tasks into the native operation of VLMs, transforming them from scene narrators to genuine decision-makers [26]. - The methodology features layered visual token extraction, where the model creates discrete action codes instead of continuous coordinates, thus converting trajectory planning into a next-token prediction task [14][29]. - The model employs a dual-mode thinking approach, allowing it to adapt its reasoning depth based on scene complexity, balancing efficiency and effectiveness [28]. - AutoVLA's reinforcement learning fine-tuning (RFT) enhances its driving strategy, enabling the model to optimize its behavior actively rather than merely imitating human driving [30][35]. Group 3: Comparative Analysis - OpenDriveVLA emphasizes perception-language alignment to improve VLM's understanding of the 3D world, while AutoVLA focuses on language-decision integration to enhance VLM's decision-making capabilities [32]. - The two models represent complementary approaches: OpenDriveVLA provides a robust perception foundation, while AutoVLA optimizes decision-making strategies through reinforcement learning [34]. - Future models may combine the strengths of both approaches, utilizing OpenDriveVLA's structured perception and AutoVLA's action tokenization and reinforcement learning to create a powerful autonomous driving system [36].
自动驾驶端到端VLA落地,算法如何设计?
自动驾驶之心· 2025-06-22 14:09
Core Insights - The article discusses the rapid advancements in end-to-end autonomous driving, particularly focusing on Vision-Language-Action (VLA) models and their applications in the industry [2][3]. Group 1: VLA Model Developments - The introduction of AutoVLA, a new VLA model that integrates reasoning and action generation for end-to-end autonomous driving, shows promising results in semantic reasoning and trajectory planning [3][4]. - ReCogDrive, another VLA model, addresses performance issues in rare and long-tail scenarios by utilizing a three-stage training framework that combines visual language models with diffusion planners [7][9]. - Impromptu VLA introduces a dataset aimed at improving VLA models' performance in unstructured extreme conditions, demonstrating significant performance improvements in established benchmarks [14][24]. Group 2: Experimental Results - AutoVLA achieved competitive performance metrics in various scenarios, with the best-of-N method reaching a PDMS score of 92.12, indicating its effectiveness in planning and execution [5]. - ReCogDrive set a new state-of-the-art PDMS score of 89.6 on the NAVSIM benchmark, showcasing its robustness and safety in driving trajectories [9][10]. - The OpenDriveVLA model demonstrated superior results in open-loop trajectory planning and driving-related question-answering tasks, outperforming previous methods on the nuScenes dataset [28][32]. Group 3: Industry Trends - The article highlights a trend among major automotive manufacturers, such as Li Auto, Xiaomi, and XPeng, to invest heavily in VLA model research and development, indicating a competitive landscape in autonomous driving technology [2][3]. - The integration of large language models (LLMs) with VLA frameworks is becoming a focal point for enhancing decision-making capabilities in autonomous vehicles, as seen in models like ORION and VLM-RL [33][39].