Imperial College London VLA Survey: How Autonomous Driving Is Being Reshaped, from World Models to VLA (T-ITS)
自动驾驶之心 (Heart of Autonomous Driving) · 2026-01-05 00:35
Core Insights
- The article discusses the shift in autonomous driving technology from the "perception-planning" pipeline to the end-to-end Vision-Language-Action (VLA) paradigm, highlighting the role of world models and generative simulation in this evolution [2][3].
Group 1: Technological Evolution
- The review from Imperial College London systematically analyzes 77 cutting-edge papers published through September 2025 along three main dimensions (end-to-end VLA, world models, and modular integration), providing a comprehensive learning roadmap for developers [2].
- The emergence of VLA marks a shift from simple multi-modal fusion to collaborative vision-language reasoning that directly outputs planning trajectories [10].
- The article emphasizes the importance of world models, which use generative AI to address corner cases in autonomous driving [6].
Group 2: Modular Integration
- Despite the popularity of end-to-end architectures, modular solutions are seeing a resurgence, showing the potential of large models within traditional perception stacks, for example in semantic anomaly detection and long-tail object recognition [7].
- The review highlights models such as Talk2BEV and ChatBEV, which use Vision-Language Models (VLMs) to enhance perception [7].
Group 3: Challenges and Solutions
- The article identifies three major challenges for deploying VLMs in autonomous vehicles: reasoning latency, hallucination, and computational trade-offs [9][13].
- Solutions proposed for latency include visual token compression, chain-of-thought pruning, and optimization strategies for NVIDIA Orin-X chips [12].
- To mitigate hallucination, techniques such as "hallucination subspace projection" and rule-based safety filters are proposed [15].
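The summary does not describe how the rule-based safety filters work. As a minimal sketch, assuming the VLA model outputs ego waypoints at a fixed time step, such a filter can reject a plan that breaks simple kinematic rules and fall back to a conventional planner's trajectory. The names `check_trajectory` and `select_plan` and the speed and acceleration limits are hypothetical and are not taken from the review.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres, ego frame (assumed)


def check_trajectory(
    waypoints: List[Waypoint],
    dt: float,
    max_speed: float = 15.0,  # m/s, hypothetical limit
    max_accel: float = 4.0,   # m/s^2, hypothetical limit
) -> List[str]:
    """Return a list of rule violations; an empty list means the plan passes."""
    # Rule 0: reject malformed model output outright (e.g. NaN coordinates).
    if any(not (math.isfinite(x) and math.isfinite(y)) for x, y in waypoints):
        return ["non-finite waypoint"]

    violations: List[str] = []
    # Speed between consecutive waypoints.
    speeds = [math.dist(p, q) / dt for p, q in zip(waypoints, waypoints[1:])]
    for i, v in enumerate(speeds):
        if v > max_speed:
            violations.append(f"speed {v:.1f} m/s exceeds limit at step {i}")

    # Approximate longitudinal acceleration from the change in speed
    # (lateral acceleration is ignored in this simplified rule).
    for i, (v0, v1) in enumerate(zip(speeds, speeds[1:])):
        a = abs(v1 - v0) / dt
        if a > max_accel:
            violations.append(f"accel {a:.1f} m/s^2 exceeds limit at step {i}")
    return violations


def select_plan(
    model_plan: List[Waypoint], fallback_plan: List[Waypoint], dt: float
) -> List[Waypoint]:
    """Use the model's trajectory only if it passes every rule."""
    return model_plan if not check_trajectory(model_plan, dt) else fallback_plan
```

For example, with `dt=0.5`, a plan that moves 10 m per step implies 20 m/s, fails the speed rule, and `select_plan` returns the fallback instead. A deployed filter would need further rules (for instance, checks against perceived obstacles and the drivable area), which this sketch omits.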
Group 4: Future Directions
- The review outlines four unresolved challenges in the field: standardized evaluation, edge deployment, multi-modal alignment, and legal and ethical considerations [17].
- It emphasizes the need for a unified scoring system for VLA safety and hallucination rates, as well as the importance of ensuring semantic consistency across different modalities in complex scenarios [17].
Group 5: Resource Compilation
- The paper includes nine detailed classification tables and a review of key datasets and simulation platforms, such as NuScenes-QA and CARLA, to support community research and highlight the transition from open-loop metrics to closed-loop evaluation [14][16].
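Group 5 contrasts open-loop metrics with closed-loop evaluation. For illustration, a widely used open-loop planning metric is the L2 (Euclidean) distance between predicted and logged ego waypoints, often summarized as the average error over the horizon and the error at the final step (commonly called ADE and FDE in trajectory prediction). The minimal sketch below assumes both trajectories are sampled at the same timestamps; the function name `displacement_errors` is hypothetical. Closed-loop evaluation in a simulator such as CARLA instead lets the policy drive and scores the outcome of the drive, which a per-frame metric like this cannot capture.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in metres


def displacement_errors(pred: List[Waypoint], gt: List[Waypoint]) -> Tuple[float, float]:
    """Return (average, final) L2 displacement error between two trajectories.

    Assumes pred[i] and gt[i] refer to the same future timestamp.
    """
    if not pred or len(pred) != len(gt):
        raise ValueError("trajectories must be non-empty and of equal length")
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    ade = sum(errors) / len(errors)  # mean error over the horizon
    fde = errors[-1]                 # error at the last waypoint
    return ade, fde
```

For example, a prediction `[(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]` against the log `[(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]` has per-step errors 0, 1, and 2 m, so the function returns `(1.0, 2.0)`.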
Surpassing DriveVLA-W0! DriveLaW: Unifying Generation and Planning with World Model Representations (HUST & Xiaomi)
自动驾驶之心 (Heart of Autonomous Driving) · 2026-01-04 01:04
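Paper authors: Tianze Xia et al. | Editor: 自动驾驶之心
In recent years, autonomous driving has advanced substantially thanks to breakthroughs in perception (e.g., BEVFormer, MapTR, BEVDet) and planning (e.g., UniAD, VAD, DiffusionDrive, ReCogDrive). However, existing systems remain brittle in long-tail scenarios, which severely limits closed-loop driving performance. To address this, a large body of recent work turns to world models, which aim to improve generalization and robustness, and thereby handle the long tail, by predicting how driving scenes will evolve. World models are now applied in autonomous driving in several ways: one line of work synthesizes data for downstream tasks to cover rare scenarios (e.g., VISTA, GAIA, MagicDrive, DriveDreamer, DrivingDiffusion); another uses simulated environments for policy learning (e.g., RAD, ReSim, OmniNWM); and a third provides future visual predictions as auxiliary supervision signals (e.g., DriveVLA, Dr ...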