Driving World Models
Li Auto proposes the first world model to include both ego-vehicle and other-vehicle trajectories
理想TOP2· 2025-11-23 11:56
Li Auto's world model includes the trajectories of both the ego vehicle and other vehicles, which Li Auto claims is a first. The goal is to let Li Auto's VLA perform reinforcement learning in a simulated environment, where the same scene can be replayed repeatedly to test progressively better trajectories, something real-world data alone cannot support. A visualization is shown in the video below.

Li Auto's VLA training process:

- Pretraining: a 32B VL base model is trained in the cloud on 3D vision data, high-definition 2D vision data with 3-5x the resolution of open-source models, driving-related language corpora, and, critically, joint VL corpora (e.g., synchronized records of navigation information and human judgments). To fit in-vehicle compute while preserving inference speed, the cloud model is distilled into a 3.2B MoE model.
- Post-training: actions are introduced into the model, turning it into a VLA with close to 4B parameters. It uses short-chain CoT limited to 2-3 steps, followed by diffusion to predict the trajectory and environment 4-8 seconds into the future.
- Reinforcement learning: two parts. One is reinforcement learning from human feedback; the other dispenses with human feedback and trains purely on data generated by the world model. The model self-improves against three metrics: comfort (G-value), collision avoidance, and traffic-rule compliance, with the goal of surpassing human driving.

On March 12, 2025, Li Auto published "Other Vehicle Trajectories Are Also Needed: A Driving World Model Un ..."
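The three RL metrics above (comfort via G-value, collision avoidance, rule compliance) suggest a scalarized reward over simulated rollouts. Below is a minimal sketch of what such a reward could look like; all weights, thresholds, and function names are hypothetical illustrations, not from the article.

```python
# Hypothetical weights and comfort threshold -- illustrative only.
W_COMFORT, W_COLLISION, W_RULE = 1.0, 10.0, 5.0
G_LIMIT = 0.3  # acceleration magnitude (in g) considered comfortable

def step_reward(accel_g, collided, rule_violated):
    """Per-step reward for a candidate trajectory in the world-model rollout.

    accel_g: acceleration magnitude in units of g (the comfort / G-value metric)
    collided: bool, collision occurred in the simulated rollout
    rule_violated: bool, a traffic rule was broken
    """
    comfort_penalty = max(0.0, accel_g - G_LIMIT)  # penalize only excess g
    return (-W_COMFORT * comfort_penalty
            - W_COLLISION * float(collided)
            - W_RULE * float(rule_violated))

def trajectory_return(accels_g, collisions, violations):
    """Sum per-step rewards over a 4-8 s rollout; higher is better."""
    return sum(step_reward(a, c, v)
               for a, c, v in zip(accels_g, collisions, violations))
```

Because the world model can replay the same scene many times, an RL loop can compare `trajectory_return` across candidate trajectories for one scenario and reinforce the best one.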
ICCV'25 | HUST proposes HERMES: the first unified driving world model!
自动驾驶之心· 2025-07-25 10:47
Core Viewpoint
- The article introduces HERMES, a unified driving world model that integrates 3D scene understanding and future scene generation, reducing generation errors by 32.4% compared to existing methods [4][17].

Group 1: Model Overview
- HERMES addresses the fragmentation in existing driving world models by combining scene generation and understanding capabilities [3].
- The model uses a BEV (Bird's Eye View) representation to integrate multi-view spatial information and introduces a "world query" mechanism to inject world knowledge into scene generation [3][4].

Group 2: Challenges and Solutions
- To handle multi-view spatiality, a BEV-based world tokenizer compresses multi-view images into BEV features, preserving key spatial information while respecting token length limits [5].
- To integrate understanding and generation, HERMES introduces world queries that enrich generated scenes with world knowledge, bridging the gap between the two tasks [8].

Group 3: Performance Metrics
- HERMES demonstrates superior performance on the nuScenes and OmniDrive-nuScenes datasets, achieving an 8.0% improvement in the CIDEr metric on understanding tasks and significantly lower Chamfer distances on generation tasks [4][17].
- The world query mechanism alone contributes a 10% reduction in Chamfer distance for 3-second point cloud predictions [20].

Group 4: Experimental Validation
- Experiments used the nuScenes, NuInteract, and OmniDrive-nuScenes datasets, with METEOR, CIDEr, and ROUGE metrics for understanding tasks and Chamfer distance for generation tasks [19].
- Ablation studies confirm the importance of interaction between understanding and generation: the unified framework outperforms separate training [18].
Group 5: Qualitative Results
- HERMES accurately generates future point cloud evolutions and understands complex scenes, though challenges remain in scenarios involving complex turns, occlusions, and nighttime conditions [24].
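The Chamfer distance used to score the point cloud generation results above measures how far each predicted point is from its nearest ground-truth point, and vice versa. A minimal generic sketch with NumPy (not code from the HERMES paper):

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point clouds.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    Lower is better; 0 means the clouds coincide.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    # Mean nearest-neighbour distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

This brute-force version is O(N*M) in memory; evaluation toolkits typically use a KD-tree for large clouds, but the metric is the same.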