GeoDrive
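Breaking Down Li Auto's Work on World Models
自动驾驶之心 · 2026-01-05 09:30
- Hierarchy UGP, accepted to ICCV 2025: autonomous driving scene reconstruction.
- StyledStreets: a multi-style autonomous driving scene generation algorithm with spatiotemporal consistency.
- World4Drive, accepted to ICCV 2025: integrates multimodal driving intentions with a latent world model to achieve reasonable planning.
- GeoDrive: a video-generation diffusion world model for autonomous driving.
- OmniGen, accepted to ACM MM 2025: an autonomous driving world model framework that generates vision and LiDAR data in a unified way.
- RLGF, accepted to NeurIPS 2025: a video-generation world model for autonomous driving combined with reinforcement learning.
- SparseWorld-TC: a 4D occupancy (OCC) world model prediction algorithm built on sparse attention.
- AD-R1: an end-to-end closed-loop reinforcement learning framework based on a world model.

I have recently been reviewing how different companies use world models, and today I will walk through Li Auto's work in this area. Li Auto defines a world model as reconstruction plus generation: reconstruction techniques (3DGS) model the autonomous driving scene, and generative methods then provide closed-loop simulation or scene generation. The core technologies here are 3DGS and generation. Overall, a world model is a spatiotemporal cognition system built around video, a view that is also consistent with that of NIO's Ren Shaoqing. Cross-modal mutual prediction and reconstruction push the system to learn spatiotemporal and physical laws, so that a machine can understand its environment the way a person does. With reconstruction plus generation, it is possible to do cloud-side data generation, and also ...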
Li Auto's Next-Generation World Model Achieves Real-Time Scene Editing and VLA Collaborative Planning for the First Time
理想TOP2 · 2025-06-11 02:59
Core Viewpoint
- GeoDrive is a next-generation world model system for autonomous driving, developed jointly by Peking University, Berkeley AI Research (BAIR), and Li Auto. It addresses a limitation of existing methods, which rely on 2D modeling and lack 3D spatial perception, leading to unreasonable trajectories and distorted dynamic interactions [11][14].

Group 1: Key Innovations
- **Geometric Condition-Driven Generation**: Uses 3D renderings in place of numerical control signals, effectively solving the action drift problem [6].
- **Dynamic Editing Mechanism**: Injects controllable motion into static point clouds, balancing efficiency and flexibility [7].
- **Minimized Training Cost**: Freezes the backbone model and trains only lightweight adapters for data-efficient training [8].
- **Pioneering Applications**: Achieves real-time scene editing and VLA (Vision-Language-Action) collaborative planning within a driving world model for the first time [9][10].

Group 2: Technical Details
- **3D Geometry Integration**: The system constructs a 3D representation from a single RGB image, keeping the scene structure spatially consistent and coherent [12][18].
- **Dynamic Editing Module**: Allows the positions of movable objects to be adjusted flexibly during training, which makes multi-vehicle interaction scenarios more realistic [12].
- **Video Diffusion Architecture**: Combines the rendered condition sequences with noise features to improve 3D geometric fidelity while maintaining photorealistic quality [12][33].

Group 3: Performance Metrics
- GeoDrive significantly improves the controllability of driving world models, reducing trajectory tracking error by 42% compared to the Vista model, and performs better across various video quality metrics [19][34].
- The model generalizes effectively to novel view synthesis tasks, outperforming existing models such as StreetGaussian in video quality [19][38].
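To make the idea of geometric conditioning more concrete, the following is a minimal sketch of rendering a static point cloud into a camera view for each pose along an ego trajectory, so that the rendered frames (rather than raw numeric actions) can serve as the condition sequence. This is not GeoDrive's implementation: the function names, the pinhole camera parameters, the flat-ground 2D pose `(x, y, yaw)`, and the sparse depth-map output are all assumptions chosen for illustration.

```python
import math

def world_to_camera(point, ego_xy, ego_yaw, cam_height):
    """Express a world-frame point (x, y, z) in the ego camera frame.

    Assumed convention: world z points up, the camera looks along the ego
    heading, and camera axes are x right, y down, z forward.
    """
    dx = point[0] - ego_xy[0]
    dy = point[1] - ego_xy[1]
    forward = math.cos(ego_yaw) * dx + math.sin(ego_yaw) * dy
    left = -math.sin(ego_yaw) * dx + math.cos(ego_yaw) * dy
    return (-left, cam_height - point[2], forward)

def render_depth(points, ego_xy, ego_yaw, width=64, height=48,
                 focal=40.0, cam_height=1.5):
    """Project points with a pinhole model; keep the nearest depth per pixel."""
    depth = [[None] * width for _ in range(height)]
    cx, cy = width / 2.0, height / 2.0
    for p in points:
        x, y, z = world_to_camera(p, ego_xy, ego_yaw, cam_height)
        if z <= 0.1:  # skip points behind or too close to the camera
            continue
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] is None or z < depth[v][u]:  # simple z-buffer
                depth[v][u] = z
    return depth

def render_condition_sequence(points, trajectory):
    """Render one sparse depth map per (x, y, yaw) pose of the trajectory."""
    return [render_depth(points, (x, y), yaw) for x, y, yaw in trajectory]

# Hypothetical usage: three points, ego driving straight along +x.
points = [(10.0, 0.0, 1.5), (20.0, 0.0, 1.5), (10.0, 2.0, 1.5)]
trajectory = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
frames = render_condition_sequence(points, trajectory)
```

Because the trajectory determines the rendered views directly, a generator conditioned on such frames sees the intended motion as image-space geometry, which is one plausible reading of why rendering-based conditioning reduces action drift.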
Group 4: Conclusion
- GeoDrive sets a new benchmark in autonomous driving by enhancing action controllability and spatial accuracy through explicit trajectory control and direct visual condition input, while also supporting applications like non-ego vehicle perspective generation and scene editing [41].
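As an illustration of what scene editing on a point cloud can look like, here is a minimal sketch that injects motion into a static point cloud by rigidly moving only the points flagged as belonging to a movable object, producing one edited cloud per frame. It is a simplified assumption, not the article's method: the per-point object flags, the planar rotate-then-translate motion model, and all function names are hypothetical.

```python
import math

def move_object(points, is_object, center, dxy, dyaw):
    """Rotate flagged points by dyaw about center (in the ground plane),
    then translate them by dxy. Unflagged points stay untouched."""
    c, s = math.cos(dyaw), math.sin(dyaw)
    edited = []
    for (x, y, z), flag in zip(points, is_object):
        if flag:
            rx, ry = x - center[0], y - center[1]
            x = center[0] + c * rx - s * ry + dxy[0]
            y = center[1] + s * rx + c * ry + dxy[1]
        edited.append((x, y, z))
    return edited

def edit_sequence(points, is_object, center, motions):
    """Apply one (dx, dy, dyaw) motion per frame to the same static cloud."""
    return [move_object(points, is_object, center, (dx, dy), dyaw)
            for dx, dy, dyaw in motions]

# Hypothetical usage: one background point and one point on a vehicle.
cloud = [(0.0, 0.0, 0.0), (10.0, 1.0, 0.5)]
flags = [False, True]
motions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 0.0, math.pi / 2)]
edited_frames = edit_sequence(cloud, flags, (10.0, 0.0), motions)
```

Each edited cloud could then be rendered into the camera view to obtain a condition frame in which the chosen vehicle follows the scripted motion while the background stays fixed.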