Double SOTA! GenieDrive: A Physically Consistent World Model for Autonomous Driving (HKU & Huawei Noah's Ark Lab)
自动驾驶之心·2025-12-24 00:58

Core Insights
- The article presents GenieDrive, a framework for autonomous driving world modeling that uses 4D occupancy as an intermediate representation, opening a research path of "first generate 4D occupancy, then generate video" [2][25].

Summary by Sections

Project Overview
- GenieDrive is a world-modeling framework for autonomous driving that generates highly controllable, multi-view-consistent, and physically accurate video [7].
- It has only 3.47 million parameters, runs inference at 41 FPS, and improves mIoU by 7.2% on 4D occupancy prediction [5][7].

Research Background and Challenges
- Current autonomous driving world models face two main challenges: insufficient physical consistency and the difficulty of modeling high-dimensional representations [8].
- Existing methods rely on a single video diffusion model, which makes learning harder and can produce results that are inconsistent with real-world physics [4][8].

Innovations of GenieDrive
- GenieDrive uses a two-stage world modeling and generation framework, with 4D occupancy as an intermediate state that injects explicit physical information into video generation [10][11].
- It employs a Tri-plane VAE for efficient compression, using only 58% as many latent representations as existing methods while achieving state-of-the-art occupancy reconstruction performance [11].
- A Mutual Control Attention mechanism explicitly models how driving control affects occupancy evolution, and end-to-end joint training further improves prediction accuracy [11].

Experimental Results and Analysis
- On 4D occupancy prediction, GenieDrive improves mIoU by 7.2% over the latest methods [13].
- The model runs inference at 41 FPS with a total of 3.47 million parameters [13].
- In video generation, GenieDrive reduces FVD by 20.7%, outperforming existing occupancy-based methods [15].
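For readers unfamiliar with the mIoU metric cited above, the following is a minimal sketch of how mean intersection-over-union is commonly computed over semantic occupancy voxels. It is not GenieDrive's evaluation code. The function name `mean_iou`, the three toy class labels, and the choice to skip classes absent from both grids are assumptions made for illustration; real benchmarks may differ in which classes they include.

```python
from typing import List, Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes for two flattened voxel label grids.

    Assumption: a class that appears in neither grid is skipped so it
    does not distort the average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious: List[float] = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy grid of 8 voxels with hypothetical labels: 0 = free, 1 = vehicle, 2 = road.
target = [0, 0, 1, 1, 2, 2, 2, 0]
pred = [0, 1, 1, 1, 2, 2, 0, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.4f}")
```

In the toy grid the per-class IoUs are 1/2, 2/3, and 2/3, so the mean is 11/18. The summary does not say whether the reported 7.2% gain is an absolute or a relative change in this quantity.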
Future Outlook
- By introducing 4D occupancy as an intermediate representation, GenieDrive aims to advance closed-loop evaluation and simulation technologies, potentially opening new research directions and applications in the autonomous driving field [23].