Minute-Long Video Generation! Horizon Robotics' Epona: An Autoregressive Diffusion End-to-End World Model for Autonomous Driving (ICCV'25)
自动驾驶之心· 2025-07-07 12:17
Core Insights
- The article presents Epona, an autoregressive diffusion world model for autonomous driving. It combines the strengths of diffusion models and autoregressive models to support long video generation, trajectory control, and real-time motion planning within a single framework [2][33].

Group 1: Research Motivation
- World models are attracting growing interest as a key technology for simulating physical environments and helping agents plan and make decisions, particularly in highly dynamic, complex tasks such as autonomous driving [6].
- Current world model architectures have significant limitations, particularly in producing high-quality long-horizon predictions and in real-time motion planning [7].

Group 2: Innovations of Epona
- Epona introduces two key innovations: decoupled spatiotemporal modeling, which separates the modeling of temporal dynamics from fine-grained generation of the future world, and modular trajectory and video prediction, which lets motion planning and visual modeling be integrated seamlessly [2][19].
- The model uses a new "chain-of-forward training strategy" to reduce error accumulation across autoregressive steps while still achieving high-resolution, long-duration generation [2][23].

Group 3: Performance Metrics
- Epona improves FVD by 7.4% over existing methods and can generate predictions spanning several minutes [2][26].
- In experiments, Epona generates high-quality driving videos more than 2 minutes (600 frames) long, significantly outperforming other state-of-the-art models [26].

Group 4: Comparison with Existing Models
- Existing models either lack a planning module or are limited to low resolution and short-horizon generation; Epona's design addresses both gaps [9][31].
- The article compares Epona with other models on quantitative metrics and reports significant advantages in both video length and quality [29][30].

Group 5: Future Implications
- The advances presented by Epona could pave the way for the next generation of end-to-end autonomous driving systems, reducing reliance on complex perception modules and expensive labeled data [6][33].
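
To make the autoregressive side of the summary concrete, here is a minimal toy sketch of a long-horizon rollout in which each generated frame is fed back as input for the next step. It is not Epona's architecture or training procedure: `rollout`, `toy_predictor`, the two-frame window, and the constant bias are all hypothetical stand-ins chosen for illustration. The bias only shows how a small per-step error can compound over many steps, which is the kind of drift the summary says the chain-of-forward strategy targets. The last line takes the summary's figures at face value (600 frames over 2 minutes) and works out the implied frame rate.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image or latent frame


def rollout(
    predict_next: Callable[[Sequence[Frame]], Frame],
    context: Sequence[Frame],
    num_frames: int,
    window: int,
) -> List[Frame]:
    """Generate `num_frames` frames, feeding each output back as input."""
    # A bounded deque keeps only the most recent `window` frames as context.
    history = deque(context, maxlen=window)
    generated: List[Frame] = []
    for _ in range(num_frames):
        frame = predict_next(list(history))
        generated.append(frame)
        history.append(frame)  # the model's own output becomes future input
    return generated


def toy_predictor(history: Sequence[Frame]) -> Frame:
    """Hypothetical predictor: window mean plus a small constant bias."""
    n = len(history)
    dim = len(history[0])
    return [sum(f[i] for f in history) / n + 0.01 for i in range(dim)]


# Two all-zero context frames, then 600 generated frames.
frames = rollout(toy_predictor, [[0.0, 0.0], [0.0, 0.0]], num_frames=600, window=2)

# The small per-step bias compounds: the last frame has drifted far from the first.
drift = frames[-1][0] - frames[0][0]

# 600 frames over 2 minutes (120 seconds) implies 5 frames per second.
fps = 600 / (2 * 60)
```

In this toy setup nothing corrects the bias, so `drift` keeps growing with the rollout length. A real world model would replace `toy_predictor` with a learned network and rely on its training strategy to keep such errors from accumulating.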