Epona
Minute-long video generation! Horizon Robotics' Epona: an autoregressive diffusion end-to-end world model for autonomous driving (ICCV'25)
自动驾驶之心· 2025-07-07 12:17
Core Insights
- The article presents Epona, a novel autoregressive diffusion world model for autonomous driving that combines the strengths of diffusion models and autoregressive models to support long video generation, trajectory control, and real-time motion planning within a single framework [2][33].

Group 1: Research Motivation
- World models are attracting growing interest as a key technology for simulating physical environments and helping agents plan and make decisions, particularly in highly dynamic, complex tasks such as autonomous driving [6].
- Current world model architectures are significantly limited in their ability to deliver high-quality long-term prediction and real-time motion planning [7].

Group 2: Innovations of Epona
- Epona introduces two key innovations: decoupled spatiotemporal modeling, which separates the modeling of temporal dynamics from fine-grained future world generation, and modular trajectory and video prediction, which allows motion planning and visual modeling to be integrated seamlessly [2][19].
- The model uses a new "chain-of-forward" training strategy to address error accumulation in autoregressive loops while achieving high-resolution, long-duration generation [2][23].

Group 3: Performance Metrics
- Epona improves FVD by 7.4% over existing methods and can predict over horizons of several minutes [2][26].
- In experiments, Epona generates high-quality driving videos longer than 2 minutes (600 frames), significantly outperforming other state-of-the-art models [26].

Group 4: Comparison with Existing Models
- Epona's design contrasts with existing models, which either lack a planning module or are limited to low-resolution, short-term generation [9][31].
- The article compares Epona with other models on quantitative metrics, showing significant advantages in both video length and quality [29][30].

Group 5: Future Implications
- The advances presented by Epona could pave the way for the next generation of end-to-end autonomous driving systems, reducing reliance on complex perception modules and expensive labeled data [6][33].
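The summary names error accumulation in autoregressive loops as the problem the chain-of-forward strategy targets, but does not describe its mechanics. The sketch below illustrates only the general idea under stated assumptions: scalar values stand in for video frames, and the names `rollout`, `chained_training_example`, and `predict_next` are hypothetical, not Epona's API. It shows how a training context can be built from the model's own predictions, so the model is exposed during training to the kind of drift it will see at inference time.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Predictor = Callable[[Sequence[float]], float]


def rollout(predict_next: Predictor, history: Sequence[float],
            num_steps: int, context_len: int) -> List[float]:
    """Generate `num_steps` values autoregressively.

    Each prediction is pushed into a sliding window, so later steps are
    conditioned on the model's own earlier outputs (and their errors).
    """
    context: Deque[float] = deque(history[-context_len:], maxlen=context_len)
    generated: List[float] = []
    for _ in range(num_steps):
        nxt = predict_next(list(context))
        generated.append(nxt)
        context.append(nxt)  # the oldest entry is dropped automatically
    return generated


def chained_training_example(predict_next: Predictor, frames: Sequence[float],
                             start: int, chain_len: int,
                             context_len: int) -> Tuple[List[float], float]:
    """Build a (context, target) pair for one training step.

    Assumption for illustration: the last `chain_len` context entries are
    model predictions instead of ground truth, while the target stays the
    ground-truth frame at the matching position.
    """
    history = list(frames[:start])
    predicted = rollout(predict_next, history, chain_len, context_len)
    context = (history + predicted)[-context_len:]
    target = frames[start + chain_len]
    return context, target
```

With a deliberately biased toy predictor such as `lambda ctx: ctx[-1] + 1.5`, the returned context drifts away from the ground-truth frames while the target does not, which is the mismatch a training loss can then penalize.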
Autonomous Driving Paper Digest | World Models, VLA Survey, End-to-End, and More
自动驾驶之心· 2025-07-02 07:34
Core Insights
- The article covers recent advances in autonomous driving technology, focusing in particular on the Epona model, which uses autoregressive diffusion for trajectory planning and long-term generation [6][5].

Group 1: Epona Model
- Epona can generate sequences lasting up to 2 minutes, significantly longer than existing world models [6].
- It offers real-time trajectory planning that runs independently of video prediction, at rates of up to 20 Hz [6].
- The model uses a continuous visual tokenizer in its autoregressive formulation, which preserves rich scene detail [6].

Group 2: Experimental Results
- The article presents several metrics comparing Epona with other models, highlighting its superior FID and FVD results [5].
- Epona achieved an FID score of 7.5 and an FVD score of 82.8, indicating that it generates high-quality driving scenarios [5].

Group 3: Vision-Language-Action Models
- The article also discusses a survey of Vision-Language-Action models for autonomous driving, covering a range of models and their capabilities [15][18].
- The models listed include DriveGPT4, ADriver-I, and RAG-Driver, each with its own features and datasets [18].

Group 4: StyleDrive Benchmarking
- The article introduces StyleDrive, a benchmark for end-to-end autonomous driving that focuses on driving-style awareness [21].
- It outlines rule-based heuristic criteria for classifying driving style across a range of traffic scenarios [22].

Group 5: Community Engagement
- The article encourages readers to join a knowledge-sharing community focused on autonomous driving, which offers resources and networking opportunities [9][25].
- The community aims to build a comprehensive platform for learning and for sharing the latest industry trends and job opportunities [25].
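The FID and FVD scores reported above are both Fréchet distances between Gaussian fits of feature vectors extracted from real and generated data, and lower values are better. The sketch below is a simplified illustration, not the evaluation code behind the reported numbers: it assumes diagonal covariances so that it needs only the Python standard library, whereas real FID and FVD implementations use full covariance matrices and features from a pretrained network. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

Features = Sequence[Sequence[float]]


def gaussian_stats(features: Features) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and population variance of a set of feature vectors."""
    columns = list(zip(*features))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diag(real: Features, generated: Features) -> float:
    """Squared Fréchet distance between two diagonal-covariance Gaussians.

    The general form is ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    For diagonal covariances the trace term reduces to the sum over
    dimensions of (sqrt(var_r) - sqrt(var_g))^2.
    """
    mu_r, var_r = gaussian_stats(real)
    mu_g, var_g = gaussian_stats(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((math.sqrt(a) - math.sqrt(b)) ** 2
                   for a, b in zip(var_r, var_g))
    return mean_term + cov_term
```

Two identical feature sets give a distance of zero, and shifting every generated feature by a constant vector adds exactly the squared length of that shift, which is a quick sanity check for any implementation.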