WorldSplat
AI Day Livestream | WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving
自动驾驶之心· 2025-11-19 00:03
Core Viewpoint
- The article discusses recent advances in driving-scene generation and reconstruction, highlighting WorldSplat, a novel feed-forward 4D driving-scene generation framework that generates consistent multi-trajectory videos [3][8].

Summary by Sections

Driving Scene Generation and Reconstruction
- Recent progress in driving-scene generation and reconstruction shows significant potential for improving autonomous-driving performance by producing scalable, controllable training data [3].
- Existing generative methods focus on synthesizing diverse, high-fidelity driving videos but struggle with 3D consistency and sparse viewpoint coverage, limiting their ability to support high-quality novel view synthesis (NVS) [3].

Introduction of WorldSplat
- WorldSplat is introduced as a bridge between scene generation and reconstruction, developed by research teams from Nankai University [3].
- The framework involves two key steps: (1) a multi-modal-fusion, 4D-aware latent diffusion model generates pixel-aligned 4D Gaussians in a feed-forward manner; (2) an enhanced video diffusion model refines the novel-view videos rendered from these Gaussians [3].

Experimental Results
- Extensive experiments on benchmark datasets demonstrate that WorldSplat generates high-fidelity, spatiotemporally consistent multi-trajectory novel-view driving videos [3][8].
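The "pixel-aligned" Gaussian idea in step (1) can be sketched in a few lines: each pixel is unprojected by its predicted depth to become the mean of one 3D Gaussian. This is a minimal, hypothetical illustration only — the helper name `unproject_to_gaussians` and the toy camera are assumptions, and the paper's decoder additionally predicts scale, rotation, and opacity from the diffusion latents.

```python
import numpy as np

def unproject_to_gaussians(depth, rgb, K):
    """Toy pixel-aligned Gaussian prediction: one Gaussian per pixel,
    with its mean obtained by unprojecting the pixel through its depth.
    (Hypothetical helper; not the paper's actual decoder.)"""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - cx) / fx * depth           # back-project along camera rays
    y = (v - cy) / fy * depth
    means = np.stack([x, y, depth], axis=-1).reshape(-1, 3)   # (H*W, 3)
    colors = rgb.reshape(-1, 3)                               # (H*W, 3)
    return means, colors

# Toy 4x4 frame: constant depth of 2 m, uniform gray image.
K = np.array([[2.0, 0.0, 2.0], [0.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 2.0)
rgb = np.full((4, 4, 3), 0.5)
means, colors = unproject_to_gaussians(depth, rgb, K)
print(means.shape)  # (16, 3) — one Gaussian mean per pixel
```

Because every pixel yields exactly one Gaussian, the representation stays dense and image-aligned, which is what allows a single feed-forward pass (no per-scene optimization) to produce a renderable scene.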
Latest World Model! WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving (Xiaomi & Nankai)
自动驾驶之心· 2025-10-02 03:04
Core Insights
- The article introduces WorldSplat, a novel feed-forward framework that integrates generative methods with explicit 3D reconstruction for 4D driving-scene synthesis, addressing the challenge of generating controllable, realistic driving-scene videos [5][36].

Background Review
- Generating controllable, realistic driving-scene videos is a core challenge in autonomous driving and computer vision, crucial for scalable training and closed-loop evaluation [5].
- Existing generative models have made progress in high-fidelity, user-customizable video generation, reducing reliance on expensive real data, while urban-scene reconstruction methods have optimized 3D representations and consistency for novel view synthesis [5][6].
- Despite these advances, generative and reconstruction methods still face challenges in creating unseen environments and in synthesizing novel views, and existing video generation models often lack 3D consistency and controllability [5][6].

WorldSplat Framework
- WorldSplat combines generative diffusion with explicit 3D reconstruction, constructing dynamic 4D Gaussian representations that can render novel views along any user-defined camera trajectory without per-scene optimization [6][10].
- The framework consists of three key modules: a 4D-aware latent diffusion model for multimodal latent generation, a latent Gaussian decoder for feed-forward 4D Gaussian prediction and real-time trajectory rendering, and an enhancement diffusion model for video-quality refinement [10][12].

Algorithm Details
- The 4D-aware latent diffusion model generates multimodal latents carrying RGB, depth, and dynamic-object information, conditioned on user-defined controls [14][15].
- The latent Gaussian decoder predicts pixel-aligned 3D Gaussians, separating the static background from dynamic objects to form a unified 4D representation [20][21].
- The enhancement diffusion model refines the rendered RGB video, conditioned on both the original input and the rendered frames, enriching spatial detail and improving temporal coherence [24][27].

Experimental Results
- Extensive experiments demonstrate that WorldSplat achieves state-of-the-art performance in generating high-fidelity, temporally consistent free-viewpoint videos that significantly benefit downstream driving tasks [12][36].
- Comparative results show that WorldSplat outperforms existing generative and reconstruction techniques in realism and novel-view quality [31][32].

Conclusion
- The proposed WorldSplat framework effectively integrates generative and reconstruction methods, producing explicit 4D Gaussians that are refined into high-fidelity, spatiotemporally consistent multi-trajectory driving videos [36].
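The three-module flow (latent diffusion → latent Gaussian decoder with static/dynamic separation → enhancement diffusion) can be sketched end-to-end. Everything below is a stub-level illustration: the function names, latent shapes, and the moving-average "refinement" are placeholders standing in for the paper's diffusion models and Gaussian rasterizer, not its actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Module 1 (stub): the 4D-aware latent diffusion model would produce
# multimodal latents (RGB / depth / dynamic-object cues); faked here.
def sample_latents(T, H, W):
    return rng.normal(size=(T, H, W, 8))

# Module 2 (stub): the latent Gaussian decoder splits the scene into a
# time-invariant static background plus per-frame dynamic Gaussians.
def decode_gaussians(latents):
    T, H, W, _ = latents.shape
    static = {"means": rng.normal(size=(H * W, 3))}            # shared across time
    dynamic = [{"means": rng.normal(size=(16, 3))} for _ in range(T)]
    return static, dynamic

def render(static, dynamic_t, H, W):
    # A real system would rasterize the Gaussians along the user-chosen
    # trajectory; here we just emit a uniform gray frame.
    return np.full((H, W, 3), 0.5)

# Module 3 (stub): the enhancement diffusion model refines rendered
# frames; a 3-frame temporal moving average stands in for it.
def refine(video):
    pad = np.concatenate([video[:1], video, video[-1:]], axis=0)
    return (pad[:-2] + pad[1:-1] + pad[2:]) / 3.0

T, H, W = 5, 4, 4
latents = sample_latents(T, H, W)
static, dynamic = decode_gaussians(latents)
video = np.stack([render(static, d, H, W) for d in dynamic])
out = refine(video)
print(out.shape)  # (5, 4, 4, 3): T frames of H x W RGB
```

The design point the stubs preserve is the division of labor: diffusion supplies controllability and diversity, the explicit Gaussian representation supplies 3D consistency across arbitrary trajectories, and a final generative pass recovers the high-frequency detail that splatting alone tends to lose.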