UniSplat
UniSplat, the latest feed-forward 3D reconstruction algorithm from Didi and CUHK, with Shaoshuai Shi participating!
自动驾驶之心· 2025-11-08 16:03
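Paper authors | Chen Shi et al. Editor | 自动驾驶之心
It is rare to see Didi publish new work on feed-forward Gaussian splatting (GS), and Shaoshuai Shi is among the contributors: UniSplat! Feed-forward 3D reconstruction is advancing quickly in autonomous driving, but existing methods perform poorly on surround-view driving scenes because of two compounding difficulties: sparse, non-overlapping camera views and complex scene dynamics. To address this, a team from CUHK (Shenzhen), Didi, and HKU proposes UniSplat, a general feed-forward framework that achieves robust dynamic scene reconstruction through unified latent spatio-temporal fusion. The framework builds a 3D latent Scaffold (a structured representation) that uses pretrained foundation models to capture the scene's geometric and semantic context. Experiments show that UniSplat performs well on novel view synthesis, delivering robust, high-quality renderings even for viewpoints outside the original camera coverage. PS. A commitment for the record: I plan to organize the feed-forward GS literature soon, covering the milestones of this direction and the work that combines it with autonomous driving.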
UniSplat, the latest feed-forward 3D reconstruction algorithm from Didi and CUHK, with Shaoshuai Shi participating!
自动驾驶之心· 2025-11-08 12:35
Core Viewpoint
- The article introduces UniSplat, a feed-forward framework for dynamic scene reconstruction in autonomous driving that fuses spatio-temporal information from multi-camera video inputs to improve the robustness and quality of 3D scene reconstruction [6][44].

Background Review
- Reconstructing 3D scenes from urban driving data has become a core capability for autonomous driving systems, supporting critical tasks such as simulation, scene understanding, and long-horizon planning [5].
- Recent 3D Gaussian splatting methods deliver high rendering efficiency and fidelity, but they typically assume substantial overlap between input images and rely on per-scene optimization, which limits their use in real-time driving scenarios [5][6].

UniSplat Overview
- UniSplat targets robust reconstruction of dynamic driving scenes by constructing a unified 3D Scaffold that integrates multi-view spatial information and multi-frame temporal information [6][9].
- The framework operates in three stages: (1) building a 3D Scaffold from multi-view images, (2) performing spatio-temporal fusion, and (3) decoding the fused Scaffold into Gaussian primitives [6][9].

Experimental Results
- Evaluations on the Waymo Open Dataset and nuScenes show that UniSplat achieves state-of-the-art performance in both input-view reconstruction and novel view synthesis, with robust, high-quality rendering even for views outside the original camera coverage [7][34].
- On Waymo, UniSplat outperforms existing methods such as MVSplat and DepthSplat across all metrics, reaching a PSNR of 25.37 dB and an SSIM of 0.765 [34].
- The model distinguishes dynamic from static elements in a scene, which mitigates ghosting artifacts during scene completion [40].
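
For readers unfamiliar with the metric, the PSNR figure quoted in the results can be computed as sketched below. This is a minimal pure-Python version of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). The function name `psnr`, the flattened-list image layout, and the toy pixel values are illustrative assumptions, not the paper's evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], rendered: Sequence[float], max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized, flattened images."""
    if len(reference) != len(rendered) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((r - s) ** 2 for r, s in zip(reference, rendered)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example (made-up values): every pixel is off by 0.1 on a [0, 1] scale.
reference = [0.5, 0.5, 0.5, 0.5]
rendered = [0.6, 0.4, 0.6, 0.4]
print(f"{psnr(reference, rendered):.2f} dB")  # prints 20.00 dB
```

Under this definition, a PSNR of about 25.4 dB corresponds to a root-mean-square pixel error of roughly 5.4% of the intensity range.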

Methodology Details
- The 3D Scaffold is constructed by inferring geometric structure with a geometric backbone model and supplementing it with semantic information from a visual backbone model [14][16].
- A dual-branch decoder generates dynamic-aware Gaussian primitives, improving detail retention and scene completeness [23][27].
- A memory mechanism accumulates static Gaussian representations over time, supporting long-term scene completion [29][31].

Conclusion
- UniSplat advances dynamic scene understanding and interactive 4D content creation, and it provides a foundation for future research in lifelong world modeling and autonomous driving applications [44].
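
The idea of accumulating static Gaussians over time, as described under Methodology Details, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Gaussian` and `StaticMemory` classes, the per-primitive `dynamic_prob` score, and the 0.5 threshold are hypothetical and are not taken from the UniSplat implementation, which the summary does not describe at this level of detail.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Gaussian:
    """Toy primitive; real Gaussians also carry scale, rotation, opacity, and color."""
    label: str
    center_world: Tuple[float, float, float]  # assumed to live in a shared world frame
    dynamic_prob: float  # hypothetical per-primitive dynamic score in [0, 1]


@dataclass
class StaticMemory:
    threshold: float = 0.5  # assumed cut-off, not a value from the paper
    static: List[Gaussian] = field(default_factory=list)

    def step(self, frame: List[Gaussian]) -> List[Gaussian]:
        """Store this frame's static primitives and return the set to render now."""
        current_dynamic = [g for g in frame if g.dynamic_prob >= self.threshold]
        self.static.extend(g for g in frame if g.dynamic_prob < self.threshold)
        # Dynamic primitives from earlier frames are never stored, so a moving
        # car cannot reappear at its old position in this toy model.
        return self.static + current_dynamic


memory = StaticMemory()
frame_1 = [Gaussian("building", (0.0, 10.0, 0.0), 0.1), Gaussian("car", (5.0, 0.0, 0.0), 0.9)]
frame_2 = [Gaussian("tree", (3.0, 12.0, 0.0), 0.2), Gaussian("car", (8.0, 0.0, 0.0), 0.9)]

memory.step(frame_1)
scene = memory.step(frame_2)
print([(g.label, g.center_world) for g in scene])
```

After the second frame, the rendered set contains the building and tree accumulated from both frames plus the car only at its current position. This is one plausible reading of how separating static from dynamic content helps avoid ghosting during scene completion.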