4D Reconstruction
AI Day Livestream! DGGT: Pose-Free, Feedforward 4D Autonomous-Driving Worlds
自动驾驶之心· 2025-12-23 00:53
Paper title: DGGT: Feedforward 4D Reconstruction of Dynamic Driving Scenes using Unposed Images

Training and evaluating autonomous driving systems requires fast, scalable 4D reconstruction and re-simulation, yet most existing methods for dynamic driving scenes still rely on per-scene optimization, known camera calibration, or short time windows, which makes them slow and of limited practical use. This work revisits the problem from a feedforward perspective and proposes the Driving Gaussian Grounded Transformer (DGGT), a unified, pose-free framework for dynamic scene reconstruction. The authors observe that existing methods typically treat camera poses as required inputs, which limits flexibility and scalability. DGGT instead redefines pose as an output of the model, enabling reconstruction directly from sparse, unposed images and supporting an arbitrary number of views over long sequences. The method jointly predicts per-frame 3D Gaussian maps and camera parameters, decouples dynamic elements with a lightweight dynamic head, and uses a lifetime head to modulate time-varying visibility for temporal consistency. In addition, a diffusion-based ren ...
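To make the dataflow concrete, here is a minimal, hypothetical sketch of the pose-free feedforward interface described above. It is not the authors' released code: every module name, head design, tensor shape, and parameter count below is an assumption for illustration only.

```python
# Hypothetical skeleton of a pose-free feedforward reconstructor in the spirit of DGGT.
# NOT the authors' implementation; shapes, heads, and names are illustrative guesses.
import torch
import torch.nn as nn

class PoseFreeFeedforwardSketch(nn.Module):
    """Unposed frames in; per-frame 3D Gaussians, camera parameters, dynamic mask, lifetime out."""
    def __init__(self, feat_dim: int = 256, gaussians_per_frame: int = 4096):
        super().__init__()
        # Shared per-frame image encoder (stand-in for the paper's transformer backbone)
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=16, stride=16),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        # 14 values per Gaussian: xyz(3) + scale(3) + rotation quaternion(4) + opacity(1) + rgb(3)
        self.gaussian_head = nn.Linear(feat_dim, gaussians_per_frame * 14)
        self.pose_head = nn.Linear(feat_dim, 7)                       # pose predicted as an OUTPUT (quat + t)
        self.dynamic_head = nn.Linear(feat_dim, gaussians_per_frame)  # static/dynamic decoupling logits
        self.lifetime_head = nn.Linear(feat_dim, gaussians_per_frame) # time-varying visibility

    def forward(self, frames: torch.Tensor):
        # frames: (T, 3, H, W), a sparse sequence of unposed images
        feats = self.backbone(frames)                                 # (T, feat_dim)
        gaussians = self.gaussian_head(feats).view(frames.shape[0], -1, 14)
        poses = self.pose_head(feats)                                 # per-frame camera parameters
        dyn_mask = torch.sigmoid(self.dynamic_head(feats))
        lifetime = torch.sigmoid(self.lifetime_head(feats))
        return gaussians, poses, dyn_mask, lifetime

# Usage: gaussians, poses, dyn, life = PoseFreeFeedforwardSketch()(torch.randn(4, 3, 256, 256))
```

The point of the sketch is the interface rather than the internals: camera pose sits on the output side of the forward pass, which is what allows reconstruction from unposed inputs.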
Fudan's Latest DriveVGGT: Efficient Multi-Camera 4D Reconstruction for Autonomous Driving
自动驾驶之心· 2025-12-17 00:03
Paper authors | Xiaosong Jia et al. Editor | 自动驾驶之心

4D scene reconstruction is a key step toward environment perception and motion planning in autonomous driving, yet traditional visual geometry models often perform poorly in the multi-camera, low-overlap settings typical of driving. Researchers from Shanghai Jiao Tong University, Fudan University, and other institutions propose DriveVGGT, a visual geometry Transformer designed specifically for autonomous driving. By explicitly injecting relative camera pose priors, it markedly improves the consistency of geometric predictions across a multi-camera rig as well as inference efficiency (a toy sketch of this relative-pose prior follows this summary).

Background

4D reconstruction is a computer vision task that predicts geometric information from visual sensors. Compared with other sensors, camera-based reconstruction has been widely studied and applied across domains, especially autonomous driving and robotics, because of its low cost. Reconstruction methods generally fall into two types. The first is iteration-based methods: these require selecting a specific scene or object and obtain an optimized result through iterative reconstruction, but because they generalize poorly, the model must be retrained whenever the scene or object changes. The second is feedforward methods. These methods ...
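As a rough illustration of what a relative camera pose prior between rig cameras can mean, the toy function below computes the fixed transform between two cameras from their (assumed known) camera-to-world extrinsics. This is only my sketch of the concept; how DriveVGGT actually encodes and consumes this prior is not shown in the excerpt above.

```python
# Toy illustration of a relative-pose prior between rig cameras; not DriveVGGT code.
import numpy as np

def relative_pose(T_i_c2w: np.ndarray, T_j_c2w: np.ndarray) -> np.ndarray:
    """Return the 4x4 transform mapping camera-i coordinates into camera-j coordinates."""
    return np.linalg.inv(T_j_c2w) @ T_i_c2w

# Two hypothetical rig cameras: a front camera at the origin and a side camera offset 1 m along x.
T_front = np.eye(4)
T_side = np.eye(4)
T_side[0, 3] = 1.0
print(relative_pose(T_front, T_side))  # translation of -1 m along x, as expected
```

Because the rig geometry is fixed, such transforms can be computed once per vehicle and supplied to the network as a prior, rather than re-estimated from low-overlap image content.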
New Breakthrough from Google & Berkeley: 4D Dynamic Scene Reconstruction from a Single Video, with a 73% Boost in Trajectory Tracking Accuracy!
自动驾驶之心· 2025-07-05 13:41
Core Viewpoint
- The research introduces a novel method called "Shape of Motion" that combines 3D Gaussian point technology with an SE(3) motion representation, achieving a 73% improvement in 3D tracking accuracy compared to existing methods, with significant applications in AR/VR and autonomous driving [2][4].

Summary by Sections

Introduction
- The challenge of dynamic scene reconstruction from monocular video is likened to feeling an elephant in the dark due to the lack of information [7].
- Traditional methods rely on multi-view videos or depth sensors, making them less effective for dynamic scenes [7].

Core Contribution
- The "Shape of Motion" technique enables the reconstruction of complete 4D scenes (3D space + time) from a single video, allowing object motion to be tracked and the scene to be rendered from any viewpoint [9][10].
- The two main innovations are a low-dimensional motion representation built from SE(3) motion bases and the integration of data-driven priors into a globally consistent dynamic scene representation [9][12] (a toy sketch of the motion-basis idea follows this summary).

Technical Analysis
- The method employs 3D Gaussian points as the basic unit of scene representation, allowing real-time rendering [10].
- Data-driven priors, such as monocular depth estimation and long-range 2D trajectories, are used to overcome the under-constrained nature of monocular video reconstruction [11][12].

Experimental Results
- On the iPhone dataset the method outperforms existing techniques, achieving 73.3% 3D tracking accuracy and a PSNR of 16.72 for novel view synthesis [17][18].
- On the Kubric synthetic dataset the 3D tracking error (EPE) is as low as 0.16, a 21% improvement over baseline methods [20].

Discussion and Future Outlook
- Current limitations include training time and reliance on accurate camera pose estimation [25].
- Future directions include optimizing training time, enhancing view generation capabilities, and developing fully automated segmentation methods [25].

Conclusion
- The "Shape of Motion" research marks a significant advancement in monocular dynamic reconstruction, with potential applications in real-time tracking for AR glasses and autonomous systems [26].
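As a toy illustration of the SE(3) motion-basis idea summarized above, the sketch below moves each point by a per-point-weighted blend of a few shared rigid trajectories. The blending is done naively in an axis-angle plus translation parameterization for brevity; this is my simplification for intuition, not the paper's exact formulation or code.

```python
# Toy sketch of SE(3) motion bases: each point's motion is a weighted blend of shared rigid motions.
# Simplified parameterization (axis-angle + translation); not the paper's implementation.
import numpy as np
from scipy.spatial.transform import Rotation as R

def blend_motion_bases(weights, basis_rotvecs, basis_trans, points):
    """
    weights:       (N, B) per-point coefficients over B motion bases
    basis_rotvecs: (B, 3) axis-angle rotation of each basis at the query time
    basis_trans:   (B, 3) translation of each basis at the query time
    points:        (N, 3) canonical 3D positions
    Returns (N, 3) positions after applying each point's blended rigid transform.
    """
    rotvec = weights @ basis_rotvecs   # blended per-point rotation
    trans = weights @ basis_trans      # blended per-point translation
    return R.from_rotvec(rotvec).apply(points) + trans

# Tiny example: 2 points, 2 bases (one static, one translating 1 m forward).
w = np.array([[1.0, 0.0],    # point 0 follows only the static basis
              [0.2, 0.8]])   # point 1 mostly follows the moving basis
print(blend_motion_bases(w, np.zeros((2, 3)),
                         np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),
                         np.zeros((2, 3))))
```

The low-dimensional part is that only a handful of basis trajectories are optimized over time, while each point stores just a short coefficient vector, which keeps the dynamic scene representation compact and globally consistent.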