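Hunyuan3D open-sources an end-to-end panoramic depth estimator: code and curated panoramic data are now available, with an online demo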
量子位 (QbitAI) · 2025-10-14 04:08
Core Insights
- The article covers DA, a new end-to-end panoramic depth estimator from Tencent's Hunyuan3D team, which tackles two problems: the scarcity of panoramic depth data and the weak zero-shot generalization that results from it [2][8].

Group 1: Background and Challenges
- Panoramic images provide a 360°×180° immersive view, which is essential for advanced applications such as AR/VR and 3D scene reconstruction [5][6].
- Existing methods for panoramic depth estimation are limited by the scarcity of panoramic depth data and by the spherical distortion inherent to panoramic images [10][12].
- The team therefore set out to expand the available panoramic data and build a robust data foundation for DA [8].

Group 2: Data Augmentation Engine
- The team developed a data curation engine that converts high-quality perspective depth data into panoramic data, significantly increasing the quantity and diversity of panoramic samples [11][14].
- The engine produced approximately 543K panoramic samples, expanding the total from about 63K to approximately 607K and easing the data scarcity problem [14].

Group 3: Model Architecture and Training
- The SphereViT architecture was introduced to mitigate the effects of spherical distortion by letting the model attend to the spherical geometry of panoramic images [16][17].
- Training combines a distance loss for global accuracy with a normal loss for local surface smoothness [18].

Group 4: Experimental Results
- DA achieved state-of-the-art (SOTA) performance, with an average 38% improvement in AbsRel over the strongest zero-shot method [23][24].
- DA was trained on approximately 21 times more panoramic data than UniK3D, and qualitative comparisons showed that it produces more accurate geometric predictions [27].
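To make the AbsRel figure in the results more concrete, the following is a minimal sketch of how absolute relative error and a relative improvement between two methods are commonly computed. It is not the team's evaluation code. The function names `abs_rel` and `relative_improvement` are hypothetical, the inputs are toy values rather than reported results, and any scale alignment between prediction and ground truth is assumed to have been done beforehand.

```python
def abs_rel(pred, gt):
    """Mean of |pred - gt| / gt over pixels with valid (positive) ground truth."""
    total, count = 0.0, 0
    for p, g in zip(pred, gt):
        if g > 0:  # assumption: non-positive ground truth marks an invalid pixel
            total += abs(p - g) / g
            count += 1
    if count == 0:
        raise ValueError("no valid ground-truth pixels")
    return total / count


def relative_improvement(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Toy values for illustration only, not numbers from the article.
pred = [1.1, 2.0, 3.0]
gt = [1.0, 2.0, 0.0]  # the last pixel has no valid ground truth
error = abs_rel(pred, gt)
gain = relative_improvement(0.2, 0.1)
```

Lower AbsRel is better, so an improvement reported as a percentage corresponds to a reduction of the error relative to the baseline.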
Group 5: Application Scenarios
- DA's exceptional zero-shot generalization capabilities enable a wide range of 3D reconstruction applications, such as panoramic multi-view reconstruction [28].
- The model can reconstruct globally aligned 3D point clouds from panoramic images of different rooms in a house or apartment, ensuring spatial consistency across multiple panoramic views [29].
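As a rough illustration of the reconstruction step, the sketch below back-projects a single equirectangular distance map into a 3D point cloud. It is not DA's pipeline. It assumes each pixel stores the radial distance from the camera center, and the axis convention (y up, longitude zero along +z) and the function name `equirect_to_points` are illustrative choices. Aligning point clouds from several rooms would additionally require the relative camera poses, which this sketch does not cover.

```python
import math


def equirect_to_points(distance):
    """Back-project an H x W equirectangular distance map to a list of (x, y, z)."""
    height = len(distance)
    width = len(distance[0])
    points = []
    for v in range(height):
        # Latitude runs from +pi/2 (top row) to -pi/2 (bottom row), at pixel centers.
        lat = math.pi / 2 - (v + 0.5) / height * math.pi
        for u in range(width):
            # Longitude runs from -pi (left edge) to +pi (right edge).
            lon = (u + 0.5) / width * 2 * math.pi - math.pi
            d = distance[v][u]
            if d <= 0:  # assumption: non-positive distance marks an invalid pixel
                continue
            # Unit viewing direction on the sphere, scaled by the radial distance.
            x = d * math.cos(lat) * math.sin(lon)
            y = d * math.sin(lat)
            z = d * math.cos(lat) * math.cos(lon)
            points.append((x, y, z))
    return points


# Toy 2 x 4 map with constant distance: every point lies on a sphere of radius 2.
cloud = equirect_to_points([[2.0] * 4 for _ in range(2)])
```

Because every pixel maps to a direction on the unit sphere, a constant distance map yields points on a sphere, which is a quick sanity check for the chosen convention.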