High-scoring NeurIPS'25 paper! HUST, Zhejiang University & Xiaomi propose a new paradigm for depth estimation
自动驾驶之心· 2025-10-15 23:33
Research Motivation and Contribution
- The core issue in existing depth estimation methods is the "Flying Pixels" problem, which leads to erroneous actions in robotic decision-making and ghosting in 3D reconstruction [2]
- The proposed method, Pixel-Perfect Depth (PPD), aims to eliminate artifacts caused by VAE compression by performing diffusion directly in pixel space [6]

Innovation and Methodology
- PPD introduces a novel diffusion model that operates in pixel space, addressing the challenges of maintaining global semantic consistency and local detail accuracy [6][9]
- The model incorporates a Semantics-Prompted Diffusion Transformer (SP-DiT) that strengthens its modeling capability by integrating high-level semantic features during the diffusion process [9][16]

Results and Performance
- PPD outperforms existing generative depth estimation models across five public benchmarks, showing significant improvements in edge point cloud evaluation and producing depth maps with minimal "Flying Pixels" [14][20]
- The model demonstrates strong zero-shot generalization, achieving superior performance without relying on pre-trained image priors [20][22]

Experimental Analysis
- A comprehensive ablation study indicates that the proposed SP-DiT significantly improves performance, with a 78% improvement in the AbsRel metric on the NYUv2 dataset compared to baseline models [25][26]
- The Cascaded DiT design improves computational efficiency, reducing inference time by 30% while maintaining high accuracy [26][27]

Edge Point Cloud Evaluation
- The model aims to generate pixel-perfect depth maps, and a newly proposed Edge-Aware Point Cloud Metric addresses the challenge of evaluating edge accuracy [28][30]
- Experimental results confirm that PPD effectively avoids the "Flying Pixels" issue, demonstrating superior edge accuracy compared to existing methods [28][34]

Conclusion
- PPD represents a significant advancement in depth estimation, providing high-quality outputs with sharp structures and clear edges while minimizing artifacts [34][35]
- The research opens new avenues for high-fidelity depth estimation based on diffusion models, emphasizing the importance of maintaining both global semantics and local geometric consistency [35]
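
For readers unfamiliar with the AbsRel metric cited in the ablation results, the sketch below shows the standard definition: the mean of |prediction - ground truth| / ground truth over pixels with valid ground truth. The function name `abs_rel`, the `eps` threshold, and the sample arrays are illustrative assumptions, not taken from the paper. Evaluations of relative-depth models usually align the prediction to the ground truth before computing the metric; that step is omitted here.

```python
import numpy as np

def abs_rel(pred, gt, eps=1e-6):
    """Mean absolute relative error over pixels with valid ground truth."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > eps  # skip pixels that have no ground-truth depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Hypothetical 2x2 depth maps in metres; 0.0 marks a missing measurement.
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.1, 1.8], [4.4, 3.0]])

# Each valid pixel is off by 10% of its true depth, so the score is about 0.1.
print(abs_rel(pred, gt))
```

Lower is better: a score of 0.1 means the predicted depth is off by 10% of the true depth on average.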
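
The "Flying Pixels" problem can be illustrated with a toy example: when a depth map is smoothed across an object boundary, pixels at the edge take depth values that lie between the foreground and the background and belong to neither surface. The sketch below is only an illustration under stated assumptions. The 1D depth profile, the box blur (a stand-in for any smoothing, not a model of VAE compression), and the helper `flying_pixel_mask` are hypothetical, and this is not the paper's Edge-Aware Point Cloud Metric.

```python
import numpy as np

def flying_pixel_mask(depth, fg, bg, tol=0.1):
    """Flag pixels whose depth is on neither the foreground nor the background."""
    depth = np.asarray(depth, dtype=np.float64)
    return (np.abs(depth - fg) > tol) & (np.abs(depth - bg) > tol)

# Hypothetical 1D depth profile: foreground at 1 m, background at 5 m.
sharp = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])

# Smooth with a 3-tap box filter; edge padding keeps the length unchanged.
padded = np.pad(sharp, 1, mode="edge")
blurred = np.convolve(padded, np.ones(3) / 3.0, mode="valid")

# The sharp profile has no in-between depths; the blurred one has two.
print(int(flying_pixel_mask(sharp, fg=1.0, bg=5.0).sum()))
print(int(flying_pixel_mask(blurred, fg=1.0, bg=5.0).sum()))
```

When such a depth map is back-projected to a point cloud, the flagged pixels become points floating in the empty space between the two surfaces.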