Depth Estimation
ZJU & Li Auto Achieve Significantly Better Depth Estimation with a Novel Continuity-Based Approach
理想TOP2· 2026-01-09 12:34
Core Viewpoint
- The article discusses InfiniDepth, a method that uses a novel continuous-representation approach to achieve significantly better depth estimation at lower computational cost, particularly in predicting fine geometric details.

Group 1: Depth Estimation Overview
- Depth estimation infers the three-dimensional structure of objects from a real image; higher accuracy enables better environmental perception and world-model reconstruction [1]
- InfiniDepth provides high-precision geometric structure, producing relative depth from monocular RGB images and ultra-high-resolution absolute depth when combined with LiDAR or sparse depth inputs [1]

Group 2: Methodological Inspiration
- InfiniDepth draws inspiration from advances in 3D reconstruction, specifically NeRF and PIFu, which show that scenes can be modeled as continuous functions rather than rigid voxel grids, achieving high geometric detail with fewer parameters [2]
- LIIF (Local Implicit Image Function) brings implicit functions to 2D images, treating them as continuous signals for arbitrary-scale super-resolution; InfiniDepth applies the same idea to depth map prediction [3]

Group 3: Key Innovations
- InfiniDepth challenges traditional depth estimation methods that tie the output resolution to the input image size, proposing a neural implicit field model for depth that decouples resolution from input size [4]
- The method consists of three core steps:
  - Feature extraction with a visual encoder (DINOv3) builds a feature pyramid that captures both macro- and micro-scale information [5]
  - A lightweight MLP decoder efficiently translates features into depth values [6]
  - Infinite depth querying adaptively generates additional query points in sparse regions so that the resulting 3D point cloud is uniformly distributed [7]
Group 4: Performance Metrics
- InfiniDepth produces higher-quality depth maps at higher resolutions, yielding better point clouds and improved results in bird's-eye-view (BEV) perspectives [10][11][14]
- A new test dataset built from five AAA games addresses the limitations of traditional low-resolution, sparse ground-truth depth maps, which often fail to capture fine geometric structures [15]

Group 5: Statistical Performance
- InfiniDepth took first place in 58 of 60 statistical metrics, with two second-place finishes, demonstrating its effectiveness against competing methods [16]
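The continuous-function idea behind InfiniDepth's three steps can be sketched as a LIIF-style implicit depth field: encoder features on a coarse grid are bilinearly sampled at any continuous coordinate and, together with that coordinate, fed to a small MLP, so output resolution is fully decoupled from input size. This is a minimal illustration of the query mechanism only; the feature map and MLP weights below are random stand-ins, not the paper's DINOv3 pyramid or trained decoder.

```python
import numpy as np

def bilinear_sample(feat, x, y):
    """Sample a feature map feat (C, H, W) at continuous coords (x, y)."""
    C, H, W = feat.shape
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feat[:, y0, x0]
            + wx * (1 - wy) * feat[:, y0, x1]
            + (1 - wx) * wy * feat[:, y1, x0]
            + wx * wy * feat[:, y1, x1])

class DepthFieldMLP:
    """Toy MLP f(feature, coord) -> depth. Random weights, illustration only."""
    def __init__(self, in_dim, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.w2 = rng.standard_normal((hidden, 1)) * 0.1
    def __call__(self, z):
        h = np.maximum(z @ self.w1, 0.0)  # ReLU
        return float(h @ self.w2)

# Coarse 16x16 feature grid, standing in for one level of an encoder pyramid.
feat = np.random.default_rng(1).standard_normal((8, 16, 16))
mlp = DepthFieldMLP(in_dim=8 + 2)  # features + (u, v) coordinate

def query_depth(u, v):
    """Depth at ANY normalized coordinate (u, v) in [0, 1] -- not just grid points."""
    z = bilinear_sample(feat, u * 15, v * 15)
    return mlp(np.concatenate([z, [u, v]]))

d = query_depth(0.3141, 0.2718)  # off-grid query is still well-defined
```

Because `query_depth` accepts arbitrary coordinates, one can render the depth field at any target resolution, or concentrate extra queries in sparse regions, which is the mechanism the "infinite depth querying" step exploits.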
AI Day Livestream | "Pixel-Perfect" Depth Perception: Decoding a High-Scoring NeurIPS Paper
自动驾驶之心· 2025-11-05 00:04
Depth estimation is central to robot perception, 3D reconstruction, AR/VR, and related applications. However, existing depth estimation methods commonly suffer from flying pixels at edges, which can trigger erroneous actions when a robot acts on its decisions and produce heavy ghosting around object contours in 3D reconstruction. Existing methods exhibit edge flying pixels mainly for the following reasons: ... This article proposes Pixel-Perfect Depth (PPD), a monocular depth estimation model that performs diffusion generation directly in pixel space, eliminating at the root the artifacts caused by VAE compression. Diffusion modeling in high-resolution pixel space is highly challenging, however: the model must balance global semantic consistency with local detail accuracy, otherwise it easily produces structural distortions or depth jumps. To this end, the article designs a Semantics-Prompted Diffusion Transformer (SP-DiT) that introduces high-level semantic features from a vision foundation model as prompts during the diffusion process, effectively strengthening the model's grasp of global structure and its ability to recover detail. The article also proposes a ... Discriminative models (such as Depth Anything v2 and Depth Pro), owing to the smoothing tendency of their regression losses, tend to ... at depth ...
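A minimal sketch of the "semantics as prompt" idea: during denoising, tokens of the noisy depth map attend over a concatenation of themselves and high-level semantic tokens from a vision foundation model, anchoring global structure while the diffusion process refines local detail. SP-DiT's actual injection mechanism differs; this is a generic attention illustration with random features standing in for the foundation-model embeddings.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Standard scaled dot-product attention."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
d = 32
noisy_depth_tokens = rng.standard_normal((64, d))  # pixel-space diffusion tokens
semantic_tokens = rng.standard_normal((16, d))     # high-level semantic features

# Semantics-prompted attention: depth tokens attend over BOTH themselves and
# the semantic prompt, so every denoising step sees the global scene layout.
kv = np.concatenate([noisy_depth_tokens, semantic_tokens], axis=0)
out = attention(noisy_depth_tokens, kv, kv)  # shape (64, 32)
```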
High-Scoring NeurIPS'25 Paper! HUST, ZJU & Xiaomi Propose a New Paradigm for Depth Estimation
自动驾驶之心· 2025-10-15 23:33
Research Motivation and Contribution
- The core issue in existing depth estimation methods is the "flying pixels" problem, which leads to erroneous actions in robotic decision-making and ghosting in 3D reconstruction [2]
- The proposed method, Pixel-Perfect Depth (PPD), eliminates artifacts caused by VAE compression by performing diffusion directly in pixel space [6]

Innovation and Methodology
- PPD introduces a diffusion model that operates in pixel space, addressing the challenges of maintaining global semantic consistency and local detail accuracy [6][9]
- The model incorporates a Semantics-Prompted Diffusion Transformer (SP-DiT) that strengthens modeling capability by integrating high-level semantic features during the diffusion process [9][16]

Results and Performance
- PPD outperforms existing generative depth estimation models across five public benchmarks, with significant gains in edge point cloud evaluation and depth maps showing minimal "flying pixels" [14][20]
- The model demonstrates strong zero-shot generalization, achieving superior performance without relying on pre-trained image priors [20][22]

Experimental Analysis
- A comprehensive ablation study shows that SP-DiT significantly improves performance, with a 78% improvement in the AbsRel metric on the NYUv2 dataset compared to baseline models [25][26]
- A Cascaded DiT design improves computational efficiency, reducing inference time by 30% while maintaining high accuracy [26][27]

Edge Point Cloud Evaluation
- To assess whether depth maps are truly pixel-perfect at object boundaries, the authors propose a new edge-aware point cloud metric [28][30]
- Experimental results confirm that PPD effectively avoids the "flying pixels" issue, demonstrating superior edge accuracy compared to existing methods [28][34]

Conclusion
- PPD represents a significant advance in depth estimation, producing high-quality outputs with sharp structures and clear edges while minimizing artifacts [34][35]
- The research opens new avenues for high-fidelity, diffusion-based depth estimation, underscoring the importance of maintaining both global semantics and local geometric consistency [35]
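To make the "flying pixels" failure mode concrete: when a depth map smears values across an object boundary, back-projecting it yields 3D points floating in mid-air between foreground and background. The sketch below flags such pixels with a simple neighbour test on a synthetic two-plane scene; the intrinsics, threshold, and test itself are illustrative simplifications, not the paper's edge-aware metric.

```python
import numpy as np

def backproject(depth, fx=100.0, fy=100.0, cx=None, cy=None):
    """Pinhole back-projection of a depth map (H, W) to an (H, W, 3) point cloud."""
    H, W = depth.shape
    cx = (W - 1) / 2 if cx is None else cx
    cy = (H - 1) / 2 if cy is None else cy
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    return np.stack([(u - cx) / fx * depth, (v - cy) / fy * depth, depth], axis=-1)

def flying_pixel_mask(depth, rel_jump=0.1):
    """Flag pixels whose depth lies strictly between two strongly differing
    horizontal neighbours -- the classic flying-pixel signature at edges."""
    left, right = depth[:, :-2], depth[:, 2:]
    mid = depth[:, 1:-1]
    between = (mid > np.minimum(left, right)) & (mid < np.maximum(left, right))
    big_gap = np.abs(left - right) > rel_jump * mid
    mask = np.zeros_like(depth, dtype=bool)
    mask[:, 1:-1] = between & big_gap
    return mask

# Synthetic scene: foreground plane at 2 m, background at 10 m, with one
# smeared column of interpolated depths at the boundary (the flying pixels).
depth = np.full((4, 9), 10.0)
depth[:, :4] = 2.0
depth[:, 4] = 6.0  # halfway between the two surfaces -> floats in mid-air in 3D

points = backproject(depth)          # (4, 9, 3) point cloud
mask = flying_pixel_mask(depth)      # True exactly at the smeared column
```

A depth map with sharp edges would assign column 4 to one surface or the other, and the mask would be empty; the density of flagged points near edges is one simple proxy for the edge quality that PPD's evaluation targets.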