Depth Estimation
Zhejiang University & Li Auto use a new continuity-based approach to achieve markedly better depth estimation
理想TOP2· 2026-01-09 12:34
On January 6, 2026, Zhejiang University & Li Auto released InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields. The paper is essentially about using a new continuity-based formulation to obtain markedly better depth estimation at lower compute cost, especially for predicting fine-grained geometric detail.

Depth estimation can be understood as looking at a real photograph and inferring the 3D structure of every physical surface in it. The more accurate the depth, the better a vehicle can perceive its surroundings, and the better it can reconstruct and generate world models. InfiniDepth essentially delivers very high-precision geometric structure: from monocular RGB alone it provides relative depth, and when combined with LiDAR or sparse depth input it can generate ultra-high-resolution, accurate absolute depth.

Today's mainstream depth estimation methods typically treat the depth map as an ordinary 2D image (a grid): the more you zoom in, the blurrier and more mosaic-like it looks, fine detail is lost, and the output resolution is usually limited to the size used at training time.

InfiniDepth is instead built on neural implicit fields. Depth is no longer treated as a picture made of rigid pixels but as a continuous mathematical function: you can query this function for the depth at any position in the image, even between two pixels, and it will give ...
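The excerpt above does not include any reference code, so the following is only a minimal sketch of the general idea behind a neural implicit depth field: a coordinate-conditioned MLP that can be queried at arbitrary, possibly sub-pixel, image positions. All names (ImplicitDepthField, feature_dim, etc.) are illustrative assumptions, not the authors' API.

```python
# Minimal sketch (assumption, not the authors' implementation): depth as a
# continuous function of image coordinates. An image backbone produces a
# feature map; a small MLP is queried with (bilinearly interpolated feature,
# normalized continuous coordinate) and returns a depth value, so the field
# can be sampled at any resolution, including between pixel centers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImplicitDepthField(nn.Module):
    def __init__(self, feature_dim=256, hidden_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + 2, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar depth per query point
        )

    def forward(self, feat_map, coords):
        # feat_map: (B, C, H, W) features from any image backbone
        # coords:   (B, N, 2) continuous (x, y) coordinates in [-1, 1]
        grid = coords.unsqueeze(2)                           # (B, N, 1, 2)
        feats = F.grid_sample(feat_map, grid, mode="bilinear",
                              align_corners=False)           # (B, C, N, 1)
        feats = feats.squeeze(-1).permute(0, 2, 1)           # (B, N, C)
        x = torch.cat([feats, coords], dim=-1)
        return self.mlp(x).squeeze(-1)                       # (B, N) depth values
```

Because the query coordinates are continuous, the same trained field can be sampled on a 4x denser grid (or at arbitrary sub-pixel locations) without retraining or resizing anything, which is the "arbitrary resolution" property the paper's title refers to.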
AI Day Livestream | "Pixel-Perfect" Depth Perception: A High-Scoring NeurIPS Paper Explained
自动驾驶之心· 2025-11-05 00:04
Depth estimation is at the core of robot perception, 3D reconstruction, AR/VR, and similar applications. However, existing depth estimation methods commonly suffer from flying pixels at object edges, which can trigger erroneous actions when a robot acts on its decisions and produce heavy ghosting around object contours in 3D reconstruction. Existing methods suffer from these edge flying pixels for several reasons.

This work proposes Pixel-Perfect Depth (PPD), a monocular depth estimation model that performs diffusion generation directly in pixel space, eliminating at the root the artifacts caused by VAE compression. Diffusion modeling in high-resolution pixel space is highly challenging, however: the model must balance global semantic consistency with local detail accuracy, otherwise structural distortion or depth jumps readily appear. To address this, the paper designs a Semantics-Prompted Diffusion Transformer (SP-DiT) that injects high-level semantic features from a vision foundation model as prompts during the diffusion process, effectively strengthening the model's grasp of global structure and its ability to recover detail. The paper also proposes a ... Discriminative models (such as Depth Anything v2 and Depth Pro), due to the smoothing tendency of their regression losses, tend to ... in depth ...
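As a rough illustration only (not the authors' released code), the sketch below shows one plausible way to "prompt" a pixel-space diffusion transformer block with high-level semantic features: tokens from a frozen vision foundation model are projected to the denoiser's width and attended to alongside the noisy depth tokens. The class and parameter names (SemanticsPromptedBlock, sem_dim, etc.) are assumptions.

```python
# Illustrative sketch (assumption): one transformer block of a pixel-space
# denoiser that is "prompted" with semantic tokens from a frozen vision
# foundation model. Self-attention models the noisy depth tokens;
# cross-attention injects global semantics so the high-resolution prediction
# keeps a consistent global structure while local detail is refined.
import torch
import torch.nn as nn

class SemanticsPromptedBlock(nn.Module):
    def __init__(self, dim=512, sem_dim=768, heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.sem_proj = nn.Linear(sem_dim, dim)  # map foundation-model tokens to block width
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x, sem_tokens):
        # x:          (B, N, dim)     noisy depth tokens (patchified pixel space)
        # sem_tokens: (B, M, sem_dim) frozen semantic features used as the "prompt"
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        s = self.sem_proj(sem_tokens)
        h = self.norm2(x)
        x = x + self.cross_attn(h, s, s, need_weights=False)[0]
        return x + self.mlp(self.norm3(x))
```

The actual SP-DiT may fuse the semantic features differently (e.g., by normalization and addition rather than cross-attention); this block is only meant to make the "semantics as prompt" idea concrete.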
High-scoring NeurIPS'25 paper! HUST, Zhejiang University & Xiaomi propose a new paradigm for depth estimation
自动驾驶之心· 2025-10-15 23:33
Research Motivation and Contribution
- The core issue in existing depth estimation methods is the "Flying Pixels" problem, which leads to erroneous actions in robotic decision-making and ghosting in 3D reconstruction [2]
- The proposed method, Pixel-Perfect Depth (PPD), aims to eliminate artifacts caused by VAE compression by performing diffusion directly in pixel space [6]

Innovation and Methodology
- PPD introduces a novel diffusion model that operates in pixel space, addressing the challenges of maintaining global semantic consistency and local detail accuracy [6][9]
- The model incorporates a Semantics-Prompted Diffusion Transformer (SP-DiT) that enhances modeling capability by integrating high-level semantic features during the diffusion process [9][16]

Results and Performance
- PPD outperforms existing generative depth estimation models across five public benchmarks, showing significant improvements in edge point cloud evaluation and producing depth maps with minimal "Flying Pixels" [14][20]
- The model demonstrates exceptional zero-shot generalization, achieving superior performance without relying on pre-trained image priors [20][22]

Experimental Analysis
- A comprehensive ablation study indicates that the proposed SP-DiT significantly enhances performance, with a 78% improvement in the AbsRel metric on the NYUv2 dataset compared to baseline models [25][26]
- The introduction of a Cascaded DiT design improves computational efficiency, reducing inference time by 30% while maintaining high accuracy [26][27]

Edge Point Cloud Evaluation
- The model aims to generate pixel-perfect depth maps, and edge accuracy is assessed through a newly proposed Edge-Aware Point Cloud Metric [28][30]
- Experimental results confirm that PPD effectively avoids the "Flying Pixels" issue, demonstrating superior edge accuracy compared to existing methods [28][34]

Conclusion
- PPD represents a significant advance in depth estimation, producing high-quality outputs with sharp structures and clear edges while minimizing artifacts [34][35]
- The work opens new avenues for high-fidelity, diffusion-based depth estimation, underscoring the importance of maintaining both global semantics and local geometric consistency [35]
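For reference, AbsRel (absolute relative error), the metric cited in the ablation above, is the standard depth-estimation measure: the mean over valid pixels of |predicted depth - ground truth| / ground truth, lower being better. The snippet below is the common textbook definition, not code from the paper; the validity threshold is an illustrative assumption.

```python
# Standard AbsRel (absolute relative error) metric used in depth-estimation
# benchmarks such as NYUv2: mean over valid pixels of |pred - gt| / gt.
import numpy as np

def abs_rel(pred: np.ndarray, gt: np.ndarray, min_depth: float = 1e-3) -> float:
    """pred, gt: depth maps of the same shape (meters); gt pixels <= min_depth are ignored."""
    valid = gt > min_depth
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

# Example: a prediction that is uniformly 10% too far has AbsRel ~ 0.10.
gt = np.full((480, 640), 2.0)
pred = gt * 1.10
print(abs_rel(pred, gt))  # ~0.10
```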