Feed-Forward 3D Reconstruction
A new feed-forward 3D Gaussian Splatting method: Zhejiang University team proposes "voxel alignment", fusing multi-view 2D information directly in 3D space
36Kr · 2025-09-29 07:26
As 3D reconstruction moves steadily toward engineering practice, feed-forward 3D Gaussian Splatting (Feed-Forward 3DGS) is rapidly heading toward industrial adoption. However, existing feed-forward 3DGS methods mostly follow a "pixel-aligned" strategy, mapping each 2D pixel individually to one or more 3D Gaussians. This looks intuitive, but it runs into two ceilings that cannot be ignored: 2D features are hard to align precisely in 3D space, and the number of Gaussian primitives is rigidly bound to the pixel grid, so it cannot be allocated adaptively according to scene complexity.

VolSplat discards the entrenched pixel-aligned paradigm and proposes a "voxel-aligned" feed-forward framework: multi-view information is fused directly in 3D space, attacking the problem at its root and making high-quality multi-view rendering more robust, more efficient, and easier to deploy.

Comparative experiments on public datasets show that VolSplat outperforms a range of pixel-aligned baselines in both visual quality and geometric consistency on RealEstate10K and ScanNet (indoor). The numbers below reflect both the gain in visual quality and the improvement in geometric consistency.

| Method | PSNR ↑ | SSIM ↑ | LPIPS ↓ | PGS |
| --- | --- | --- | --- | --- |
| ... |
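To make the two ceilings above concrete, here is a back-of-the-envelope sketch of how the primitive budget behaves under each alignment scheme. The view count, resolution, and voxel occupancies are illustrative assumptions, not figures reported by the authors.

```python
# Back-of-the-envelope comparison of primitive budgets (all numbers illustrative,
# not taken from the paper).
num_views, H, W = 8, 256, 256

# Pixel-aligned: one Gaussian per pixel per view, so the count is fixed by the
# image grid no matter how simple or complex the scene is.
pixel_aligned_count = num_views * H * W
print(f"pixel-aligned primitives: {pixel_aligned_count}")   # 524288

# Voxel-aligned: one Gaussian per occupied voxel, so the count follows the scene.
# Hypothetical occupancies for a cluttered room vs. a nearly empty hallway.
for scene, occupied_voxels in [("cluttered room", 120_000), ("empty hallway", 30_000)]:
    print(f"voxel-aligned primitives ({scene}): {occupied_voxels}")
```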
A new feed-forward 3D Gaussian Splatting method: Zhejiang University team proposes "voxel alignment", fusing multi-view 2D information directly in 3D space
量子位 · 2025-09-29 04:57
Core Viewpoint
- The article discusses the rapid industrialization of Feed-Forward 3D Gaussian Splatting (3DGS) and introduces VolSplat, which abandons the traditional pixel-aligned strategy in favor of a voxel-aligned framework, improving robustness, efficiency, and engineering practicality in multi-view rendering [1][2].

Summary by Sections

Introduction to VolSplat
- VolSplat addresses the limitations of existing pixel-aligned methods, which struggle to align 2D features precisely in 3D space and whose Gaussian density allocation is constrained by the pixel grid [2][6].

Performance Comparison
- Experimental results on public datasets such as RealEstate10K and ScanNet show that VolSplat outperforms various pixel-aligned baselines in visual quality and geometric consistency [4][5].

Core Concepts of VolSplat
- The core idea of VolSplat is to shift alignment from 2D to 3D, allowing better integration of multi-view information and overcoming the challenges of multi-view consistency and Gaussian density allocation [6][9].

Methodology Breakdown
- The VolSplat pipeline consists of three modules:
  1. 2D feature extraction and depth estimation
  2. Lifting pixels to voxels and feature aggregation
  3. Sparse 3D refinement and Gaussian regression [9][11]

Step-by-Step Process (illustrative code sketches for each step follow at the end of this summary)
- **Step 1**: 2D features are extracted with a shared encoder, and depth maps are constructed to provide the geometric priors needed by later stages [11].
- **Step 2**: Pixels are projected into 3D space using the predicted depths, producing a point cloud that is voxelized for feature aggregation, which enhances cross-view consistency [12][13].
- **Step 3**: A sparse 3D U-Net refines the voxel features, predicts a correction for each voxel, and regresses the Gaussian parameters used for rendering [14].

Experimental Highlights
- VolSplat demonstrates strong zero-shot generalization across datasets, maintaining high performance on unseen data, with a PSNR of 32.65 dB on the ACID dataset [15][17].

Practical Implications
- The advances in VolSplat yield fewer artifacts and better geometric fidelity, translating into improved user experience in applications such as virtual tours and indoor navigation [17][19].

Future Directions
- VolSplat opens new avenues for research in 3D reconstruction, robotics, autonomous driving, and AR/VR, providing a unified framework for integrating multimodal data [19][20].
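Below are minimal sketches of the three pipeline steps summarized above. They are written against the description in this article, not the authors' code; every module name, layer choice, and hyperparameter is an assumption. Step 1: a shared 2D encoder plus a per-pixel depth head. The real depth estimator very likely exploits multi-view cues, which are omitted here to keep the sketch self-contained.

```python
import torch
import torch.nn as nn

class FeatureAndDepth(nn.Module):
    """Step 1 stand-in: shared 2D encoder + per-pixel depth head (hypothetical layers)."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(                       # shared across all input views
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.depth_head = nn.Conv2d(feat_dim, 1, 1)         # one depth value per pixel

    def forward(self, images):                              # images: (V, 3, H, W)
        feats = self.encoder(images)                        # (V, C, H, W) 2D features
        depth = nn.functional.softplus(self.depth_head(feats)).squeeze(1) + 0.1  # positive depths
        return feats, depth                                 # geometric prior for the lifting step
```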
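Step 2: lift pixels into 3D with the predicted depths and camera parameters, then voxelize the resulting point cloud and pool the features that land in the same voxel, which is where the multi-view fusion happens. The camera conventions (inverse intrinsics, camera-to-world poses) and the use of simple mean pooling are assumptions made for illustration.

```python
import torch

def lift_and_voxelize(feats, depth, K_inv, cam2world, voxel_size=0.05):
    """Step 2 stand-in: back-project every pixel with its predicted depth, then fuse per voxel.

    feats:     (V, C, H, W) 2D features      depth:     (V, H, W) predicted depths
    K_inv:     (V, 3, 3) inverse intrinsics  cam2world: (V, 4, 4) camera-to-world poses
    Returns voxel indices (N, 3) and fused per-voxel features (N, C).
    """
    V, C, H, W = feats.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()     # homogeneous pixel coords

    pts, f = [], []
    for i in range(V):
        rays = pix @ K_inv[i].T                                       # camera-space ray directions
        p_cam = rays * depth[i].unsqueeze(-1)                         # scale rays by predicted depth
        p_cam_h = torch.cat([p_cam, torch.ones(H, W, 1)], dim=-1)
        p_world = (p_cam_h @ cam2world[i].T)[..., :3]                 # transform to world frame
        pts.append(p_world.reshape(-1, 3))
        f.append(feats[i].permute(1, 2, 0).reshape(-1, C))
    pts, f = torch.cat(pts), torch.cat(f)

    # Quantize points to a voxel grid and average all features falling into the same
    # voxel: multi-view information is fused here, in 3D rather than per pixel.
    vox = torch.floor(pts / voxel_size).long()
    uniq, inv = torch.unique(vox, dim=0, return_inverse=True)
    fused = torch.zeros(uniq.shape[0], C).index_add_(0, inv, f)
    counts = torch.zeros(uniq.shape[0], 1).index_add_(0, inv, torch.ones(f.shape[0], 1))
    return uniq, fused / counts.clamp(min=1)
```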
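Step 3: refine per-voxel features and regress one Gaussian per occupied voxel. The article says the refinement is a sparse 3D U-Net; a plain per-voxel MLP is substituted here to keep the sketch dependency-free, and the output parameterization (offset from the voxel center, log-scales, quaternion, opacity, RGB) is a common 3DGS convention rather than a detail confirmed by the article.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelGaussianHead(nn.Module):
    """Step 3 stand-in: per-voxel refinement + Gaussian parameter regression.

    A sparse 3D U-Net does the refinement in the paper; a per-voxel MLP is used here
    only so the sketch runs without sparse-convolution dependencies.
    """
    def __init__(self, feat_dim=64, voxel_size=0.05):
        super().__init__()
        self.voxel_size = voxel_size
        self.refine = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(),
                                    nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.head = nn.Linear(feat_dim, 3 + 3 + 4 + 1 + 3)   # offset, log-scale, quat, opacity, rgb

    def forward(self, voxel_idx, voxel_feat):
        h = self.refine(voxel_feat)                            # refined per-voxel features
        offset, log_scale, quat, opacity, rgb = self.head(h).split([3, 3, 4, 1, 3], dim=-1)
        centers = (voxel_idx.float() + 0.5) * self.voxel_size  # voxel centers in world space
        return {
            "xyz": centers + self.voxel_size * torch.tanh(offset),  # correction kept inside the voxel
            "scale": torch.exp(log_scale),
            "rotation": F.normalize(quat, dim=-1),                  # unit quaternion
            "opacity": torch.sigmoid(opacity),
            "rgb": torch.sigmoid(rgb),
        }
```

Chained together, FeatureAndDepth, lift_and_voxelize, and VoxelGaussianHead mirror the three-module pipeline summarized above, with the resulting Gaussians handed to a standard 3DGS rasterizer for rendering.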