DiffusionGS
Embedding 3DGS into Diffusion: A Fast, High-Resolution 3D Generation Framework (ICCV'25)
自动驾驶之心· 2025-11-01 16:04
Core Viewpoint
- The article introduces DiffusionGS, a pixel-level 3D diffusion model for the image-to-3D generation task; it maintains 3D view consistency and applies to both object-centric and larger scene-level generation [2][17].

Group 1: Methodology
- DiffusionGS predicts a 3D Gaussian point cloud at every denoising timestep, which keeps the generated views consistent and improves the quality of both object and scene generation [2][30] (a minimal denoising-step sketch follows this summary).
- The model operates in pixel space rather than latent space, preserving the 3D representation more faithfully and supporting higher spatial resolution [26][30].
- A scene-object mixed training strategy is proposed to generalize 3D priors across datasets of different types, improving the model's performance [32][34] (a sampling sketch appears below).

Group 2: Performance Metrics
- DiffusionGS reaches a PSNR of 25.89 and an SSIM of 0.8880, outperforming current state-of-the-art methods by 2.20 dB in PSNR and 23.25 in FID [40].
- Generation takes about 6 seconds at 256x256 resolution and 24 seconds at 512x512, roughly 7.5 times faster than Hunyuan-v2.5 [16][40].
- The method shows better clarity and 3D consistency in generated images, with fewer artifacts and less blur than existing techniques [44].

Group 3: Technical Contributions
- The Reference-Point Plücker Coordinate (RPPC) injects camera pose information into the model and strengthens its spatial perception [32][37] (a geometric sketch is given below).
- The architecture uses two different MLPs to decode Gaussian primitives, one tailored to object-level and one to scene-level generation [39] (illustrated after this summary).
- A point distribution loss is designed for object-level training, helping convergence and final quality [39].
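To make the core idea more concrete, the sketch below shows the rough shape of one denoising step as described in the summary: the model works directly in pixel space, every pixel of every noisy view is decoded into one 3D Gaussian, and a differentiable Gaussian renderer turns those primitives back into views. The backbone, channel counts, and especially the `render_placeholder` function are illustrative stand-ins (a real system would use a large denoiser and an actual Gaussian splatting rasterizer), not the paper's implementation.

```python
import torch
import torch.nn as nn

class PixelGaussianDenoiser(nn.Module):
    """Skeleton of one denoising step: noisy views in, per-pixel Gaussians out.

    The tiny conv backbone and 1x1 decoding head are stand-ins; a real model
    would use a much larger network and feed the output to a differentiable
    Gaussian rasterizer instead of the placeholder renderer below.
    """
    def __init__(self, in_ch: int = 3 + 6, feat: int = 32):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat, feat, 3, padding=1),
        )
        # 12 channels per pixel: depth, scale(3), quaternion(4), opacity, rgb(3).
        self.gaussian_head = nn.Conv2d(feat, 12, 1)

    def forward(self, noisy_views: torch.Tensor, rppc: torch.Tensor) -> torch.Tensor:
        # Pixel space, not latent space: every pixel of every view yields one Gaussian.
        x = torch.cat([noisy_views, rppc], dim=1)        # (B*V, 9, H, W)
        return self.gaussian_head(self.backbone(x))      # (B*V, 12, H, W)

def render_placeholder(gaussian_params: torch.Tensor, num_views: int) -> torch.Tensor:
    """Stand-in for differentiable Gaussian splatting (shape-only placeholder)."""
    _, _, h, w = gaussian_params.shape
    return torch.zeros(num_views, 3, h, w)

# Toy step: 2 noisy 64x64 views plus their 6-channel ray-coordinate maps.
views = torch.randn(2, 3, 64, 64)
rppc = torch.randn(2, 6, 64, 64)
model = PixelGaussianDenoiser()
gaussians = model(views, rppc)
denoised = render_placeholder(gaussians, num_views=2)
print(gaussians.shape, denoised.shape)
```

Because the Gaussians are re-predicted and re-rendered at every timestep, all denoised views are generated from one shared 3D representation, which is what keeps them mutually consistent.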
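The scene-object mixed training strategy can be pictured as drawing each training batch from either an object-centric or a scene-level dataset and routing it to the matching decoding head. The sampler below is a minimal sketch of that idea; the mixing probability and loader setup are assumptions, not values from the paper.

```python
import random
import torch

def sample_mixed_batch(object_loader, scene_loader, p_object: float = 0.5):
    """Draw a batch from the object-centric or scene-level source at random.

    Returns the batch plus a flag so the training step can pick the matching
    Gaussian decoding head. `p_object` is an assumed mixing ratio.
    """
    if random.random() < p_object:
        return next(object_loader), "object"
    return next(scene_loader), "scene"

# Toy usage with iterators standing in for real DataLoaders.
object_loader = iter([torch.randn(2, 3, 256, 256) for _ in range(4)])
scene_loader = iter([torch.randn(2, 3, 256, 256) for _ in range(4)])
batch, kind = sample_mixed_batch(object_loader, scene_loader)
print(kind, batch.shape)
```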
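The RPPC item can be grounded with a short geometric sketch. Standard Plücker coordinates encode each pixel's camera ray as a direction plus a moment term; the summary says RPPC additionally anchors the encoding to a reference point so the network gets a stronger sense of position and depth. One plausible reading, sketched below, takes the point on each ray closest to an assumed reference point (here the world origin) and concatenates it with the ray direction as a 6-channel per-pixel map. The function name and the choice of reference point are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def reference_point_plucker(origins: torch.Tensor,
                            directions: torch.Tensor,
                            ref_point: torch.Tensor) -> torch.Tensor:
    """Per-pixel ray embedding: (nearest point on ray to ref_point, ray direction).

    origins:    (H, W, 3) camera center broadcast per pixel
    directions: (H, W, 3) un-normalized ray directions per pixel
    ref_point:  (3,)      assumed reference point, e.g. the scene/world origin
    returns:    (H, W, 6) coordinate map fed to the denoiser alongside RGB
    """
    d = torch.nn.functional.normalize(directions, dim=-1)
    # Project (ref_point - origin) onto the ray to find the closest point on the ray.
    t = ((ref_point - origins) * d).sum(-1, keepdim=True)
    nearest = origins + t * d
    return torch.cat([nearest, d], dim=-1)

# Toy usage: a 4x4 pinhole-style ray bundle from a camera at z = -2.
H = W = 4
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
dirs = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)          # (H, W, 3)
origins = torch.tensor([0.0, 0.0, -2.0]).expand(H, W, 3)
rppc = reference_point_plucker(origins, dirs, torch.zeros(3))
print(rppc.shape)  # torch.Size([4, 4, 6])
```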
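The "two different MLPs" item is easiest to picture as two lightweight decoding heads over a shared backbone: each maps per-pixel features to Gaussian primitive parameters (a depth along the pixel ray, scale, rotation quaternion, opacity, color), and the object-level and scene-level heads differ mainly in how far the depth and scale ranges extend. The layer sizes, activations, and parameter bounds below are illustrative assumptions rather than the paper's values.

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Per-pixel MLP that decodes features into 3D Gaussian parameters.

    Outputs 12 channels: 1 depth + 3 scale + 4 rotation (quaternion) + 1 opacity + 3 color.
    `max_depth` / `max_scale` are the knobs one would set differently for the
    object-level and scene-level heads (assumed, not taken from the paper).
    """
    def __init__(self, feat_dim: int, max_depth: float, max_scale: float):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.SiLU(),
            nn.Linear(128, 12),
        )
        self.max_depth, self.max_scale = max_depth, max_scale

    def forward(self, feats, ray_o, ray_d):
        # feats: (N, feat_dim); ray_o, ray_d: (N, 3) per-pixel rays
        out = self.mlp(feats)
        depth = torch.sigmoid(out[:, :1]) * self.max_depth
        xyz = ray_o + depth * ray_d                                 # Gaussian centers lie on pixel rays
        scale = torch.sigmoid(out[:, 1:4]) * self.max_scale
        rot = torch.nn.functional.normalize(out[:, 4:8], dim=-1)   # unit quaternion
        opacity = torch.sigmoid(out[:, 8:9])
        color = torch.sigmoid(out[:, 9:12])
        return xyz, scale, rot, opacity, color

# Two heads over one feature space: tighter ranges for objects, looser for scenes.
object_head = GaussianHead(feat_dim=64, max_depth=2.0, max_scale=0.05)
scene_head = GaussianHead(feat_dim=64, max_depth=50.0, max_scale=1.0)
feats = torch.randn(16, 64)
rays_o = torch.zeros(16, 3)
rays_d = torch.nn.functional.normalize(torch.randn(16, 3), dim=-1)
xyz, *_ = object_head(feats, rays_o, rays_d)
print(xyz.shape)  # torch.Size([16, 3])
```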