3D Reconstruction

ICCV 2025 | RobustSplat: Decoupling Densification and Dynamics for Transient-Resistant 3DGS Reconstruction
机器之心· 2025-08-19 09:45
Core Viewpoint - The article introduces RobustSplat, a method that addresses the difficulty 3D Gaussian Splatting (3DGS) has with dynamic objects by combining a delayed Gaussian growth strategy with scale-cascade mask guidance to reduce rendering artifacts caused by transient objects [2][21].

Research Motivation - The motivation stems from the dual role of Gaussian densification in 3DGS: it enhances scene detail but also risks overfitting dynamic areas, leading to artifacts and scene distortion. The goal is to balance static structure representation against suppression of dynamic interference [6][8].

Methodology
- **Transient Mask Estimation**: Uses a Mask MLP with two linear layers to output pixel-wise transient masks that distinguish transient from static regions [9].
- **Feature Selection**: DINOv2 features are chosen for their balance of semantic consistency, noise resistance, and computational efficiency, outperforming alternatives such as Stable Diffusion and SAM features [10].
- **Supervision Design**: Combines an image residual loss with a feature cosine similarity loss to optimize the mask MLP and improve dynamic-area recognition [12] (a minimal sketch of this combination appears after this entry).
- **Delayed Gaussian Growth Strategy**: The core strategy postpones densification so that the static scene structure is optimized first, reducing the risk of misclassifying static areas as transient [13].
- **Scale-Cascade Mask Guidance**: Transient masks are first estimated from low-resolution features, then supervision transitions to high resolution for more accurate mask predictions [14].

Experimental Results - Experiments on the NeRF On-the-go and RobustNeRF datasets show that RobustSplat outperforms baselines such as 3DGS, SpotLessSplats, and WildGaussians on metrics including PSNR, SSIM, and LPIPS [16][21].

Summary - RobustSplat effectively reduces rendering artifacts caused by transient objects through these strategies, demonstrating superior performance in complex scenes with dynamic elements while preserving detail [19][21].
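The supervision design above pairs a photometric residual with a feature cosine-similarity term to train the two-layer mask MLP. Below is a minimal PyTorch sketch of that idea; the names (`MaskMLP`, `mask_supervision_loss`, `lambda_feat`) and the way the two signals are blended into a pseudo-label are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch, assuming per-pixel DINOv2-style features are already extracted
# and aligned with the rendered/target images. Names and the pseudo-label blend
# are illustrative assumptions, not the RobustSplat implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskMLP(nn.Module):
    """Two linear layers mapping per-pixel features to a transient probability."""
    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (N, feat_dim) per-pixel features; returns (N,) mask values in [0, 1]
        return torch.sigmoid(self.net(feats)).squeeze(-1)

def mask_supervision_loss(mask, rendered, target, feat_rendered, feat_target,
                          lambda_feat: float = 0.5):
    """Combine an image residual term with a feature cosine-similarity term.

    Pixels with a large photometric residual or low feature similarity are
    pushed toward a high transient probability; other pixels toward zero.
    """
    residual = (rendered - target).abs().mean(dim=-1)                   # (N,) photometric error
    feat_sim = F.cosine_similarity(feat_rendered, feat_target, dim=-1)  # (N,) feature agreement
    pseudo_label = torch.clamp(residual + lambda_feat * (1.0 - feat_sim), 0.0, 1.0)
    return F.binary_cross_entropy(mask, pseudo_label.detach())
```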
Casual snapshots are enough for VR cloud tourism! Stable 3D reconstruction and novel view synthesis from pose-free, sparse images | HKUST (Guangzhou)
量子位· 2025-07-31 04:23
Core Viewpoint - RegGS, a new algorithm developed at the Hong Kong University of Science and Technology (Guangzhou), can reconstruct 3D models from sparse 2D images without precise camera poses, achieving centimeter-level accuracy suitable for VR applications [2][4].

Group 1: Methodology
- RegGS combines a feed-forward Gaussian representation with structural registration to handle sparse, pose-less images, offering a new pathway toward practical 3D reconstruction [6][8].
- The core mechanism registers local 3D Gaussian mixture models into a gradually assembled global 3D scene, avoiding reliance on traditional Structure from Motion (SfM) initialization and requiring fewer input images [8][12].

Group 2: Experimental Results
- On the RE10K and ACID datasets, RegGS outperformed existing mainstream methods across input frame counts of 2×/8×/16×/32× on metrics such as PSNR, SSIM, and LPIPS [9][12].

Group 3: Applications
- RegGS addresses the "sparse + pose-less" setting, which has significant real-world applications, including:
  - 3D reconstruction from user-generated content (UGC) videos, which often lack camera parameters [13].
  - Drone aerial mapping, where it is robust to large viewpoint changes and low frame rates [13].
  - Restoration of historical images/documents, enabling 3D reconstruction from a few photos taken from different angles [13].
- Compared with traditional SfM or Bundle Adjustment pipelines, RegGS needs less structural input and is more practical for unstructured data [13].

Group 4: Limitations and Future Directions
- The performance and efficiency of RegGS are currently limited by the quality of the upstream feed-forward model and the computational cost of the MW2 distance (sketched below), indicating areas for future optimization [13].
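The registration step and the MW2 cost mentioned above operate on 3D Gaussian mixtures. The sketch below shows only the standard closed-form squared 2-Wasserstein distance between two individual Gaussian components, the usual building block of a mixture-level W2 (MW2) distance; how RegGS actually couples and weights components is not reproduced here, and the function name is an illustrative assumption.

```python
# Sketch of the closed-form squared 2-Wasserstein distance between two Gaussian
# components. A mixture-level (MW2) distance between local and global Gaussian
# mixtures would typically build on this per-component term; RegGS's actual
# coupling/weighting scheme is not shown here.
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_squared(mu1, cov1, mu2, cov2):
    """W2^2 = |mu1 - mu2|^2 + Tr(cov1 + cov2 - 2 (cov2^1/2 cov1 cov2^1/2)^1/2)."""
    sqrt_cov2 = sqrtm(cov2)
    cross = np.real(sqrtm(sqrt_cov2 @ cov1 @ sqrt_cov2))  # drop tiny imaginary noise
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2.0 * cross))

# Example: two 3D splats with anisotropic covariances.
mu_a, cov_a = np.zeros(3), np.diag([0.10, 0.20, 0.10])
mu_b, cov_b = np.array([0.5, 0.0, 0.0]), np.diag([0.15, 0.10, 0.10])
print(gaussian_w2_squared(mu_a, cov_a, mu_b, cov_b))
```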
Fei-Fei Li's spatial intelligence unicorn open-sources its underlying technology! AI-generated 3D worlds run smoothly on all devices; the "shader" for spatial intelligence has arrived
量子位· 2025-06-03 04:26
Core Viewpoint - World Labs, co-founded by Fei-Fei Li, has open-sourced a core technology called Forge, a real-time 3D Gaussian Splatting renderer that runs smoothly across devices ranging from desktops to low-power mobile devices and XR [1][6].

Group 1: Technology Overview
- Forge is a web-based 3D Gaussian Splatting renderer that integrates with three.js, enabling fully dynamic and programmable Gaussian splatting [2].
- Its underlying design is optimized for the GPU, playing a role analogous to the "shaders" of traditional 3D graphics pipelines [3].
- According to World Labs co-founder Ben Mildenhall, the technology lets developers handle AI-generated 3D worlds as easily as manipulating triangle meshes [5].

Group 2: Features and Capabilities
- Forge requires minimal code to get started and supports multiple splat objects, multiple cameras, and real-time animation and editing [4].
- It is designed as a programmable 3D Gaussian Splatting engine, providing unprecedented control over the generation, animation, and rendering of 3D Gaussian splats [8].
- The renderer sorts splats with a painter's algorithm, a core aspect of its design [13].

Group 3: Rendering Process
- The component managing the rendering process is ForgeRenderer, which compiles the complete list of splats in a three.js scene and determines their drawing order using an efficient bucket sort [14] (a conceptual sketch of this ordering follows this entry).
- Forge supports multi-view rendering by creating additional ForgeViewpoint objects, allowing simultaneous rendering from different perspectives [15].

Group 4: Future Plans
- World Labs aims to lift multimodal AI from 2D pixel planes to full 3D worlds and plans to launch its first product in 2025 [17].
- The company intends to build tools for professionals such as artists, designers, developers, filmmakers, and engineers, targeting customers ranging from video game developers to film studios [17].
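The bucket-sorted, back-to-front drawing order described under Group 3 is essentially a painter's algorithm keyed on view-space depth. The following Python sketch illustrates that idea conceptually; it is not Forge's code or API, and the function name, bucket count, and camera convention are assumptions made for illustration.

```python
# Conceptual sketch (not Forge's implementation) of painter's-algorithm ordering
# with a bucket/counting sort on quantized view-space depth: farthest splats are
# emitted first so alpha blending composites back-to-front each frame.
import numpy as np

def painter_order(centers: np.ndarray, view_matrix: np.ndarray, num_buckets: int = 256):
    """Return splat indices ordered far-to-near.

    centers:     (N, 3) splat centers in world space
    view_matrix: (4, 4) world-to-camera transform (camera looks down -Z)
    """
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)
    cam = homo @ view_matrix.T
    depth = -cam[:, 2]  # larger value = farther from the camera

    # Quantize depths into buckets; the key is flipped so far buckets sort first.
    d_min, d_max = depth.min(), depth.max()
    scale = (num_buckets - 1) / max(d_max - d_min, 1e-8)
    key = (num_buckets - 1) - ((depth - d_min) * scale).astype(np.int64)

    # Counting sort over the bucket keys (linear time, no comparison sort).
    counts = np.bincount(key, minlength=num_buckets)
    starts = np.concatenate(([0], np.cumsum(counts)[:-1]))
    cursor = starts.copy()
    order = np.empty(len(key), dtype=np.int64)
    for i, k in enumerate(key):
        order[cursor[k]] = i
        cursor[k] += 1
    return order

# Usage: recompute the order whenever the camera moves, then draw splats in that order.
indices = painter_order(np.random.rand(1000, 3), np.eye(4))
```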
Meitu upgrades its competitiveness in AI vision: seven image-editing research results unveiled
Zheng Quan Ri Bao· 2025-04-09 08:40
Core Insights
- Meitu's MT Lab gained significant recognition with five research outcomes accepted to the prestigious CVPR 2025 conference, which received over 13,000 submissions and had an acceptance rate of just 22.1% [2].
- The lab also had two projects accepted at AAAI 2025, which had an acceptance rate of 23.4% from 12,957 submissions [2].
- The seven research outcomes focus on image editing and comprise three generative AI technologies, three segmentation technologies, and one 3D reconstruction technology [2].

Generative AI Technologies
- GlyphMastero has been deployed in Meitu's app Meitu Xiuxiu, providing users with a seamless text-modification experience [3].
- MTADiffusion is integrated into Meitu's AI material generator WHEE, allowing efficient image editing with simple commands [3].
- StyO is used in Meitu Xiuxiu's AI creative and beauty camera features, enabling users to explore different dimensions easily [4].

Segmentation and 3D Reconstruction Technologies
- The segmentation breakthroughs include interactive segmentation and cutout technologies, applied in e-commerce design, image editing, and portrait beautification [4].
- EVPGS represents the lab's advance in 3D reconstruction, an area with growing demand in novel view synthesis, augmented reality (AR), 3D content generation, and virtual digital humans [4].

Industry Position and Future Potential
- Meitu's long-term investment in AI has allowed the company to turn cutting-edge research into practical applications, strengthening its competitive edge in its core visual business [4].
- Continuous iteration of product capabilities has increased user engagement and willingness to pay, indicating promising growth potential and expansion opportunities for the company [4].