3D Gaussian Splatting (3DGS)
NeurIPS Spotlight | GHAP: Turning 3DGS "Pruning" into "Reconstructing a Smaller Gaussian World"
机器之心· 2025-11-14 09:30
Core Viewpoint
- The article presents a novel approach to 3D Gaussian Splatting (3DGS) compression that frames it as Gaussian mixture model simplification, effectively reducing redundancy while preserving geometric detail [4][28].

Summary by Sections

Introduction
- 3DGS is a popular method for 3D scene modeling that represents a scene with a large number of Gaussian primitives to achieve high-quality reconstructions. However, redundancy among these primitives inflates storage and limits rendering speed [4].

Methodology
- The proposed Gaussian-Herding-across-Pens (GHAP) method treats the entire 3DGS scene as a Gaussian mixture model and globally reconstructs a smaller mixture, preserving the geometric structure while reducing the number of Gaussians [8][9].
- GHAP uses a two-stage process: it first simplifies the geometric information (position/covariance), then refines the appearance attributes (opacity/color). This decoupling improves optimization stability; a toy sketch of the idea follows below [9][19].

Experimental Results
- GHAP was compared with a range of pruning-based and end-to-end compression methods. It consistently outperforms the pruning baselines while approaching the performance of full end-to-end methods [20][24].
- At a 10% retention rate, GHAP maintains high visual fidelity across different models and scenes, demonstrating its effectiveness at preserving the original geometric structure [23][24].

Conclusion
- GHAP offers a new perspective on 3DGS compression, framing it as Gaussian mixture simplification to retain geometric detail. It is designed to scale to large 3DGS scenes and is compatible with most existing 3DGS frameworks [28].
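To make the two-stage idea concrete, below is a minimal sketch of Gaussian mixture simplification: geometry first (cluster the centers, moment-match the covariances), appearance second (refit opacity and color with geometry frozen). This is not the GHAP algorithm itself; the k-means clustering, the opacity-based weights, and the merge rules are all illustrative assumptions.

```python
# Minimal sketch of two-stage Gaussian-mixture simplification
# (illustrative only; not the official GHAP code).
import numpy as np
from scipy.cluster.vq import kmeans2

def simplify_gaussians(means, covs, opacities, colors, keep_ratio=0.1):
    n = means.shape[0]
    k = max(1, int(n * keep_ratio))

    # Stage 1: geometry. Cluster Gaussian centers; each cluster is replaced
    # by one Gaussian whose first two moments match the weighted sub-mixture.
    _, labels = kmeans2(means, k, minit="++", seed=0)
    w = opacities / opacities.sum()            # mixture weights from opacity
    new_means = np.zeros((k, 3))
    new_covs = np.zeros((k, 3, 3))
    new_opac = np.zeros(k)
    new_cols = np.zeros((k, 3))
    for j in range(k):
        idx = np.where(labels == j)[0]
        if idx.size == 0:                      # kmeans2 can leave empty clusters
            continue
        wj = w[idx] / w[idx].sum()
        mu = (wj[:, None] * means[idx]).sum(axis=0)
        d = means[idx] - mu
        # Moment matching: mean of covariances + covariance of the means.
        cov = (wj[:, None, None] * (covs[idx] + d[:, :, None] * d[:, None, :])).sum(axis=0)
        new_means[j], new_covs[j] = mu, cov

        # Stage 2: appearance, refit only after geometry is fixed (decoupled).
        new_opac[j] = min(1.0, opacities[idx].sum())       # crude merge rule
        new_cols[j] = (wj[:, None] * colors[idx]).sum(axis=0)
    return new_means, new_covs, new_opac, new_cols
```

Moment matching guarantees the merged Gaussian covers the same spatial extent as the sub-mixture it replaces, which is why the geometric structure survives aggressive reduction; the appearance refit then only has to compensate for blending, not for geometry.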
Breaking the GPU Memory Wall: Saining Xie's Team Proposes CLM, with a Single RTX 4090 "Leveraging" 100 Million Gaussians
机器之心· 2025-11-11 08:40
Core Insights
- 3D Gaussian Splatting (3DGS) is an emerging method for novel view synthesis that takes a set of posed images and iteratively trains a scene representation composed of numerous anisotropic 3D Gaussians, capturing the appearance and geometry of the scene [2][4].
- The CLM system proposed by the team lets 3DGS render large scenes on a single consumer-grade GPU, such as the RTX 4090, by working around GPU memory limitations [6][8].

Group 1: 3DGS Overview
- 3DGS has shown revolutionary application potential in fields such as 3D modeling, digital twins, visual effects (VFX), VR/AR, and robotic vision reconstruction (SLAM) [5].
- The quality of images rendered with 3DGS depends on the fidelity of the trained scene representation; larger and more complex scenes require more Gaussians, driving up memory usage [5].

Group 2: CLM System Design
- CLM builds on the insight that 3DGS computation is inherently sparse: only a small subset of Gaussians is accessed in each training iteration [8][20].
- The system employs a novel offloading strategy that minimizes performance overhead and scales to large scenes by dynamically loading only the necessary Gaussians into GPU memory while keeping the rest in CPU memory; a simplified sketch follows below [8][11].

Group 3: Performance and Efficiency
- The CLM implementation can render a large scene requiring 102 million Gaussians on a single RTX 4090 while achieving top-tier reconstruction quality [8].
- Each view typically accesses only 0.39% of the Gaussians, with a maximum of 1.06% for any single view, underscoring how sparse the access pattern is [23].

Group 4: Optimization Techniques
- The team exploited several characteristics unique to 3DGS to sharply reduce the communication overhead of offloading, including pre-computing the set of Gaussians each view accesses and leveraging spatial locality to optimize CPU-GPU data transfer [12][17].
- Microbatch scheduling overlaps the access patterns of consecutive batches, raising cache hit rates and cutting redundant transfers [24][25].

Group 5: Results and Impact
- CLM increases the trainable model capacity of 3DGS by up to 6.1x over a pure-GPU training baseline, enabling larger models that improve reconstruction accuracy while keeping communication and offloading overhead low [27].
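The offloading idea can be illustrated with a short sketch. This is not the CLM implementation; the class, the precomputed per-view visibility lists, and the unbounded cache (no eviction policy or CUDA-stream overlap is shown) are assumptions made for clarity.

```python
# Hypothetical sketch of sparsity-aware CPU-GPU offloading in the spirit
# of CLM (illustrative only; not the actual system).
import torch

class GaussianOffloader:
    """Keep all Gaussian parameters in CPU memory and stream in, per view,
    only the rows that view is known to touch."""

    def __init__(self, params_cpu, visible_ids_per_view):
        # Full parameter table lives in pinned CPU memory for fast async copies.
        self.params = params_cpu.pin_memory()
        self.visible = visible_ids_per_view   # precomputed per training view
        self.cache = {}                       # gaussian id -> GPU tensor row

    def fetch(self, view_id, device="cuda"):
        ids = self.visible[view_id]
        # Only transfer rows not already resident from a previous microbatch.
        missing = [i for i in ids if i not in self.cache]
        if missing:
            rows = self.params[missing].to(device, non_blocking=True)
            self.cache.update(zip(missing, rows))
        return torch.stack([self.cache[i] for i in ids])
```

Scheduling consecutive microbatches whose visible sets overlap keeps most rows resident in the cache between iterations, which is the effect the microbatch scheduling optimization described above is after.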
Feed-Forward 3D Survey: How 3D Vision Gets There "in One Step"
机器之心· 2025-11-06 08:58
Core Insights
- The article surveys advances in 3D vision, focusing on the transition from traditional methods to Feed-Forward 3D approaches, which improve efficiency and generalization [2][4].

Summary by Sections

Overview of Feed-Forward 3D
- The article traces the evolution of 3D reconstruction from Structure-from-Motion (SfM) to Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), emphasizing the shift toward Feed-Forward 3D methods that eliminate per-scene optimization [2][6].

Key Technological Branches
- Five main architectural categories of Feed-Forward 3D methods are identified, each contributing significantly to the field's progress [6][7].
- Neural Radiance Fields (NeRF) introduced a differentiable framework for volume rendering but suffered efficiency issues from scene-specific optimization; conditional NeRF variants have since emerged that predict radiance fields directly [7][9].
- PointMap models, led by DUSt3R, predict pixel-aligned 3D point clouds directly within a Transformer framework, improving efficiency and memory capability [9][10].
- 3D Gaussian Splatting (3DGS) represents scenes as Gaussian point clouds, balancing rendering quality and speed; recent advances allow direct output of Gaussian parameters [10][12].
- Mesh, occupancy, and SDF models integrate traditional geometric modeling with modern techniques, enabling high-precision surface modeling [14][19].

Applications and Benchmarking
- The paper summarizes Feed-Forward models across tasks including camera pose estimation, point map estimation, and single-image view synthesis, compiling a comprehensive benchmark over more than 30 common 3D datasets [16][18][22].
- Evaluation metrics such as PSNR, SSIM, and Chamfer Distance are established to facilitate model comparison and performance assessment; reference implementations of two of them are sketched below [18][23].

Future Challenges and Trends
- Four major open questions are identified for future research, including the integration of Diffusion Transformers, scalable 4D memory mechanisms, and the construction of multimodal large-scale datasets [27][28].
- Remaining challenges include the predominance of RGB-only data, the need for higher reconstruction accuracy, and difficulties in free-viewpoint rendering [29].
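For reference, here are minimal implementations of two of the metrics named above. The formulas are standard; the code is only illustrative (e.g., Chamfer Distance also appears in squared-distance variants).

```python
# Reference implementations of PSNR and Chamfer Distance.
import numpy as np

def psnr(img, ref, max_val=1.0):
    # Peak signal-to-noise ratio between a rendering and ground truth.
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def chamfer_distance(p, q):
    # Symmetric average nearest-neighbor distance between point sets
    # p: (N, 3), q: (M, 3).
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```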
Feed-Forward 3D Survey: 3D Vision Enters the "One-Step" Era
自动驾驶之心· 2025-10-31 16:03
Core Insights
- The article reviews the evolution of 3D vision technologies, highlighting the transition from traditional methods like Structure-from-Motion (SfM) to advanced techniques such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), and presenting Feed-Forward 3D as a new paradigm for the AI-driven era [2][6].

Summary by Categories

1. Technological Evolution
- Earlier 3D vision methods typically required per-scene optimization, which was slow and generalized poorly [2][6].
- Feed-Forward 3D is introduced as a new paradigm that overcomes these limitations, enabling faster and more generalizable 3D understanding [2].

2. Classification of Feed-Forward 3D Methods
- The article groups Feed-Forward 3D methods into five main architectures, each contributing significant advances (a toy sketch of the PointMap idea appears after this summary):
  1. **NeRF-based Models**: utilize a differentiable framework for volume rendering but face efficiency issues due to scene-specific optimization; conditional NeRF approaches have emerged that predict radiance fields directly [8].
  2. **PointMap Models**: led by DUSt3R, these predict pixel-aligned 3D point clouds directly within a Transformer framework, removing the need for camera pose input [10].
  3. **3D Gaussian Splatting (3DGS)**: represents scenes as Gaussian point clouds, balancing rendering quality and speed, with recent advances allowing direct output of Gaussian parameters [11][13].
  4. **Mesh / Occupancy / SDF Models**: combine traditional geometric modeling with modern techniques such as Transformers and Diffusion models [14].
  5. **3D-Free Models**: learn mappings from multi-view inputs to new perspectives without relying on explicit 3D representations [15].

3. Applications and Tasks
- Highlighted applications of Feed-Forward models include:
  - Pose-free reconstruction and view synthesis
  - Dynamic 4D reconstruction and video diffusion
  - SLAM and visual localization
  - 3D-aware image and video generation
  - Digital human modeling
  - Robotic manipulation and world modeling [19]

4. Benchmarking and Evaluation Metrics
- The survey covers over 30 commonly used 3D datasets spanning various scene types and modalities, and summarizes standard evaluation metrics such as PSNR, SSIM, and Chamfer Distance for future model comparisons [20][21].

5. Future Challenges and Trends
- Four major open questions are identified, including the need for multi-modal data, improvements in reconstruction accuracy, challenges in free-viewpoint rendering, and the limits of long-context reasoning over long frame sequences [25][26].
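As a toy illustration of the PointMap branch, the sketch below maps per-pixel image features to a pixel-aligned 3D point map plus a confidence value. It is not DUSt3R's actual prediction head; the layer shape, feature dimension, and output parameterization are assumptions for illustration.

```python
# Toy pixel-aligned point-map head in the DUSt3R spirit (illustrative only).
import torch
import torch.nn as nn

class PointMapHead(nn.Module):
    """Maps per-pixel Transformer features to 3D points plus confidence."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.proj = nn.Conv2d(feat_dim, 4, kernel_size=1)  # (x, y, z, conf)

    def forward(self, feats):                  # feats: (B, C, H, W)
        out = self.proj(feats)
        xyz = out[:, :3]                       # pixel-aligned 3D point map
        conf = torch.sigmoid(out[:, 3:])       # per-pixel confidence
        return xyz, conf
```

Because every pixel directly predicts its own 3D point, no camera pose input is needed at inference time, which is the property the survey highlights for this branch.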
A Summary of Papers on 3DGS- and Diffusion-Based Closed-Loop Simulation for Autonomous Driving
自动驾驶之心· 2025-07-24 09:42
Core Viewpoint
- The article discusses advances in autonomous driving simulation technology, highlighting how scene rendering, data collection, and intelligent agents are integrated to create realistic driving environments [1][2][3].

Group 1: Simulation Components
- Step one creates a static environment, using 3D Gaussian Splatting and Diffusion Models to build a realistic cityscape that captures intricate detail [1].
- Step two collects data from panoramic views to extract dynamic assets such as vehicles and pedestrians, enhancing the realism of the simulation [2].
- Step three applies relighting techniques so that assets appear natural under varied lighting, simulating different times of day and weather scenarios [2].

Group 2: Intelligent Agents and Weather Systems
- Step four introduces intelligent agents that mimic real-world behaviors, allowing for complex interactions within the simulation [3].
- Step five incorporates weather systems to enhance atmospheric realism, enabling scenarios such as rain or fog [4].

Group 3: Advanced Features
- Step six adds advanced features that challenge autonomous vehicles with unexpected obstacles, simulating real-world driving complexity; a hypothetical configuration sketch of the full pipeline follows below [4].
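The six stages could be captured in a configuration object like the hypothetical sketch below. Every field name and default value here is an assumption for illustration, not an API from any of the surveyed papers.

```python
# Hypothetical configuration for a six-stage closed-loop simulation pipeline.
from dataclasses import dataclass, field

@dataclass
class ClosedLoopSimConfig:
    static_scene: str = "3dgs+diffusion"        # stage 1: static environment
    dynamic_assets: list = field(               # stage 2: extracted assets
        default_factory=lambda: ["vehicle", "pedestrian"])
    relighting: str = "time-of-day"             # stage 3: lighting consistency
    agent_policy: str = "behavior-model"        # stage 4: intelligent agents
    weather: str = "rain"                       # stage 5: weather system
    adversarial_events: bool = True             # stage 6: unexpected obstacles
```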
Let's Talk Closed-Loop Simulation for Autonomous Driving and 3DGS!
自动驾驶之心· 2025-07-22 12:46
Core Viewpoint
- The article discusses the design and implementation of the Street Gaussians algorithm, which models dynamic street scenes efficiently for autonomous driving simulation, addressing earlier limitations in training and rendering speed [2][3].

Group 1: Background and Challenges
- Previous methods suffered from slow training and rendering speeds, as well as inaccuracies in vehicle pose tracking [3].
- Street Gaussians represents a dynamic urban street scene as a combination of a point-based background and foreground objects, utilizing optimized vehicle tracking poses [3][4].

Group 2: Technical Implementation
- The background model is a set of points in world coordinates, each assigned a 3D Gaussian that encodes geometry and color, with parameters including a covariance matrix and a position vector [8].
- The object model for moving vehicles comprises a set of optimizable tracking poses and point clouds with the same Gaussian attributes as the background, but defined in each object's local coordinate frame; a sketch of the local-to-world transform appears below [11].

Group 3: Innovations in Appearance Modeling
- A 4D spherical harmonics model encodes temporal information into the appearance of moving vehicles, reducing storage cost compared with traditional methods [12].
- The 4D spherical harmonics model is shown to significantly improve rendering results and reduce artifacts [16].

Group 4: Initialization Techniques
- Street Gaussians initializes from aggregated LiDAR point clouds, addressing the limitations of traditional SfM point clouds in urban environments [17].

Group 5: Course and Learning Opportunities
- The article also promotes a specialized course on 3D Gaussian Splatting (3DGS), covering its subfields and practical applications in autonomous driving, aimed at building understanding and implementation skills [26][30].
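The object model's pose handling can be illustrated directly: under a rigid tracking pose (R, t), Gaussian means translate and rotate, and covariances conjugate by the rotation. The sketch below shows that transform; the function name and array layout are illustrative, not Street Gaussians' actual code.

```python
# Hedged sketch: placing a tracked object's Gaussians into the world frame.
import numpy as np

def object_to_world(means_local, covs_local, R, t):
    """Apply one frame's optimizable tracking pose (R, t) to an object.

    means_local: (N, 3) Gaussian centers in the object's local frame
    covs_local:  (N, 3, 3) covariances in the local frame
    R, t:        tracked rotation (3, 3) and translation (3,)
    """
    means_world = means_local @ R.T + t
    # Covariance transforms as R @ Sigma @ R^T under a rigid rotation
    # (broadcasts over the N Gaussians).
    covs_world = R @ covs_local @ R.T
    return means_world, covs_world
```

Because only (R, t) changes from frame to frame, the object's Gaussians are stored once in local coordinates and re-posed per frame, which is what lets the tracking poses themselves be optimized.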
A Major Vulnerability in 3D Gaussian Splatting: Data Poisoning Drives GPU Memory Up by 70GB and Can Even Crash Servers
量子位· 2025-04-22 05:06
Core Viewpoint
- The emergence of 3D Gaussian Splatting (3DGS) as a leading 3D modeling technology has exposed significant security vulnerabilities, in particular through a newly proposed attack method called Poison-Splat, which can drastically increase training costs and cause system failures [1][2][31].

Group 1: Introduction and Background
- 3DGS has rapidly become a dominant technology in 3D vision, displacing NeRF thanks to its rendering efficiency and realism [2][7].
- The adaptive nature of 3DGS, which scales computational resources with scene complexity, is both a strength and a potential vulnerability [8][11].
- The research reveals a critical security blind spot in mainstream 3D reconstruction systems: minor alterations to input images can cause significant operational disruption [2][31].

Group 2: Attack Mechanism
- Poison-Splat targets GPU memory usage and training time by introducing perturbations to input images, driving up computational cost [12][22].
- The attack is formulated as a max-min bi-level optimization problem and employs a proxy model to approximate the victim's behavior, maximizing the Total Variation (TV) of images to force excessive complexity into the 3DGS reconstruction; a toy version of the TV-ascent step is sketched below [13][16][15].
- The attack can push GPU memory usage from under 4GB to 80GB and increase training time by up to five times [25][22].

Group 3: Experimental Results
- Experiments on multiple 3D datasets showed that unconstrained attacks can drive GPU memory usage up by 20x and cut rendering speed to one-tenth of the original [25][22].
- Even with constraints on pixel perturbations, the attack remains potent, with some scenarios showing over eightfold increases in memory consumption [27][22].

Group 4: Implications and Contributions
- The findings are not merely academic: they represent real threats to 3D service providers that accept user-uploaded content [31][40].
- Simple defenses, such as capping the number of Gaussian points, are ineffective because they degrade reconstruction quality [39][35].
- The study aims to raise awareness of AI-system security in 3D modeling and advocates for more intelligent defense mechanisms [41][37].
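The TV-maximization step can be sketched as projected gradient ascent under an L-infinity budget, in the style of a PGD adversarial attack. This is a toy version, not the authors' released attack code; the epsilon, step size, and iteration count are assumptions.

```python
# Toy sketch of TV maximization under a pixel-perturbation budget
# (illustrative only; not the Poison-Splat implementation).
import torch

def total_variation(img):
    # Anisotropic TV: sum of absolute differences between neighboring pixels.
    tv_h = (img[..., 1:, :] - img[..., :-1, :]).abs().sum()
    tv_w = (img[..., :, 1:] - img[..., :, :-1]).abs().sum()
    return tv_h + tv_w

def poison(img, eps=8 / 255, steps=10, step_size=2 / 255):
    """Gradient ascent on TV under an L-infinity budget of eps."""
    delta = torch.zeros_like(img, requires_grad=True)
    for _ in range(steps):
        total_variation((img + delta).clamp(0, 1)).backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()   # ascend TV
            delta.clamp_(-eps, eps)                  # stay inside the budget
            delta.grad.zero_()
    return (img + delta.detach()).clamp(0, 1)
```

Higher TV makes the scene look texturally "complex" to the adaptive densification in 3DGS, which responds by allocating many more Gaussians, which is exactly the resource blow-up the attack exploits.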