单卡训练1亿高斯点,重建25平方公里城市:3DGS内存墙被CPU「外挂」打破了
3 6 Ke·2025-12-23 07:27

Core Insights - The article discusses a new system called CLM (CPU-offloaded Large-scale 3DGS training) developed by a research team from New York University, which allows for city-scale 3D reconstruction using a single consumer-grade GPU, specifically the RTX 4090, by offloading memory-intensive parameters to CPU memory [1][20]. Group 1: 3D Gaussian Splatting (3DGS) Challenges - 3DGS has become a significant technology in neural rendering due to its high-quality rendering and speed, but it faces scalability issues when applied to complex scenes like urban areas, primarily due to GPU memory limitations [2]. - A high-precision 3DGS model typically contains tens of millions to over a billion Gaussian points, with each point requiring substantial memory for parameters and gradients, making it difficult to train on a single GPU [2][3]. Group 2: CLM System Design - CLM is designed to address the GPU memory bottleneck by dynamically loading Gaussian parameters from CPU memory only when needed, rather than keeping all parameters in GPU memory [3][4]. - The system employs three key mechanisms: 1. Attribute Segmentation: Only "key attributes" necessary for visibility are stored in GPU memory, while the majority of parameters are offloaded to CPU memory [5][6]. 2. Pre-rendering Visibility Culling: CLM calculates visible Gaussian points before rendering, reducing unnecessary computations and memory usage on the GPU [7][8]. 3. Efficient CPU Utilization: CLM minimizes data transfer delays through micro-batching, caching, and intelligent scheduling, allowing the CPU to effectively assist in training without slowing down the process [10][12]. Group 3: Performance Results - The implementation of CLM on an RTX 4090 allowed for the training of 102.2 million Gaussian points, a 6.7-fold increase compared to the traditional method, which could only handle 15.3 million points [13][14]. - Despite communication overhead, CLM achieved a training throughput of 55% to 90% of the enhanced baseline on the RTX 4090, and up to 86% to 97% on the slower RTX 2080 Ti [16]. - The quality of reconstruction improved significantly, with the PSNR of the 102.2 million point model reaching 25.15 dB, compared to 23.93 dB for the 15.3 million point model [18]. Group 4: Broader Implications - CLM represents a cost-effective solution for large-scale 3D reconstruction, addressing deployment challenges without the need for multi-GPU setups, which is beneficial for both academic and industrial applications [20]. - The growing demand for efficient and low-cost 3D reconstruction tools in areas like digital twins and large-scale mapping makes CLM's approach particularly relevant [20].

单卡训练1亿高斯点,重建25平方公里城市:3DGS内存墙被CPU「外挂」打破了 - Reportify