Training 100 Million Gaussians on a Single GPU to Reconstruct a 25 km² City: CPU Offloading Breaks the 3DGS Memory Wall

Core Viewpoint
- The article introduces CLM (CPU-offloaded Large-scale 3DGS training), a system that enables city-scale 3D reconstruction on a single consumer-grade GPU and substantially lowers the hardware requirements for large-scale neural rendering [3][22].

Group 1: 3D Gaussian Splatting (3DGS) Challenges
- 3DGS has become a core technique in neural rendering thanks to its high-quality output and fast rendering, but applying it to complex scenes such as urban environments is constrained mainly by GPU memory [5][6].
- A high-precision 3DGS model typically contains tens of millions to over a hundred million Gaussian points, and each point needs memory for its parameters and gradients. As a result, a single GPU such as the RTX 4090 can only train about 15-30 million points [6][7].

Group 2: CLM System Design
- CLM addresses the memory bottleneck by keeping Gaussian parameters in CPU memory and loading them onto the GPU only when needed, rather than holding all parameters in GPU memory [8][9].
- The system relies on three key mechanisms:
1. Attribute Segmentation: Only the "key attributes" needed to determine visibility stay in GPU memory, while the much larger set of non-key attributes is offloaded to CPU memory [10][11].
2. Pre-rendering Frustum Culling: CLM determines which Gaussian points are visible before rendering, so only those points incur computation and GPU memory [12][13].
3. Efficient CPU Utilization: CLM hides data transfer latency through micro-batching, caching, and intelligent scheduling, so the CPU can assist in training without stalling it [14][15][16][17].

Group 3: Performance and Scalability
- Experimental results show that CLM significantly increases the attainable model size and quality; for instance, it trained 102.2 million Gaussian points on a 25.3 square kilometer dataset, a 6.7-fold increase over traditional methods [18][22].
- The system is general and can be applied to splatting algorithms beyond 3DGS, which makes it useful for both academic and industrial work on large-scale scene reconstruction [21][22].
- Despite the communication overhead, CLM sustains 55% to 90% of the training throughput of the enhanced baseline on an RTX 4090 [23].
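
The memory wall described in Group 1 can be checked with a back-of-the-envelope calculation. The sketch below is not from the article: it assumes the standard 3DGS layout of 59 float32 values per Gaussian (3 position, 3 scale, 4 rotation, 1 opacity, 48 spherical-harmonic coefficients) and an Adam-style optimizer that keeps four copies of each value (parameter, gradient, two moments). Activations and other buffers are ignored, so real limits would be lower.

```python
# Assumed layout of one Gaussian in standard 3DGS (not stated in the article):
# position (3) + scale (3) + rotation quaternion (4) + opacity (1) + SH colour (48)
FLOATS_PER_GAUSSIAN = 3 + 3 + 4 + 1 + 48
BYTES_PER_FLOAT = 4  # float32
# Assumed training copies: parameter, gradient, and two Adam moment estimates
COPIES_DURING_TRAINING = 4

BYTES_PER_GAUSSIAN = FLOATS_PER_GAUSSIAN * BYTES_PER_FLOAT * COPIES_DURING_TRAINING
GIB = 1024 ** 3


def training_bytes(num_gaussians: int) -> int:
    """Bytes of model state needed to train `num_gaussians` points."""
    return num_gaussians * BYTES_PER_GAUSSIAN


def max_gaussians(budget_bytes: int) -> int:
    """Largest point count whose model state fits in `budget_bytes`."""
    return budget_bytes // BYTES_PER_GAUSSIAN


if __name__ == "__main__":
    print(f"bytes per Gaussian while training: {BYTES_PER_GAUSSIAN}")
    print(f"upper bound for a 24 GiB GPU: {max_gaussians(24 * GIB):,} points")
    print(f"state for 102.2M points: {training_bytes(102_200_000) / GIB:.1f} GiB")
```

Under these assumptions a 24 GiB card tops out at roughly 27 million points, which falls inside the 15-30 million range quoted above, while 102.2 million points need about 90 GiB of model state, far more than a single consumer GPU provides.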
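
The first two mechanisms in Group 2 can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not CLM's implementation: plain Python lists stand in for GPU and CPU memory, the "key attributes" are assumed to be a centre and a bounding radius already expressed in camera space, and the names `is_visible`, `gpu_key_attrs`, and `cpu_other_attrs` are hypothetical.

```python
import math
import random


def is_visible(center, radius, tan_half_fov_x, tan_half_fov_y,
               near=0.1, far=100.0):
    """Bounding-sphere vs. frustum test; camera at origin looking down +z."""
    x, y, z = center
    # Reject spheres entirely in front of the near plane or beyond the far plane.
    if z + radius < near or z - radius > far:
        return False
    # Reject spheres entirely outside the left/right side planes.
    if abs(x) - z * tan_half_fov_x > radius * math.hypot(1.0, tan_half_fov_x):
        return False
    # Reject spheres entirely outside the top/bottom side planes.
    if abs(y) - z * tan_half_fov_y > radius * math.hypot(1.0, tan_half_fov_y):
        return False
    return True


random.seed(0)
NUM_GAUSSIANS = 2000

# Stand-in for GPU memory: small key attributes kept resident for every point.
gpu_key_attrs = [
    (tuple(random.uniform(-50.0, 50.0) for _ in range(3)),  # centre
     random.uniform(0.03, 1.5))                             # bounding radius
    for _ in range(NUM_GAUSSIANS)
]
# Stand-in for CPU memory: bulky non-key attributes (assumed 48 SH + 1 opacity).
cpu_other_attrs = [[0.0] * 49 for _ in range(NUM_GAUSSIANS)]

# Cull first using only the resident key attributes...
visible = [i for i, (c, r) in enumerate(gpu_key_attrs)
           if is_visible(c, r, tan_half_fov_x=0.5, tan_half_fov_y=0.5)]
# ...then fetch the non-key attributes for the visible points only.
gpu_batch = [cpu_other_attrs[i] for i in visible]

print(f"{len(visible)} of {NUM_GAUSSIANS} points need their full attributes")
```

The point of the design is that the culling decision needs only the small resident attributes, so the large per-point payload crosses the CPU-GPU link just for the subset a given view actually touches.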