3D Gaussian Splatting (3DGS)
Training 100 million Gaussian points on a single GPU, reconstructing a 25-square-kilometer city: the 3DGS memory wall broken by a CPU "plug-in"
36Kr· 2025-12-23 07:27
**Core Insights**
- The article discusses a new system called CLM (CPU-offloaded Large-scale 3DGS training), developed by a research team from New York University, which enables city-scale 3D reconstruction on a single consumer-grade GPU (an RTX 4090) by offloading memory-intensive parameters to CPU memory [1][20].

**Group 1: 3D Gaussian Splatting (3DGS) Challenges**
- 3DGS has become a significant technology in neural rendering thanks to its high rendering quality and speed, but it faces scalability problems in complex scenes such as urban areas, primarily because of GPU memory limitations [2].
- A high-precision 3DGS model typically contains tens of millions to over a billion Gaussian points, with each point requiring substantial memory for parameters and gradients, making it difficult to train on a single GPU [2][3].

**Group 2: CLM System Design**
- CLM addresses the GPU memory bottleneck by dynamically loading Gaussian parameters from CPU memory only when they are needed, rather than keeping all parameters resident in GPU memory [3][4].
- The system employs three key mechanisms:
  1. **Attribute Segmentation**: only the "key attributes" needed for the visibility test are stored in GPU memory, while the majority of parameters are offloaded to CPU memory [5][6].
  2. **Pre-rendering Visibility Culling**: CLM computes the set of visible Gaussian points before rendering, reducing unnecessary computation and memory usage on the GPU [7][8].
  3. **Efficient CPU Utilization**: CLM hides data-transfer latency through micro-batching, caching, and intelligent scheduling, allowing the CPU to assist training effectively without slowing it down [10][12].

**Group 3: Performance Results**
- On an RTX 4090, CLM trained 102.2 million Gaussian points, a 6.7-fold increase over the traditional method, which could handle only 15.3 million [13][14].
- Despite communication overhead, CLM reached 55% to 90% of the enhanced baseline's training throughput on the RTX 4090, and 86% to 97% on the slower RTX 2080 Ti [16].
- Reconstruction quality improved significantly: the 102.2-million-point model reached a PSNR of 25.15 dB, versus 23.93 dB for the 15.3-million-point model [18].

**Group 4: Broader Implications**
- CLM offers a cost-effective route to large-scale 3D reconstruction, removing the need for multi-GPU setups, which benefits both academic and industrial applications [20].
- Growing demand for efficient, low-cost 3D reconstruction tools in areas such as digital twins and large-scale mapping makes CLM's approach particularly relevant [20].
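The three CLM mechanisms summarized above can be sketched roughly in a few lines of NumPy. This is a minimal illustration of the idea (keep only position-like "key attributes" resident, cull before rendering, then gather visible parameters from CPU memory in micro-batches), not CLM's actual implementation: the 13-float attribute layout, the cone-shaped visibility test, and the micro-batch size are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# All per-Gaussian attributes live in CPU memory. Here each Gaussian is
# 13 floats: position (3), color (3), opacity (1), covariance params (6).
n_gaussians = 10_000
cpu_attrs = rng.standard_normal((n_gaussians, 13)).astype(np.float32)

# Attribute segmentation: only the "key attributes" needed for the
# visibility test (positions) stay resident on the GPU; a separate
# array stands in for that GPU-resident copy.
gpu_positions = cpu_attrs[:, :3].copy()

def visible_indices(positions, cam_pos, cam_dir, min_cos=0.7):
    """Pre-rendering visibility culling: keep Gaussians in front of the
    camera and inside a simple circular field of view."""
    rel = positions - cam_pos
    dist = np.linalg.norm(rel, axis=1)
    cos = (rel @ cam_dir) / np.maximum(dist, 1e-8)
    return np.nonzero(cos > min_cos)[0]

def fetch_visible(cpu_attrs, idx, micro_batch=2048):
    """Gather only the visible Gaussians' full attributes from CPU
    memory, in micro-batches (a stand-in for overlapped host-to-device
    transfers)."""
    parts = [cpu_attrs[idx[i:i + micro_batch]]
             for i in range(0, len(idx), micro_batch)]
    return np.concatenate(parts) if parts else cpu_attrs[:0]

cam_pos = np.zeros(3, dtype=np.float32)
cam_dir = np.array([0.0, 0.0, 1.0], dtype=np.float32)
idx = visible_indices(gpu_positions, cam_pos, cam_dir)
batch = fetch_visible(cpu_attrs, idx)  # only visible points move to GPU
```

The point of the sketch is the asymmetry: the per-frame transfer is proportional to the visible set, not to the full model, which is what lets total model size exceed GPU memory.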
New in IJRR! Sun Yat-sen University proposes a robot self-modeling technique based on 3D Gaussian Splatting: high-fidelity reconstruction of morphology, motion, and color from RGB images alone
机器人大讲堂· 2025-11-17 09:00
**Core Viewpoint**
- The article discusses a new self-modeling technology for robots based on 3D Gaussian Splatting (3DGS), which enables high-quality modeling of a robot's morphology, kinematics, and surface color using only standard RGB cameras, significantly reducing data-collection costs and enhancing the capabilities of autonomous robots [2][23].

**Group 1: Technology Overview**
- 3D Gaussian Splatting (3DGS) is a recent advance in 3D scene reconstruction that provides an efficient, high-quality 3D representation and addresses the limitations of traditional self-modeling methods [3].
- Each 3D Gaussian represents a small ellipsoid defined by parameters such as position, covariance matrix, color, and opacity, allowing precise reconstruction of a robot's 3D shape and surface color [4].
- Compared with traditional mesh modeling or NeRF, 3DGS offers faster rendering (0.08 seconds per image) and strong representational capacity, capturing both geometric detail and color information [6].

**Group 2: Methodology**
- The self-modeling pipeline comprises data collection, static reconstruction, dynamic training, and model optimization, each stage designed to address a specific technical challenge [11].
- Data collection requires only a standard RGB camera and joint-angle sensors, capturing thousands of images to ensure comprehensive training while controlling costs [12].
- A phased training strategy avoids convergence problems: the static model is trained first, followed by the kinematic network and neural skeleton, and finally all parameters are trained jointly [13].

**Group 3: Experimental Validation**
- In simulation, the method achieved a peak signal-to-noise ratio (PSNR) of 31.22 and a structural similarity index (SSIM) of 0.988, outperforming traditional methods [15].
- In physical experiments, despite challenges such as camera-calibration error, the method reconstructed a reliable model of the robot, demonstrating its feasibility in real-world applications [17].

**Group 4: Applications and Future Directions**
- The learned self-model can be applied directly to downstream tasks such as motion planning and obstacle avoidance, autonomously adjusting joint angles for precise positioning and safe navigation [20].
- The technology also offers a new route to inverse-kinematics problems, allowing accurate estimation of a robot's current state from visual input [22].
- Extending the approach to soft and continuum robots remains future work, as current methods assume rigid links [24].
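The ellipsoid parameterization described in Group 1 can be made concrete with a short sketch: each Gaussian contributes an opacity-weighted density α·exp(−½(x−μ)ᵀΣ⁻¹(x−μ)) at a point x, with color attached per primitive. The specific numbers below are illustrative, not values from the paper.

```python
import numpy as np

def gaussian_density(x, mean, cov, opacity):
    """Opacity-weighted density of one 3D Gaussian primitive at x:
    alpha * exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))."""
    d = x - mean
    return opacity * np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

# One anisotropic Gaussian with the four attribute groups named above:
# position (mu), covariance (Sigma, an ellipsoid), RGB color, opacity.
mean = np.array([0.0, 0.0, 0.0])
cov = np.diag([0.04, 0.04, 0.01])   # flat "splat", thin along z
color = np.array([0.8, 0.2, 0.2])   # RGB, unused by the density itself
alpha = 0.9

# Density equals alpha at the center and decays with Mahalanobis distance.
at_center = gaussian_density(mean, mean, cov, alpha)
nearby = gaussian_density(np.array([0.2, 0.0, 0.0]), mean, cov, alpha)
```

In practice, 3DGS keeps Σ positive semi-definite by factoring it as R S SᵀRᵀ from a rotation quaternion and per-axis scales, rather than optimizing the covariance entries directly.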