AAAI 2026 Oral | SplatSSC: Decoupled Depth-Guided Gaussian Splatting Opens an Efficient New Paradigm for Monocular Semantic Scene Completion
机器之心 · 2026-01-28 04:59
Core Viewpoint
- The article presents SplatSSC, a framework for monocular Semantic Scene Completion (SSC) that addresses the limitations of traditional dense grid representations by combining depth-guided initialization with a decoupled aggregation mechanism, improving both accuracy and efficiency [3][4].

Group 1: Challenges in Traditional Methods
- Prior Gaussian-based alternatives to dense grids in SSC suffer from two main issues: only about 3.9% of randomly initialized Gaussian primitives are effectively utilized, and isolated outlier primitives produce erroneous semantic fragments known as "floaters" [3][4].
- Existing methods often scatter a large number of Gaussian primitives at random, which causes significant computational redundancy and wastes model capacity [6].

Group 2: SplatSSC Framework
- SplatSSC introduces a depth-guided initialization strategy and a decoupled aggregation mechanism, which together yield a marked gain in both performance and efficiency [4].
- The framework uses two parallel branches: a learnable image encoder for multi-scale semantic features and a pre-trained Depth-Anything model for stable depth features [10].

Group 3: Core Technologies
- The Group-wise Multi-scale Fusion (GMF) module replaces random initialization with guidance from geometric priors. It needs only 1,200 Gaussian primitives (about 7% of the number used by previous methods) to cover the spatial distribution of the scene [11][13].
- The Decoupled Gaussian Aggregator (DGA) targets the "floaters" problem by decoupling occupancy probability from semantic contributions, which keeps scene boundaries clean [15][19].

Group 4: Experimental Validation
- SplatSSC achieved state-of-the-art (SOTA) performance on the Occ-ScanNet dataset, with an Intersection over Union (IoU) of 62.83% and a mean IoU (mIoU) of 51.83%, surpassing previous SOTA methods by 6.35% and 4.16% respectively [22][23].
- The model demonstrated superior fine-grained perception, particularly for intricate objects such as chair legs and table surfaces [22].

Group 5: Efficiency and Resource Management
- SplatSSC reduces inference latency by approximately 9.3% (to 115.63 ms) and memory consumption by approximately 9.6%, while the parameter count grows by only 0.19% [34].
- The framework achieves high-quality scene reconstruction with fewer Gaussian primitives, indicating that the quality of primitives matters more than their quantity [32][33].
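
To make the idea of depth-guided initialization in Group 3 concrete, the following is a minimal sketch of how a depth map can seed primitive centers instead of a random scatter. It is not the GMF module itself: the function names, the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`), and the uniform subsampling step are all assumptions made for illustration.

```python
import random


def backproject_depth(depth, fx, fy, cx, cy, stride=1):
    """Lift a depth map (list of rows, in metres) to camera-frame 3D points.

    Uses the standard pinhole model; pixels with non-positive depth are
    treated as invalid and skipped.
    """
    points = []
    for v in range(0, len(depth), stride):
        for u in range(0, len(depth[v]), stride):
            d = depth[v][u]
            if d <= 0:
                continue
            x = (u - cx) * d / fx
            y = (v - cy) * d / fy
            points.append((x, y, d))
    return points


def sample_centers(points, k, seed=0):
    """Pick at most k back-projected points as initial primitive centers.

    Hypothetical helper: uniform subsampling stands in for whatever
    selection the real method performs.
    """
    rng = random.Random(seed)
    if len(points) <= k:
        return list(points)
    return rng.sample(points, k)


# Toy 4x4 depth map: a flat wall 2 m away with one invalid pixel.
depth = [[2.0] * 4 for _ in range(4)]
depth[0][0] = 0.0
points = backproject_depth(depth, fx=2.0, fy=2.0, cx=1.5, cy=1.5)
centers = sample_centers(points, k=8)
```

Every center produced this way lies on an observed surface, which is the intuition behind needing far fewer primitives than a random initialization that mostly lands in empty space.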
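
The decoupling described for the DGA in Group 3 can be illustrated with a toy aggregator in which geometry and semantics are computed by separate branches. This is one hypothetical reading of "decoupling occupancy probability from semantic contributions", not the paper's formulation: the dictionary keys, the isotropic kernel, the `EMPTY` label, and the 0.5 threshold are all assumptions.

```python
import math

EMPTY = -1  # label for unoccupied space (an assumption of this sketch)


def kernel(x, center, sigma):
    """Isotropic Gaussian falloff between query point x and a center."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-0.5 * d2 / sigma ** 2)


def aggregate(x, gaussians, occ_threshold=0.5):
    """Return (label, occupancy) for query point x.

    Each gaussian is a dict with hypothetical keys
    'center', 'sigma', 'opacity' and 'probs' (per-class probabilities).
    """
    free = 1.0
    num_classes = len(gaussians[0]["probs"])
    sem = [0.0] * num_classes
    for g in gaussians:
        w = kernel(x, g["center"], g["sigma"])
        # Geometry branch: probability that no primitive occupies x.
        free *= 1.0 - g["opacity"] * w
        # Semantic branch: kernel-weighted class votes, opacity not involved.
        for c in range(num_classes):
            sem[c] += w * g["probs"][c]
    occupancy = 1.0 - free
    if occupancy < occ_threshold:
        # Low occupancy wins regardless of how confident the semantics are.
        return EMPTY, occupancy
    label = max(range(num_classes), key=lambda c: sem[c])
    return label, occupancy


gaussians = [
    # A well-supported primitive on a real surface.
    {"center": (0.0, 0.0, 0.0), "sigma": 0.5, "opacity": 0.9, "probs": [0.9, 0.1]},
    # An isolated, low-opacity outlier with a confident class vote.
    {"center": (3.0, 0.0, 0.0), "sigma": 0.5, "opacity": 0.1, "probs": [0.1, 0.9]},
]
surface_label, surface_occ = aggregate((0.0, 0.0, 0.0), gaussians)
outlier_label, outlier_occ = aggregate((3.0, 0.0, 0.0), gaussians)
```

In this toy setup the outlier's strong class vote cannot turn its location into a labeled voxel, because the occupancy branch decides first. That is the kind of floater suppression the summary attributes to the DGA.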
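
For readers unfamiliar with the two metrics reported in Group 4, the sketch below shows how IoU and mIoU are commonly computed over a flattened voxel grid. The class-id convention (0 for empty space) and the choice to skip classes absent from both prediction and ground truth are assumptions; individual benchmarks may differ in these details.

```python
EMPTY = 0  # class id reserved for free space in this sketch


def scene_completion_iou(pred, gt):
    """Geometric IoU: occupied vs. empty voxels, ignoring semantic class."""
    inter = sum(1 for p, g in zip(pred, gt) if p != EMPTY and g != EMPTY)
    union = sum(1 for p, g in zip(pred, gt) if p != EMPTY or g != EMPTY)
    return inter / union if union else 0.0


def mean_iou(pred, gt, num_classes):
    """Mean per-class IoU over semantic classes 1..num_classes-1.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(1, num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six voxels, two semantic classes plus empty.
gt = [0, 1, 1, 2, 2, 0]
pred = [0, 1, 2, 2, 2, 1]
iou = scene_completion_iou(pred, gt)   # 4 shared occupied voxels out of 5
miou = mean_iou(pred, gt, num_classes=3)  # (1/3 + 2/3) / 2
```

IoU rewards getting the occupied geometry right, while mIoU additionally requires the correct class per voxel, which is why the reported mIoU (51.83%) is lower than the IoU (62.83%).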