Inference time cut by 70%! A "compression powerhouse" for feed-forward 3DGS arrives, a joint work from Zhejiang University and Monash

Core Viewpoint
- Novel View Synthesis (NVS) is becoming increasingly important in cutting-edge applications such as Augmented Reality (AR) and Virtual Reality (VR), and 3D Gaussian Splatting (3DGS) has emerged as a key technology thanks to its real-time rendering and visual quality [1]

Group 1
- Existing feed-forward 3DGS models have made significant progress in real-time rendering and efficient 3D scene generation, but they still face a critical limitation: encoder capacity, which struggles with dense multi-view inputs [1][2]
- ZPressor is a lightweight, plug-and-play module that integrates into existing feed-forward 3DGS models and lets them scale to denser view inputs with better performance [3]
- With 36 input views, ZPressor improves performance by 4.65 dB, reduces inference time by 70%, and decreases memory usage by 80%, while extending the number of usable input views to nearly 500 [4]

Group 2
- The core issue with existing feed-forward 3DGS models is limited encoder capacity, which leads to "information overload" on dense inputs and causes a spike in computational cost [5][6]
- Information overload arises because the combined information from all input views is highly redundant, which makes it hard to process effectively [7]
- Making good use of the input views therefore requires removing irrelevant information efficiently while retaining predictive capability [8]

Group 3
- With DepthSplat, model performance declines significantly and computational cost rises sharply as the number of input views increases, indicating a direct causal relationship between information overload and resource consumption [10][11]
- The Information Bottleneck (IB) principle is introduced to understand and address information overload in feed-forward 3DGS models on a theoretical basis [12]
- ZPressor applies the IB principle to compress multi-view inputs into a compact latent state, retaining the necessary information while eliminating redundancy [14]

Group 4
- ZPressor's compression process has three steps: Anchor View Selection, Support-to-Anchor Assignment, and Views Information Fusion, which together form an efficient "information compressor" [17]
- Step 1 selects a diverse set of anchor views with farthest point sampling so that they represent the entire scene [18]
- Step 2 assigns each remaining support view to its nearest anchor view by camera distance, so that complementary scene details are grouped together [19]
- Step 3 fuses information with a customized cross-attention module that captures the relationship between anchor and support views while avoiding redundancy [20][21]

Group 5
- Extensive experiments on the benchmark datasets DL3DV-10K, RealEstate10K, and ACID show that ZPressor has a transformative impact on feed-forward 3DGS models [23]
- Under dense input conditions, ZPressor delivers significant performance improvements and mitigates the decline caused by information overload [24]
- ZPressor also addresses the memory problems of existing models: pixelSplat, which previously hit out-of-memory errors, runs successfully with at least 36 views [24]

Group 6
- Qualitative comparisons under dense input conditions show that ZPressor compresses information effectively and significantly improves visual quality over DepthSplat [27]
- The effectiveness of the IB principle in ZPressor highlights the importance of balancing compression against information retention, particularly as the amount of scene information varies [30]
- With ZPressor, the number of 3D Gaussians, inference latency, and peak memory usage remain stable as the number of input views increases, in sharp contrast to baseline models, where these metrics grow linearly [32]
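
The first two compression steps listed under Group 4 (anchor selection by farthest point sampling, then nearest-anchor assignment) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "camera distance" means Euclidean distance between camera centers and that the first anchor is simply view 0; the names `select_anchors` and `assign_support_views` are hypothetical. The cross-attention fusion of step 3 is not shown.

```python
import math


def select_anchors(cam_positions, num_anchors, start=0):
    """Pick anchor view indices by farthest point sampling over camera centers.

    Assumption: the first anchor is `start`; the summary does not say how
    the initial anchor is chosen.
    """
    if not 0 < num_anchors <= len(cam_positions):
        raise ValueError("num_anchors must be in 1..len(cam_positions)")
    anchors = [start]
    # min_dist[i] is the distance from view i to its closest chosen anchor.
    min_dist = [math.dist(p, cam_positions[start]) for p in cam_positions]
    while len(anchors) < num_anchors:
        # The next anchor is the view farthest from all anchors chosen so far.
        nxt = max(range(len(cam_positions)), key=min_dist.__getitem__)
        anchors.append(nxt)
        for i, p in enumerate(cam_positions):
            min_dist[i] = min(min_dist[i], math.dist(p, cam_positions[nxt]))
    return anchors


def assign_support_views(cam_positions, anchors):
    """Group every non-anchor (support) view under its nearest anchor."""
    anchor_set = set(anchors)
    groups = {a: [] for a in anchors}
    for i, p in enumerate(cam_positions):
        if i in anchor_set:
            continue
        nearest = min(anchors, key=lambda a: math.dist(p, cam_positions[a]))
        groups[nearest].append(i)
    return groups


# Toy example: six cameras along a line, forming two clusters.
cams = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
        (9.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
anchors = select_anchors(cams, num_anchors=2)
groups = assign_support_views(cams, anchors)
print(anchors)  # [0, 5]
print(groups)   # {0: [1, 2], 5: [3, 4]}
```

In this toy layout, each anchor ends up with the support views from its own cluster. Those groups are what a fusion module would then compress into the anchor's features, so the number of compressed states depends on the number of anchors rather than on the total number of input views.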