Core Insights
- The article discusses QuantVGGT, a quantization framework designed to compress and accelerate the Visual Geometry Grounded Transformer (VGGT) model, which has over 1 billion parameters, while maintaining high accuracy [2][5][58].

Group 1: Quantization Framework
- QuantVGGT uses 4-bit quantization, achieving a 2.5x speedup and a 3.7x memory reduction while preserving 98% of the reconstruction accuracy of the full-precision model [2][5][7].
- The framework introduces two main technical contributions: Dual-Smoothed Fine-Grained Quantization (DSFQ) and Noise-Filtered Diverse Sampling (NFDS) [5][9].

Group 2: Challenges in Quantization
- VGGT's unique properties, such as its data-independent special tokens and the inherent complexity of 3D data, pose significant challenges for quantization [11][12].
- The data-independent tokens produce a heavy-tailed activation distribution, which complicates quantization and increases the risk of information loss [11][12].

Group 3: Technical Contributions
- DSFQ combines a pre-global Hadamard rotation with post-local channel smoothing to mitigate the heavy-tailed distribution and inter-channel variance [5][9][30].
- NFDS uses deep-layer statistics to filter out noisy samples and build frame-aware, diverse calibration clusters, stabilizing the quantization range [5][9][40].

Group 4: Experimental Results
- Extensive experiments show that QuantVGGT outperforms existing quantization methods across benchmark datasets and bit widths [5][13][59].
- In camera pose estimation, QuantVGGT retains 99.9% of full-precision performance at 8-bit quantization and achieves an AUC@30 of 88.2 at 4-bit quantization, significantly outperforming other methods [47][50].
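The DSFQ idea described under Group 3 can be illustrated with a minimal sketch: a global Hadamard rotation spreads heavy-tailed outliers evenly across channels, and per-channel smoothing equalizes inter-channel variance before low-bit quantization. This is an assumed reconstruction of the general technique, not the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal: H @ H.T == I

def quantize_int4(x):
    # Symmetric per-tensor 4-bit quantization (levels in [-8, 7]);
    # returns the dequantized tensor so error can be measured.
    scale = np.abs(x).max() / 7.0
    return np.clip(np.round(x / scale), -8, 7) * scale

def dsfq_like(x):
    # Hypothetical DSFQ-style pipeline:
    # 1) rotate so outliers are shared across all channels,
    # 2) smooth each channel to a common range,
    # 3) quantize, then undo smoothing and rotation.
    n = x.shape[-1]
    H = hadamard(n)
    x_rot = x @ H
    s = np.abs(x_rot).max(axis=0) + 1e-8  # per-channel smoothing factors
    x_q = quantize_int4(x_rot / s)
    return (x_q * s) @ H.T  # H is orthonormal, so H^-1 = H^T
```

On activations with a single extreme outlier, the rotated-and-smoothed path yields a much smaller reconstruction error than naive per-tensor 4-bit quantization, which is the failure mode the heavy-tailed distribution causes.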
Group 5: Efficiency and Deployment
- The proposed quantization framework adds minimal overhead, with only a 0.2% increase in latency while retaining model performance [56][58].
- The results indicate that QuantVGGT is well suited for deployment in resource-constrained environments, demonstrating its practical advantages [5][58].
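The NFDS step described under Group 3 can likewise be sketched: filter out statistically noisy frames using activation statistics, then cluster the remainder and pick one representative per cluster so the calibration set is both clean and diverse. This is an illustrative sketch under assumptions; `select_calibration_frames`, the z-score threshold, and the plain k-means step are hypothetical stand-ins for the paper's method.

```python
import numpy as np

def select_calibration_frames(features, k=8, z_thresh=3.0, seed=0):
    # `features`: (num_frames, dim) deep statistics per frame,
    # e.g. pooled activations. Returns indices of selected frames.
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8
    z = np.abs((features - mu) / sigma).mean(axis=1)
    keep = np.where(z < z_thresh)[0]  # noise filtering
    kept = features[keep]

    # Simple k-means to form frame-aware diverse clusters.
    rng = np.random.default_rng(seed)
    centers = kept[rng.choice(len(kept), size=k, replace=False)]
    for _ in range(10):
        d = np.linalg.norm(kept[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = kept[labels == c].mean(axis=0)

    # Use the frame nearest each center as a calibration sample.
    d = np.linalg.norm(kept[:, None] - centers[None], axis=-1)
    return keep[np.unique(d.argmin(axis=0))]
```

A frame whose activations deviate far from the population statistics is dropped before clustering, which is what keeps the calibration range stable.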
A billion-parameter 3D model squeezed onto a phone for the first time! 4-bit quantization: 2.5x speed, 3.7x lower memory, 98% accuracy | ICLR'26
QbitAI (量子位) · 2026-03-08 04:26