HarmoniCa

~2x lossless acceleration for diffusion! A training-inference collaborative cache-learning framework arrives | HKUST & Beihang University & SenseTime
QbitAI (量子位) · 2025-07-06 05:12
Core Viewpoint
- The article presents HarmoniCa, a new caching-based acceleration framework that addresses the slow inference speed and high deployment cost of diffusion models, particularly the Diffusion Transformer (DiT) architecture, achieving roughly 2x lossless acceleration [1][7][30].

Group 1: HarmoniCa Framework
- HarmoniCa is designed to overcome the speed bottlenecks of the DiT architecture during deployment, coordinating training and inference through a feature-caching acceleration framework [1][30].
- The framework introduces two key mechanisms, Step-Wise Denoising Training (SDT) and the Image Error Proxy Objective (IEPO), which align the training and inference processes to improve performance [10][15][16].

Group 2: Mechanisms of HarmoniCa
- SDT simulates the entire denoising process during training, reducing error accumulation and improving the clarity and stability of the final image [11][12][15].
- IEPO optimizes for final image quality rather than intermediate noise error, ensuring that the training objective aligns with the ultimate goal of high-quality image generation [15][16].

Group 3: Experimental Results
- HarmoniCa was tested against various methods, including Learning-to-Cache (LTC) and heuristic caching approaches, demonstrating superior image quality and acceleration [17][19][20].
- In high-compression scenarios (10-step inference), HarmoniCa maintained its image-quality advantage, achieving lower FID scores than LTC while improving cache utilization [19][22].

Group 4: Comparison with Other Techniques
- HarmoniCa outperformed traditional pruning and quantization methods, providing stable acceleration without relying on specialized hardware, making it a more versatile deployment option [21][24].
- The framework is compatible with quantization techniques, further improving inference speed while maintaining image quality, indicating its potential as an "acceleration plugin" [24][25].
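The feature-caching idea behind these comparisons can be illustrated with a minimal sketch. All names below (`CachedDiTBlock`, `Router`, the 0.5 threshold) are hypothetical and not from the HarmoniCa paper: a tiny learned router decides, per timestep and layer, whether a DiT block recomputes its output or reuses a cached feature from an earlier step.

```python
import torch
import torch.nn as nn

class CachedDiTBlock(nn.Module):
    """One transformer block with optional feature caching (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.cache = None  # output feature saved from the last full compute

    def forward(self, x, reuse_cache: bool):
        if reuse_cache and self.cache is not None:
            return self.cache          # skip computation, reuse cached feature
        out = self.block(x)
        self.cache = out               # refresh cache after a full compute
        return out

class Router(nn.Module):
    """Tiny learned router: one logit per (timestep, layer) pair."""
    def __init__(self, num_steps, num_layers):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_steps, num_layers))

    def reuse(self, t, l, threshold=0.5):
        return torch.sigmoid(self.logits[t, l]).item() > threshold

# Denoising loop: the router decides, per step and layer, whether to recompute.
blocks = nn.ModuleList([CachedDiTBlock(64) for _ in range(4)]).eval()
router = Router(num_steps=10, num_layers=4)
x = torch.randn(1, 16, 64)  # (batch, tokens, dim) latent
with torch.no_grad():
    for t in range(10):
        h = x
        for l, blk in enumerate(blocks):
            h = blk(h, reuse_cache=router.reuse(t, l))
        x = h  # a real sampler would apply the scheduler update here
```

The router itself is tiny (here 40 scalars for 10 steps x 4 layers), which is consistent with the article's point that the added component contributes negligible parameter overhead.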
Group 5: Cost Analysis
- HarmoniCa shows significant advantages in both training and inference cost, reducing training time by approximately 25% compared to mainstream methods while having minimal impact on inference throughput [27][28].
- The Router added at inference time is lightweight, accounting for only 0.03% of model parameters [28].

Group 6: Conclusion
- The article concludes that HarmoniCa represents a new paradigm in caching acceleration, emphasizing that synchronizing the training and inference processes is key to achieving strong performance, efficiency, and adaptability in real-world deployments [29][30].
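The two training mechanisms summarized above, SDT and IEPO, can be sketched roughly as follows. Every name here (`sdt_iepo_step`, `ToyModel`, `lam`) is an illustrative assumption, not the paper's implementation: training unrolls the full denoising trajectory with soft cache decisions (the SDT idea of matching the real inference schedule), and the loss compares the final output against a full-compute reference image rather than per-step noise error (the IEPO idea).

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Stand-in denoiser whose output softly mixes cached and fresh features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Linear(dim, dim)
        self.cache = None

    def forward(self, x, t, p_reuse):
        fresh = self.net(x)
        cached = self.cache if self.cache is not None else fresh
        out = p_reuse * cached + (1 - p_reuse) * fresh  # soft cache decision
        self.cache = out.detach()                       # refresh the cache
        return out

def sdt_iepo_step(model, router_logits, scheduler_steps, x_T, teacher_final,
                  lam=1.0):
    """One hypothetical SDT training step: unroll the entire denoising
    trajectory with router-controlled cache reuse, then apply an IEPO-style
    loss on the final image instead of intermediate noise error."""
    x = x_T
    cache_cost = 0.0
    for t in range(scheduler_steps):
        p_reuse = torch.sigmoid(router_logits[t])  # differentiable reuse prob
        x = model(x, t, p_reuse)
        cache_cost = cache_cost + (1 - p_reuse)    # penalize recomputation
    image_err = torch.mean((x - teacher_final) ** 2)  # IEPO: final-image error
    return image_err + lam * cache_cost / scheduler_steps

model = ToyModel(8)
logits = nn.Parameter(torch.zeros(5))  # one reuse logit per denoising step
loss = sdt_iepo_step(model, logits, 5, torch.randn(2, 8), torch.randn(2, 8))
loss.backward()  # gradients reach both the router logits and the model
```

Because the loop mirrors the full inference schedule, the router is trained under exactly the error-accumulation conditions it will face at deployment, which is the training-inference alignment the article credits for HarmoniCa's lossless speedup.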