LeMiCa
China Unicom breaks the speed-quality zero-sum game in diffusion models, with 5x faster inference | CVPR 2025 Highlight
QbitAI (量子位) · 2025-12-01 04:26
Core Insights
- The article discusses the advancements in diffusion models, particularly focusing on the ShortDF and LeMiCa papers, which represent significant breakthroughs in the field of image and video generation [1][2][4].

Group 1: Technical Evolution
- ShortDF serves as a theoretical pioneer in optimizing diffusion models through online training, while LeMiCa expands this theory into offline mapping for higher-dimensional tasks [4].
- The core challenge in diffusion models is the expensive inference costs, which hinder real-time applications [8].
- The non-linear denoising trajectory of diffusion models is identified as a primary reason for slow progress in the field [9].

Group 2: ShortDF's Mechanisms
- ShortDF introduces a "shortest path optimization" approach to directly straighten the denoising trajectory during training, aiming to break the trade-off between speed and quality [12].
- The model's core insight is that the denoising process is fundamentally a correction of the initial error, which can be minimized to improve overall performance [13][14].
- ShortDF employs a three-pronged strategy:
  1. Locking the "error upper bound" to optimize from the source [14][15].
  2. Utilizing graph theory to relax and compress paths, thereby minimizing the error upper bound [20][21].
  3. Implementing multi-state optimization to ensure training stability amidst random noise [28][29].

Group 3: Performance Metrics
- ShortDF demonstrates superior performance in speed and quality, achieving a 5.0 times speed increase over DDIM while improving image quality (FID score of 9.08 compared to DDIM's 11.14) [36].
- The model shows robustness in complex scenarios, effectively restoring object contours faster than competing methods [37].
- In various datasets, ShortDF maintains a balance between performance and speed, showcasing its potential for real-world applications [40].
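As a rough reading of the figures quoted under Performance Metrics (lower FID is better), the relative FID reduction over DDIM can be worked out from the two reported scores. This is only arithmetic on the numbers in this summary, not a figure taken from the paper:

```latex
\frac{\mathrm{FID}_{\mathrm{DDIM}} - \mathrm{FID}_{\mathrm{ShortDF}}}{\mathrm{FID}_{\mathrm{DDIM}}}
  = \frac{11.14 - 9.08}{11.14}
  = \frac{2.06}{11.14}
  \approx 0.185
```

That is, roughly an 18.5% lower FID alongside the reported 5.0 times speed increase.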
Group 4: Industry Implications
- The advancements in ShortDF and LeMiCa highlight the importance of refined mathematical modeling over mere computational power in enhancing diffusion model speeds [41].
- These developments are crucial for the application of AIGC technology in resource-constrained environments, such as mobile devices and real-time interactive designs [42].
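The "relax and compress paths" step listed under ShortDF's mechanisms borrows its vocabulary from classical shortest-path relaxation. The sketch below shows only that classical idea on a toy graph of denoising states; it is not ShortDF's training objective. The function name `relax_error_bounds`, the state indexing (0 is the noisiest state, `n - 1` the clean one), and all error values are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[int, int]


def relax_error_bounds(n: int, direct: Dict[Pair, float]) -> Dict[Pair, float]:
    """Tighten pairwise error bounds between ordered denoising states.

    `direct[(i, k)]` is a hypothetical upper bound on the error of jumping
    straight from state i to a later state k. Missing pairs start at infinity.
    """
    bound = {
        (i, k): direct.get((i, k), math.inf)
        for i in range(n)
        for k in range(i + 1, n)
    }
    # Floyd-Warshall style relaxation restricted to forward jumps i < j < k:
    # if going through an intermediate state j gives a smaller bound, keep it.
    for j in range(n):
        for i in range(j):
            for k in range(j + 1, n):
                via_j = bound[(i, j)] + bound[(j, k)]
                if via_j < bound[(i, k)]:
                    bound[(i, k)] = via_j
    return bound


if __name__ == "__main__":
    # Toy values: three cheap single steps and one expensive direct jump.
    toy = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0, (0, 3): 5.0}
    print(relax_error_bounds(4, toy)[(0, 3)])
```

In this toy setup the direct bound of 5.0 for the jump from state 0 to state 3 is replaced by the sum of the three single-step bounds, which is the sense in which relaxation lowers an error upper bound.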
NeurIPS 2025 Spotlight | China Unicom reshapes diffusion model acceleration with global optimization
Synced (机器之心) · 2025-11-26 01:36
Core Insights
- The article discusses the rapid advances in video generation models, particularly the Transformer-based DiT model, whose output is approaching the look of real filmed footage. It also highlights a significant bottleneck: long inference times, high computational costs, and difficulty in increasing generation speed [2][29].
- A new approach called LeMiCa (Lexicographic Minimax Path Caching) is introduced. It is a training-free cache acceleration framework that achieves globally optimal modeling while maintaining image quality and consistency [2][29].

LeMiCa Framework
- LeMiCa addresses the long-standing question of whether a truly "globally consistent, error-controllable, and fast" caching acceleration path exists for diffusion models, concluding that such a path does exist and is simpler than previously thought [2][7].
- The core idea of LeMiCa is that caching acceleration is not a local decision problem but a global path optimization problem [7].

Technical Implementation
- The generation process of diffusion models can be abstracted as a weighted directed acyclic graph (DAG), where each node represents a time step and each edge represents skipping computations and reusing caches between two time steps [8].
- LeMiCa constructs a static DAG offline and introduces a novel error measurement method to quantify the impact of caching on the final video results [11][12].

Optimization Strategy
- The optimization problem is formalized as finding the optimal path from the start to the end within a fixed budget, using a lexicographic minimax path approach to ensure that the maximum error is minimized and the error distribution is more balanced [12][13].
- Experimental results show that LeMiCa achieves significant improvements in both speed and visual quality compared to other mainstream methods [14][19].
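The DAG formulation and the lexicographic minimax criterion described above can be illustrated with a small dynamic program. This is a minimal sketch under stated assumptions, not LeMiCa's released implementation: the function name `lexicographic_minimax_path`, the reading of "budget" as the exact number of edges (full computations) on the path, and every error value are hypothetical. A path is scored by its edge errors sorted in descending order, and score tuples are compared lexicographically, so the largest error is minimized first, then the second largest, and so on.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]
Score = Tuple[float, ...]


def lexicographic_minimax_path(
    num_steps: int,
    edge_error: Dict[Edge, float],
    budget: int,
) -> Optional[Tuple[List[int], Score]]:
    """Best path from node 0 to node `num_steps` using exactly `budget` edges.

    `edge_error[(i, j)]` is a hypothetical, offline-measured error of computing
    at step i and reusing its cache until step j (with i < j).
    Returns (path, errors sorted descending), or None if no such path exists.
    """
    # best[k][j] = (score, path) of the best k-edge path from node 0 to node j.
    best: List[Dict[int, Tuple[Score, List[int]]]] = [
        {} for _ in range(budget + 1)
    ]
    best[0][0] = ((), [0])
    for k in range(1, budget + 1):
        for (i, j), err in edge_error.items():
            if i not in best[k - 1]:
                continue
            prev_score, prev_path = best[k - 1][i]
            # Keep errors sorted descending so tuple comparison is lexicographic
            # on (largest error, second largest error, ...).
            cand = tuple(sorted(prev_score + (err,), reverse=True))
            if j not in best[k] or cand < best[k][j][0]:
                best[k][j] = (cand, prev_path + [j])
    if num_steps not in best[budget]:
        return None
    score, path = best[budget][num_steps]
    return path, score


if __name__ == "__main__":
    # Toy DAG over 5 nodes (0..4); all errors are made up for illustration.
    errors = {
        (0, 1): 0.0, (1, 2): 0.0, (2, 3): 0.0, (3, 4): 0.0,
        (0, 2): 0.5, (1, 3): 0.2, (2, 4): 0.3,
        (0, 3): 0.9, (1, 4): 0.5,
    }
    print(lexicographic_minimax_path(4, errors, budget=2))
```

In the toy graph, the two-edge paths 0, 1, 4 and 0, 2, 4 share the same worst error (0.5), so a plain minimax criterion cannot separate them; the lexicographic comparison then prefers the path whose second-largest error is smaller. Extending a partial path with the same edge preserves this ordering, which is why the per-node dynamic program is sufficient here.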
Performance Metrics
- LeMiCa demonstrates a speedup of over 2.4× in inference performance while significantly enhancing visual consistency and quality across various video generation models [19][20].
- The framework has been validated across multiple mainstream video generation models, showing superior performance in maintaining visual consistency before and after acceleration [14][19].

Robustness and Compatibility
- LeMiCa's acceleration paths are robust, remaining effective even when the sampling schedule is altered [20].
- The framework is also compatible with text-to-image models, as demonstrated with the Qwen-Image model, where it achieves similar acceleration effects [21].

Industry Recognition
- LeMiCa has received endorsements from top-tier multi-modal model development teams, including Alibaba's Tongyi Qianwen and Zhipu AI, highlighting its significance in the industry [24][25].

Conclusion
- LeMiCa redefines the acceleration problem in diffusion video generation from a global optimization perspective, breaking through the limitations of traditional local greedy caching strategies and providing a new paradigm for video generation that is both fast and stable [29].