Diffusion Model Acceleration
NeurIPS 2025 Spotlight | China Unicom Reshapes Diffusion Model Acceleration with Global Optimization
机器之心· 2025-11-26 01:36
Core Insights
- The article discusses the rapid advances in video generation models, particularly Transformer-based DiT models whose output is approaching the look of real-world footage. It highlights a significant bottleneck, however: long inference times and high computational costs make it difficult to raise generation speed [2][29].
- A new approach called LeMiCa (Lexicographic Minimax Path Caching) is introduced: a training-free cache acceleration framework that achieves globally optimal modeling while maintaining image quality and consistency [2][29].

LeMiCa Framework
- LeMiCa addresses the long-standing question of whether a truly "globally consistent, error-controllable, and fast" caching acceleration path exists for diffusion models, concluding that such a path does exist and is simpler than previously thought [2][7].
- The core idea of LeMiCa is that caching acceleration is not a local decision problem but a global path optimization problem [7].

Technical Implementation
- The generation process of a diffusion model can be abstracted as a weighted directed acyclic graph (DAG), where each node represents a time step and each edge represents skipping computation and reusing the cache [8].
- LeMiCa introduces a novel error measurement that quantifies the impact of caching on the final video by constructing a static DAG offline [11][12].

Optimization Strategy
- The optimization problem is formalized as finding the optimal path from start to end within a fixed budget, using a lexicographic minimax path approach so that the maximum error is minimized and the error distribution stays balanced (a toy sketch of this path search follows this summary) [12][13].
- Experimental results show that LeMiCa achieves significant improvements in both speed and visual quality compared with other mainstream methods [14][19].

Performance Metrics
- LeMiCa delivers an inference speedup of over 2.4× while significantly improving visual consistency and quality across various video generation models [19][20].
- The framework has been validated on multiple mainstream video generation models, showing superior ability to preserve visual consistency before and after acceleration [14][19].

Robustness and Compatibility
- LeMiCa's acceleration paths are robust, remaining effective even when the sampling schedule is altered [20].
- The framework is also compatible with text-to-image models, as demonstrated on Qwen-Image, where it achieves similar acceleration [21].

Industry Recognition
- LeMiCa has received endorsements from top-tier multimodal model development teams, including Alibaba's Tongyi Qianwen (Qwen) and Zhipu AI, highlighting its significance for the industry [24][25].

Conclusion
- LeMiCa reframes the acceleration problem in diffusion-based video generation from a global optimization perspective, breaking through the limitations of traditional local greedy caching strategies and providing a new paradigm for video generation that is both fast and stable [29].
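The graph formulation and budgeted path search summarized above lend themselves to a compact illustration. The sketch below is not the authors' implementation; it uses synthetic edge-error weights and illustrative names (`edge_error`, `budget`, `lexicographic_minimax_path`) to show how a step-skipping DAG can be searched for the path whose sorted-descending error vector is lexicographically smallest, i.e. worst edge first, under a cap on the number of full network evaluations.

```python
# Toy sketch of a budgeted lexicographic minimax path search on a
# step-skipping DAG.  Nodes are denoising timesteps; edge (i, j) means
# "run the network at step i, then reuse the cached features through
# steps i+1..j-1".  Edge weights stand in for offline error estimates
# and are synthetic here.
import random

T = 12          # number of denoising steps (toy value)
budget = 5      # maximum number of full network evaluations allowed

random.seed(0)
# Longer skips are assumed to hurt more; the direct edge (i, i+1) costs 0.
edge_error = {
    (i, j): (j - i - 1) * random.uniform(0.5, 1.5)
    for i in range(T) for j in range(i + 1, T + 1)
}

def lexicographic_minimax_path(T, budget, edge_error):
    """DP over (node, evaluations used).  Each state keeps the
    lexicographically smallest sorted-descending tuple of edge errors
    reachable from step 0, plus the path that achieves it."""
    best = {(0, 0): ((), [0])}               # state -> (error tuple, path)
    for i in range(T):                       # nodes in topological order
        for used in range(budget):
            if (i, used) not in best:
                continue
            errs, path = best[(i, used)]
            for j in range(i + 1, T + 1):    # try every forward skip
                new_errs = tuple(sorted(errs + (edge_error[(i, j)],),
                                        reverse=True))
                key = (j, used + 1)
                if key not in best or new_errs < best[key][0]:
                    best[key] = (new_errs, path + [j])
    # Any number of evaluations up to the budget is allowed at the sink.
    candidates = [best[(T, u)] for u in range(1, budget + 1) if (T, u) in best]
    if not candidates:
        return None, None
    errs, path = min(candidates, key=lambda v: v[0])
    return errs, path

errs, path = lexicographic_minimax_path(T, budget, edge_error)
print("selected path (timesteps):", path)
print("sorted per-edge errors   :", [round(e, 3) for e in errs])
```

Python's built-in tuple comparison conveniently implements the lexicographic order on the sorted error vectors, which is why the dynamic program only needs to keep one error tuple per (node, evaluations-used) state.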
ACM MM 2025 | Xiaohongshu AIGC Team Proposes STD, a Style Transfer Acceleration Algorithm
机器之心· 2025-08-04 07:05
Core Viewpoint
- The article presents Single Trajectory Distillation (STD), a novel approach aimed at improving the efficiency and quality of image and video style transfer in the AIGC domain, addressing the style-consistency and aesthetic-quality issues of existing models [2][3][49].

Group 1: Introduction to STD
- The authors, from Dynamic-X-Lab, focus on advancing image generation and video animation with high-quality generative models [2].
- Existing consistency models struggle to maintain style similarity and aesthetic quality, particularly in image-to-image and video-to-video transformations [2][3].

Group 2: Mechanism of STD
- STD introduces a training framework that starts distillation from a partially noisy state, addressing the inefficiencies of traditional methods [3].
- A Trajectory Bank is designed to store intermediate states from the teacher model's PF-ODE trajectory, reducing the computational burden of student-model training (a toy sketch of this cache follows this summary) [3][11].
- An Asymmetric Adversarial Loss is incorporated to significantly enhance the style consistency and perceptual quality of the generated results [4][11].

Group 3: Experimental Results
- Extensive experiments demonstrate that STD outperforms existing accelerated diffusion models in style similarity and aesthetic evaluation [5][33].
- In comparative tests, STD achieved a CSD score of 0.503 and an aesthetic score of 4.815, surpassing the other methods [30][33].

Group 4: Ablation Studies
- Ablation studies confirm that the Trajectory Bank offsets the additional training time introduced by STD, while both STD and the Asymmetric Adversarial Loss significantly improve style similarity and aesthetic scores [36][37].
- The results indicate that the strength of the Asymmetric Adversarial Loss correlates positively with image quality, reducing noise and enhancing contrast [38].

Group 5: Scalability and Future Applications
- The STD method is expected to extend to other tasks that edit images and videos from partially noised inputs, such as inpainting, where it shows superior results compared with traditional methods [47][49].
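To make the Trajectory Bank idea above concrete, here is a minimal sketch under stated assumptions; it is not the team's code, and `teacher_step`, the `(sample_id, t)` key layout, and the LRU capacity are illustrative placeholders. The point is simply that intermediate teacher latents along the PF-ODE trajectory are cached once and reused across student updates that start from partially noised states.

```python
# Toy sketch of a Trajectory Bank: an LRU-style cache of teacher latents
# keyed by (sample_id, timestep), so distillation from a partially noisy
# state does not re-run the teacher solver on every student update.
from collections import OrderedDict
import torch

class TrajectoryBank:
    """LRU-style store of teacher latents keyed by (sample_id, timestep)."""

    def __init__(self, capacity: int = 4096):
        self.capacity = capacity
        self._store: "OrderedDict[tuple, torch.Tensor]" = OrderedDict()

    def get(self, sample_id: int, t: int):
        key = (sample_id, t)
        if key in self._store:
            self._store.move_to_end(key)       # mark as recently used
            return self._store[key]
        return None

    def put(self, sample_id: int, t: int, latent: torch.Tensor):
        self._store[(sample_id, t)] = latent.detach().cpu()
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)    # evict least recently used


def teacher_latent(bank, teacher_step, sample_id, z_start, t_start, t_target):
    """Return the teacher latent at t_target, reusing any cached states along
    the trajectory and storing every newly computed state back in the bank."""
    z, t = z_start, t_start
    for t_next in range(t_start - 1, t_target - 1, -1):
        cached = bank.get(sample_id, t_next)
        if cached is not None:
            z = cached.to(z_start.device)      # reuse cached teacher state
        else:
            z = teacher_step(z, t, t_next)     # one PF-ODE solver step
            bank.put(sample_id, t_next, z)
        t = t_next
    return z


# Usage (shapes and the toy teacher are placeholders):
bank = TrajectoryBank(capacity=1024)
toy_teacher = lambda z, t, t_next: z * 0.95   # stands in for a real PF-ODE step
z = teacher_latent(bank, toy_teacher, sample_id=0,
                   z_start=torch.randn(1, 4, 64, 64), t_start=40, t_target=20)
print(z.shape)
```

On a second call with the same `sample_id` and an overlapping timestep range, every step is served from the cache, which is the effect the summary attributes to the Trajectory Bank.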