Diffusion Model Distillation Paradigm

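 Tsinghua and NVIDIA team up on a new distillation paradigm for diffusion models! Video generation sped up 50x, 4-step output with no clipping artifacts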
 量子位 (QbitAI) · 2025-10-22 09:12
Core Insights
- The article discusses a new distillation paradigm called rCM that significantly enhances video generation speed by up to 50 times while maintaining high quality and diversity in the generated content [4][20][33]

Group 1: Introduction of rCM
- rCM is a novel large-scale diffusion model distillation paradigm developed by Tsinghua University and NVIDIA, which successfully extends continuous time consistency distillation to billion-parameter models [5][9]
- The method addresses bottlenecks in existing approaches, particularly in real-world applications involving large-scale text-to-image and text-to-video models [3][9]

Group 2: Technical Innovations
- The rCM framework introduces a forward-reverse divergence joint optimization approach, which enhances inference speed while ensuring high-quality and diverse generation results [4][11]
- By utilizing self-developed FlashAttention-2 JVP CUDA operators and compatible distributed training strategies, rCM successfully applies continuous time consistency distillation to leading models like Cosmos and Wan2.1 [13][18]

Group 3: Performance Metrics
- rCM demonstrates exceptional performance across various large-scale text-to-image and text-to-video tasks, compressing the sampling process from hundreds of steps to an impressive 1-4 steps, achieving a speedup of 15-50 times [20][21]
- In evaluations, the rCM model matches or even surpasses the performance of teacher models that require hundreds of sampling steps [21][25]

Group 4: Quality and Diversity
- The rCM model effectively addresses the quality shortcomings of previous models by incorporating reverse divergence as a regularization term, allowing it to maintain high diversity while improving quality [19][22]
- Compared to previous state-of-the-art distillation methods, rCM exhibits significantly higher diversity in generated video content, effectively avoiding "mode collapse" issues [25][31]

Group 5: Future Applications
- rCM is expected to be widely applied in NVIDIA's Cosmos series of world models, indicating its potential for broader industry adoption [34]
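
The quality-versus-diversity trade-off described under Group 4 can be illustrated with a toy calculation. The sketch below is not rCM's objective: it assumes KL divergence as a stand-in for both the forward and reverse divergence (the summary does not say which divergences are used), and it works on made-up four-bin distributions. The names `teacher`, `collapsed`, `blurry`, `joint_loss`, and the `weight` value are hypothetical.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical teacher distribution with two equally likely modes (bins 0 and 3).
teacher = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical students: one collapsed onto a single mode,
# one spread evenly over all bins (diverse but low quality).
collapsed = [0.97, 0.01, 0.01, 0.01]
blurry = [0.25, 0.25, 0.25, 0.25]


def joint_loss(student, weight=0.1):
    """Forward term plus a weighted reverse term (weight is an assumed value)."""
    forward = kl(teacher, student)  # large when the student misses a teacher mode
    reverse = kl(student, teacher)  # large when the student puts mass where the teacher has little
    return forward + weight * reverse


if __name__ == "__main__":
    for name, student in [("collapsed", collapsed), ("blurry", blurry)]:
        print(
            f"{name}: forward={kl(teacher, student):.3f} "
            f"reverse={kl(student, teacher):.3f} "
            f"joint={joint_loss(student):.3f}"
        )
```

In this toy setup the forward term penalizes the collapsed student more heavily, while the reverse term penalizes the blurry student more heavily, which gives some intuition for why a joint objective might keep diversity while improving sample quality.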

