Diffusion Models
ICCV 2025 | EPD-Solver: Westlake University Releases a Parallel-Accelerated Diffusion Sampling Algorithm
Synced (机器之心) · 2025-08-02 04:43
Core Viewpoint: The article discusses recent advances in diffusion models, in particular the Ensemble Parallel Direction Solver (EPD-Solver), which improves both the efficiency and the quality of image generation while addressing the sampling latency of conventional methods [2][3][27].
Group 1: Diffusion Models Overview
- Thanks to their high-quality output, diffusion models have quickly become a mainstream technology for generating images, videos, audio, and 3D content [2].
- Their core mechanism is an iterative "denoising" process that gradually refines random noise into a clear image. This guarantees quality but causes significant inference latency [2].
Group 2: Acceleration Strategies
- Researchers have pursued three main acceleration strategies: ODE solvers that reduce the number of iteration steps, model distillation that compresses the multi-step process, and parallel computing that speeds up inference [3].
- Each has limitations. Quality drops when iterations are reduced, distillation requires costly retraining, and parallelism is underused when only a few steps are taken [3].
Group 3: EPD-Solver Innovation
- EPD-Solver combines the strengths of these strategies. It stays within a numerical-solver framework, uses lightweight distillation to learn only a small set of parameters, and computes multiple gradients in parallel [3][4].
- This reduces numerical integration error without substantial changes to the model or added latency, and it yields high-quality images in only 3 to 5 sampling steps [3][4].
Group 4: Performance and Results
- EPD-Solver can be integrated as a "plugin" into existing solvers, substantially improving their generation quality and efficiency [4].
- In experiments, EPD-Solver outperforms baseline solvers on benchmarks including CIFAR-10, FFHQ, and ImageNet, which demonstrates its potential for low-latency, high-quality generation [21][25].
Group 5: Key Advantages
- Parallel efficiency and precision: multiple gradient evaluations per step significantly improve ODE integration accuracy. Because these evaluations run in parallel, they add no inference latency [28].
- EPD-Solver is lightweight and easy to integrate into existing ODE samplers, so the diffusion model does not need costly retraining [28].
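The idea of averaging several gradients evaluated in parallel within one ODE step can be sketched on a scalar toy ODE. This is a minimal illustration under our own assumptions, not the EPD-Solver implementation.

- Each intermediate point is reached by an Euler step from the current state.
- The "small set of learnable parameters" is modeled as K time ratios plus K weight logits.
- All function names (`parallel_direction_step`, `distill_ratio`, and so on) are hypothetical.
- The "distillation" is a simple grid search against a teacher value rather than gradient-based training.
- Time runs forward here for simplicity, whereas diffusion samplers integrate from high to low noise.

```python
import math
from typing import Callable, List, Sequence

# Toy velocity field dx/dt = f(x, t); x is a float to keep the sketch dependency-free.
VectorField = Callable[[float, float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Map unconstrained logits to positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parallel_direction_step(f: VectorField, x: float, t: float, t_next: float,
                            ratios: Sequence[float], logits: Sequence[float]) -> float:
    """One step that combines K gradients taken at intermediate times.

    ratios[k] in [0, 1] places the k-th intermediate time at t + ratios[k] * h,
    and logits become simplex weights. Together they stand in for the small set
    of learnable parameters (hypothetical parameterization).
    """
    h = t_next - t
    d0 = f(x, t)  # gradient at the current state
    # Reach each intermediate time with an Euler step (assumption), then evaluate f.
    # These K calls do not depend on each other, so they could be batched in parallel.
    directions = [f(x + r * h * d0, t + r * h) for r in ratios]
    weights = softmax(logits)
    return x + h * sum(w * d for w, d in zip(weights, directions))


def sample(f: VectorField, x0: float, times: Sequence[float],
           ratios: Sequence[float], logits: Sequence[float]) -> float:
    """Integrate from times[0] to times[-1] using the parallel-direction step."""
    x = x0
    for t, t_next in zip(times[:-1], times[1:]):
        x = parallel_direction_step(f, x, t, t_next, ratios, logits)
    return x


def euler(f: VectorField, x0: float, times: Sequence[float]) -> float:
    """Plain first-order baseline for comparison."""
    x = x0
    for t, t_next in zip(times[:-1], times[1:]):
        x = x + (t_next - t) * f(x, t)
    return x


def distill_ratio(f: VectorField, x0: float, times: Sequence[float],
                  teacher: float, grid: int = 101) -> float:
    """Toy 'lightweight distillation' of one ratio: grid-search the value whose
    few-step result best matches a teacher solution."""
    candidates = [i / (grid - 1) for i in range(grid)]
    return min(candidates,
               key=lambda r: abs(sample(f, x0, times, [r], [0.0]) - teacher))
```

As a usage example, take `f = lambda x, t: -x` with `times = [0, 0.25, 0.5, 0.75, 1.0]` and the exact value `math.exp(-1.0)` as the teacher.

- A single ratio of 0.5 reduces to the classical midpoint method, which already beats Euler.
- `distill_ratio` settles on 0.46, which lands slightly closer to the teacher than the midpoint.
- Each step costs one sequential evaluation followed by K mutually independent ones.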
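EasyCache: Training-Free Inference Acceleration for Video Diffusion Models, a Minimalist and Efficient Way to Speed Up Video Generation
Synced (机器之心) · 2025-07-12 04:50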
Core Viewpoint: The article introduces EasyCache, a new framework that accelerates video diffusion models without training or changes to the model architecture. It substantially improves inference efficiency while preserving video quality [7][27].
Group 1: Research Background and Motivation
- Diffusion models and diffusion Transformers have markedly improved the quality and coherence of AI-generated video, transforming digital content creation and multimedia entertainment [3].
- However, slow inference and high computational cost remain obstacles. HunyuanVideo, for example, takes 2 hours to generate a 5-second 720P video, which limits real-time and large-scale applications [4][5].
Group 2: Methodology and Innovations
- During inference, EasyCache dynamically detects the "stable period" of the model's outputs and reuses historical computation results to skip redundant inference steps [7][16].
- It measures a "transformation rate" over the diffusion process, which reflects how sensitive the current output is to its input. This reveals that, in the later stages, outputs can be approximated from earlier results [8][12][15].
- EasyCache is plug-and-play: it operates entirely at inference time, with no model retraining or architectural changes [16].
Group 3: Experimental Results and Visual Analysis
- Systematic experiments on mainstream video generation models such as OpenSora, Wan2.1, and HunyuanVideo show that EasyCache achieves a 2.2x speedup on HunyuanVideo, with a 36% increase in PSNR and a 14% increase in SSIM, while maintaining video quality [20][26].
- In image generation, EasyCache delivers a 4.6x speedup while also improving FID scores, indicating that it is effective across applications [21][22].
- Visual comparisons show that EasyCache preserves high visual fidelity, with generated videos closely matching the original model's outputs. Other methods show varying degrees of quality loss [24][25].
Group 4: Conclusion and Future Outlook
- EasyCache offers a minimalist, efficient paradigm for accelerating video diffusion inference and lays a solid foundation for practical applications of diffusion models [27].
- As models and acceleration techniques continue to evolve, the goal of "real-time video generation" is expected to come within closer reach [27].
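The runtime-caching idea can be sketched as a wrapper around an expensive model. The wrapper reuses a cached result while an estimate of how much the output has changed stays small. This is a minimal sketch under our own assumptions, not the EasyCache code.

- The "transformation rate" k is the ratio of output change to input change between the last two full evaluations.
- The accumulated relative output change is approximated as k * ||Δx|| / ||last output||.
- While that estimate stays below a threshold, the cached residual (output minus input) is added to the new input instead of calling the model.
- The class name `TransformationRateCache`, its `threshold` and `warmup` parameters, and the toy `euler_sampler` are all hypothetical.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
Model = Callable[[Vector, float], Vector]


def norm(v: Vector) -> float:
    return math.sqrt(sum(a * a for a in v))


def sub(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]


class TransformationRateCache:
    """Wraps an expensive model and skips calls while its output looks stable.

    Hypothetical sketch: `threshold` bounds the accumulated estimate of the
    relative output change since the last full evaluation; the first `warmup`
    steps are always computed in full.
    """

    def __init__(self, model: Model, threshold: float, warmup: int = 2):
        self.model, self.threshold, self.warmup = model, threshold, warmup
        self.calls = 0                          # real model evaluations
        self.step = 0
        self.rate: Optional[float] = None       # transformation rate k
        self.prev_in: Optional[Vector] = None   # input at the previous step
        self.full_in: Optional[Vector] = None   # input at the last full evaluation
        self.full_out: Optional[Vector] = None  # output at the last full evaluation
        self.residual: Optional[Vector] = None  # cached output - input
        self.accumulated = 0.0

    def _compute(self, x: Vector, t: float) -> Vector:
        out = self.model(x, t)
        self.calls += 1
        if self.full_in is not None:
            dx = norm(sub(x, self.full_in))
            if dx > 0:
                # k = ||output change|| / ||input change|| between full evaluations
                self.rate = norm(sub(out, self.full_out)) / dx
        self.full_in, self.full_out = x, out
        self.residual = sub(out, x)
        self.accumulated = 0.0
        return out

    def __call__(self, x: Vector, t: float) -> Vector:
        self.step += 1
        reuse = False
        if self.step > self.warmup and self.rate is not None:
            # Estimated relative output change caused by this step's input change.
            dx = norm(sub(x, self.prev_in))
            self.accumulated += self.rate * dx / max(norm(self.full_out), 1e-12)
            reuse = self.accumulated < self.threshold
        self.prev_in = x
        if reuse:
            return [a + r for a, r in zip(x, self.residual)]  # reuse cached residual
        return self._compute(x, t)


def euler_sampler(denoise: Model, x0: Vector, steps: int) -> Vector:
    """Toy sampler x <- x + h * v(x, t); real samplers follow a noise schedule."""
    x, h = list(x0), 1.0 / steps
    for i in range(steps):
        v = denoise(x, i * h)
        x = [a + h * b for a, b in zip(x, v)]
    return x
```

The threshold controls the trade-off between speed and fidelity.

- With `threshold=0.0` the wrapper calls the model at every step and reproduces the uncached result exactly.
- With a very large threshold it calls the model only during warm-up and reuses the cached residual afterwards.
- Intermediate values trade fidelity for fewer calls.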
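From Research to Deployment, from End-to-End to VLA! An Autonomous Driving Community of Nearly 4,000 People Where Everyone Comes Together for Support
Heart of Autonomous Driving (自动驾驶之心) · 2025-07-11 11:23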
Core Viewpoint: The article announces a comprehensive autonomous driving community. It aims to bring industry professionals together, help members respond quickly to challenges, and reach 10,000 members within three years [2].
Group 1: Community Development
- The community aims to integrate academic research, product development, and recruitment into a closed loop of education and technical discussion [2][5].
- It has already attracted notable industry figures, including talent from Huawei and leading autonomous driving researchers [2].
- It provides resources such as video courses, hardware, and hands-on coding practice related to autonomous driving [2][3].
Group 2: Learning Resources
- A structured learning roadmap covers essential topics for newcomers, including how to ask questions and how to join the weekly Q&A sessions [3][4].
- Courses range from foundational topics such as deep learning and computer vision to advanced autonomous driving algorithms [4][21].
- Members get exclusive content, including more than 5,000 resources, as well as discounts on paid courses [19][21].
Group 3: Industry Engagement
- The community partners with numerous autonomous driving companies, offering direct recruitment channels and job postings [5][6].
- It aims to connect students and professionals with industry leaders, expanding opportunities for networking and knowledge sharing [5][6].
- It positions itself as a hub for both academic and industrial advances in autonomous driving technology [12][14].
Group 4: Technological Focus
- The article highlights how quickly autonomous driving technology is evolving, with a focus on end-to-end systems and the integration of large models [7][24].
- Key areas of interest include vision-language models, world models, and closed-loop simulation, which are seen as critical to the future of autonomous driving [7][24].
- The community plans to host live sessions with experts from top conferences to discuss practical applications and research advances [23][24].
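Z Tech | In Conversation with the Team Behind CV Luminary Kaiming He's New Paper, Three MIT Undergraduates Born After 2005: Does Diffusion Really Need Noise Conditioning?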
Z Potentials · 2025-02-27 04:09
Core Viewpoint: Recent research by renowned computer vision scholar Kaiming He and three MIT undergraduates challenges the conventional view of noise conditioning in denoising models. It suggests that noise conditioning may not be essential to model performance [1][3].
Group 1: Research Findings
- Removing noise conditioning from many mainstream denoising models causes only a modest drop in performance [4].
- The newly designed noise-unconditional model, uEDM, reaches a near-state-of-the-art FID of 2.23 on the CIFAR-10 benchmark (lower is better). This is only slightly behind the strong noise-conditioned model EDM, which scores 1.97 [2][6].
- The research combines a theoretical framework with experiments showing that mainstream denoising models remain stable when noise conditioning is removed. This suggests that conventional noise conditioning may not be necessary in practice [3][5].
Group 2: Implications and Future Directions
- The findings open avenues for reducing computational complexity and inspire new model designs that do not rely on noise conditioning [3].
- An upcoming live lecture will discuss generative models and possible development directions, followed by a Q&A session with the authors [2].
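The sketch below gives a toy intuition for why a denoiser might cope without an explicit noise-level input: in high dimensions, the noise level can be estimated from the noisy input itself. This is our own illustration, not the paper's analysis or the uEDM model.

- The data is assumed to be x ~ N(0, I) with known unit variance, so z = x + σε satisfies E[||z||²/d] = 1 + σ².
- The optimal noise-conditional denoiser for this data is z / (1 + σ²).
- The "unconditional" variant simply plugs in the estimated σ.
- All function names are hypothetical.

```python
import math
import random
from typing import List


def noisy_sample(d: int, sigma: float, rng: random.Random) -> List[float]:
    """z = x + sigma * eps, with toy data x ~ N(0, I) and noise eps ~ N(0, I)."""
    return [rng.gauss(0.0, 1.0) + sigma * rng.gauss(0.0, 1.0) for _ in range(d)]


def estimate_sigma(z: List[float]) -> float:
    """Infer the noise level from z alone, using E[||z||^2 / d] = 1 + sigma^2."""
    mean_sq = sum(v * v for v in z) / len(z)
    return math.sqrt(max(mean_sq - 1.0, 0.0))


def conditional_denoiser(z: List[float], sigma: float) -> List[float]:
    """Optimal noise-conditional denoiser E[x | z] for this toy Gaussian data."""
    scale = 1.0 / (1.0 + sigma * sigma)
    return [scale * v for v in z]


def unconditional_denoiser(z: List[float]) -> List[float]:
    """No noise-level input: plug in the sigma estimated from z itself."""
    return conditional_denoiser(z, estimate_sigma(z))


def mean_abs_sigma_error(d: int, sigma: float, trials: int, seed: int = 0) -> float:
    """Average error of the noise-level estimate over several random draws."""
    rng = random.Random(seed)
    errors = [abs(estimate_sigma(noisy_sample(d, sigma, rng)) - sigma)
              for _ in range(trials)]
    return sum(errors) / trials
```

Calling `mean_abs_sigma_error(d, 0.5, trials=5)` for increasing `d` shows the estimation error shrinking roughly like 1/sqrt(d). At that point the unconditional and conditional denoisers nearly coincide. This is consistent in spirit with the summary's claim, though real image data is far from this Gaussian toy.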