Parallel Computing

ICCV 2025 | EPD-Solver: Westlake University Releases a Parallel-Accelerated Diffusion Sampling Algorithm
机器之心· 2025-08-02 04:43
Core Viewpoint
- The article discusses advances in diffusion models, particularly the introduction of the Ensemble Parallel Direction Solver (EPD-Solver), which improves the efficiency and quality of image generation while addressing the latency issues of traditional methods [2][3][27].

Group 1: Diffusion Models Overview
- Diffusion models have rapidly become mainstream technologies for generating images, videos, audio, and 3D content due to their high-quality output [2].
- Their core mechanism is a "denoising" process that iteratively refines a random image into a clear one; this ensures quality but leads to significant inference delays [2].

Group 2: Acceleration Strategies
- Researchers have proposed three main acceleration strategies: ODE solvers to reduce the number of iteration steps, model distillation to compress multi-step processes, and parallel computing to speed up inference [3].
- Each method has limitations: quality loss with fewer iterations, the high cost of retraining models, and underutilization of parallelism in low-step scenarios [3].

Group 3: EPD-Solver Innovation
- The EPD-Solver combines the advantages of these strategies, using a numerical solver framework, lightweight distillation of a small set of learnable parameters, and parallel computation of gradients [3][4].
- The method effectively reduces numerical integration error without significant modifications to the model or additional latency, achieving high-quality image generation in only 3-5 sampling steps [3][4].

Group 4: Performance and Results
- EPD-Solver can be integrated as a "plugin" into existing solvers, significantly enhancing their generation quality and efficiency [4].
- Experimental results show that EPD-Solver outperforms baseline solvers on benchmarks such as CIFAR-10, FFHQ, and ImageNet, demonstrating its potential for low-latency, high-quality generation tasks [21][25].
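The iterative denoising loop described above can be sketched as a first-order (Euler-style) probability-flow ODE sampler. This is a toy illustration, not the paper's code: `denoiser` is a hypothetical stand-in for a trained network, reduced here to a closed form so the sketch runs.

```python
import numpy as np

def denoiser(x, t):
    """Hypothetical stand-in for a trained denoiser D(x, t) that predicts the
    clean image. Toy version returns zeros; a real D is a neural network."""
    return np.zeros_like(x)

def euler_ode_sampler(x_T, timesteps):
    """First-order (Euler) sampler for the ODE dx/dt = (x - D(x, t)) / t.
    Fewer steps mean faster sampling but larger integration error, which is
    the latency/quality trade-off the article describes."""
    x = x_T
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        d = (x - denoiser(x, t_cur)) / t_cur   # ODE gradient at the current noise level
        x = x + (t_next - t_cur) * d           # Euler step toward the next noise level
    return x

rng = np.random.default_rng(0)
x_T = rng.standard_normal(4) * 80.0            # start from pure noise at t = 80
ts = np.linspace(80.0, 0.0, 6)                 # 5 sampling steps down to t = 0
x_0 = euler_ode_sampler(x_T, ts)               # the toy denoiser drives x to zero
```

With the zero-returning toy denoiser each step scales `x` by `t_next / t_cur`, so the final step at `t_next = 0` yields exactly zero; with a real network, each step instead nudges the sample toward the data distribution.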
Group 5: Key Advantages
- The method improves parallel efficiency and precision by introducing multiple gradient evaluations, which significantly enhance ODE integration accuracy while adding no extra inference latency [28].
- EPD-Solver is lightweight and can be easily integrated into existing ODE samplers, avoiding the costly retraining of diffusion models [28].
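A minimal sketch of the parallel-direction idea, assuming a single solver step that evaluates the ODE gradient at several intermediate times concurrently and mixes the directions with learned weights. This is my own illustration, not the authors' released code: `denoiser`, the intermediate times `taus`, and the `weights` are hand-picked placeholders where the paper would use distilled, learnable parameters.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def denoiser(x, t):
    """Hypothetical stand-in for a trained denoiser D(x, t); toy closed form."""
    return 0.5 * x

def ode_gradient(x, t):
    """Probability-flow ODE gradient dx/dt = (x - D(x, t)) / t."""
    return (x - denoiser(x, t)) / t

def epd_style_step(x, t_cur, t_next, taus, weights):
    """One EPD-style step (illustrative): probe the gradient at several
    intermediate times in parallel, then take a single update along the
    weighted ensemble of directions."""
    d0 = ode_gradient(x, t_cur)
    # Midpoint-style probe states, one per parallel direction.
    probes = [(x + (tau - t_cur) * d0, tau) for tau in taus]
    with ThreadPoolExecutor() as pool:                  # gradients evaluated in parallel
        dirs = list(pool.map(lambda p: ode_gradient(*p), probes))
    d_mix = sum(w * d for w, d in zip(weights, dirs))   # weighted ensemble direction
    return x + (t_next - t_cur) * d_mix

x = np.ones(4) * 10.0
x_next = epd_style_step(x, t_cur=10.0, t_next=5.0,
                        taus=[8.0, 6.0], weights=[0.5, 0.5])
```

Because every probe depends only on the shared state `x` and `d0`, the extra gradient evaluations can run concurrently on real hardware, which is why the ensemble improves accuracy without adding wall-clock latency per step.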
Just In! A Hardcore Release from DeepSeek!
券商中国· 2025-02-27 03:35
Core Viewpoint
- DeepSeek has made significant advances in optimizing parallel computing strategies and has introduced new models that enhance performance and reduce costs in AI applications [2][3][5][7].

Group 1: Optimized Parallelism Strategies
- DeepSeek announced the release of Optimized Parallelism Strategies aimed at improving computational efficiency, reducing resource waste, and maximizing system performance through effective task allocation and resource coordination [3][5].
- The strategies are designed for high-performance parallel execution on multi-core, distributed, or heterogeneous systems, balancing computation, communication, and storage overhead [5].

Group 2: New Model Releases
- NVIDIA has open-sourced the first optimized DeepSeek-R1 model on the Blackwell architecture, achieving a 25-fold increase in inference speed and a 20-fold reduction in cost per token [3][7].
- Local deployment of DeepSeek-R1 has drawn significant attention: its inference throughput reaches 21,088 tokens per second, versus 844 tokens per second on the H100, a substantial performance improvement [7].

Group 3: Cost Reduction Initiatives
- DeepSeek announced a significant reduction in API call prices during nighttime hours, with DeepSeek-V3 prices cut to 50% of the standard rate and DeepSeek-R1 to as low as 25%, a reduction of up to 75% [6].

Group 4: Additional Open-Source Contributions
- DeepSeek has continued its open-source initiatives by releasing FlashMLA, DeepEP, and DeepGEMM, which are optimized for NVIDIA GPUs and designed to support various AI model training and inference tasks [9].
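The balancing of computation and communication that such parallelism strategies aim for can be sketched with a simple double-buffering schedule: while one micro-batch's result is in flight, the next one is already being computed. This is a toy illustration of the general overlap pattern, not DeepSeek's implementation; `compute` and `communicate` are hypothetical stand-ins (with sleeps) for a GPU kernel and a device-to-device transfer.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def compute(chunk):
    """Stand-in for a GPU kernel working on one micro-batch."""
    time.sleep(0.01)
    return sum(chunk)

def communicate(result):
    """Stand-in for sending activations/gradients to another device."""
    time.sleep(0.01)
    return result

def serial(chunks):
    """Naive schedule: compute, then communicate, one chunk at a time."""
    return [communicate(compute(c)) for c in chunks]

def overlapped(chunks):
    """Overlapped schedule: chunk i's transfer runs in the background
    while chunk i+1 is being computed (double buffering)."""
    out = []
    with ThreadPoolExecutor(max_workers=1) as comm:
        pending = None
        for c in chunks:
            r = compute(c)                         # compute the current chunk
            if pending is not None:
                out.append(pending.result())       # drain the previous transfer
            pending = comm.submit(communicate, r)  # start this chunk's transfer
        out.append(pending.result())
    return out

chunks = [[1, 2], [3, 4], [5, 6]]
a = serial(chunks)      # ~6 time units: compute and transfer never overlap
b = overlapped(chunks)  # ~4 time units: transfers hide behind compute
```

The overlapped schedule produces identical results but hides most transfer time behind computation, which is the essence of balancing compute and communication overhead in distributed training.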