Westlake University Proposes RDPO Reinforcement Learning Framework to Accelerate Parallel Inference in Diffusion Models
量子位 (QbitAI) · 2026-01-13 07:21

Core Viewpoint
- The article discusses the transition from diffusion models generating high-resolution images to real-time video generation with world models, highlighting the latency limitations of the sequential denoising process inherent in diffusion models [1][2].

Group 1: Acceleration Techniques
- The RDPO (Residual Dirichlet Policy Optimization) framework, proposed by Westlake University's AGI Lab, optimizes the sampling "navigation system" without altering the model itself, aiming to increase speed while maintaining quality [3][10].
- The Ensemble Parallel Direction Solver (EPD-Solver) reduces sampling latency by integrating multiple parallel gradient evaluations per step, addressing the high latency of diffusion model inference [5][6].
- EPD-Solver uses a two-stage optimization framework: it first optimizes a small set of learnable parameters, then applies RDPO for further refinement, which effectively mitigates reward hacking [6][12].

Group 2: Performance Improvements
- The RDPO-optimized EPD-Solver significantly improves the generation quality of Stable Diffusion v1.5 and SD3-Medium, achieving better results with fewer sampling steps [7][20].
- The method outperforms baselines across benchmarks including CIFAR-10, FFHQ, and ImageNet, demonstrating its potential for low-latency, high-quality generation tasks [6][20].

Group 3: Methodology Insights
- The RDPO framework optimizes the sampling path rather than the model's full parameter set, allowing efficient adjustment in a low-dimensional space [11][13].
- The first phase uses trajectory distillation to learn high-precision sampling paths, ensuring the generated outputs remain logically coherent [13].
- The second phase applies residual policy optimization, letting reinforcement learning fine-tune the sampling path without overhauling the model [14][15].
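The parallel-direction idea above — replacing one gradient evaluation per denoising step with several weighted, parallelizable ones governed by a small learnable parameter set — can be illustrated with a minimal numpy sketch on a toy ODE. All names here (`epd_step`, `sample`, the `offsets`/`weights` parameterization) are illustrative assumptions, not the paper's actual API or solver.

```python
import numpy as np

def epd_step(x, t, dt, score_fn, offsets, weights):
    """One toy EPD-style step: evaluate the score at several parallel
    time offsets and combine the results with learnable weights.
    The offsets/weights are the small low-dimensional parameter set
    that RDPO-style training would tune; the names are illustrative."""
    grads = [score_fn(x, t + o * dt) for o in offsets]  # parallelizable
    combined = sum(w * g for w, g in zip(weights, grads))
    return x + dt * combined

def sample(x0, score_fn, n_steps=5, offsets=(0.0, 0.5), weights=(0.5, 0.5)):
    """Run a low-step-count sampling loop with the ensemble step."""
    x, t = x0, 0.0
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        x = epd_step(x, t, dt, score_fn, offsets, weights)
        t += dt
    return x

# Toy linear "score" that drives samples toward the origin; each Euler
# step multiplies x by (1 - dt) = 0.8, so 5 steps give 0.8**5 per entry.
out = sample(np.ones(4), lambda x, t: -x)
```

Because the per-step evaluations are independent, they can run concurrently on a GPU, which is how the extra function calls avoid adding wall-clock latency.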
Group 4: Experimental Validation
- Quantitative tests indicate that RDPO successfully enhances the EPD-Solver's performance on text-to-image tasks, with improved results across various evaluation metrics [22][23].
- The article emphasizes that high-quality generation does not necessarily require extensive computational resources: clever optimization strategies can yield significant gains at minimal cost [23].
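The second-phase "residual" idea — freezing the distilled solver parameters and letting reinforcement learning tune only a small additive correction, guided by a reward — can be sketched with a simple score-function (REINFORCE-like) estimator. This is a hedged toy sketch, not the paper's RDPO algorithm: the function name, the Gaussian perturbation scheme, and the quadratic reward are all assumptions for illustration.

```python
import numpy as np

def residual_policy_opt(theta0, reward_fn, sigma=0.05, lr=0.1,
                        n_samples=16, n_iters=50, seed=0):
    """Toy residual-style optimization: keep the distilled parameters
    `theta0` fixed and learn only a small residual `delta`, updated via
    a score-function gradient estimate over Gaussian perturbations."""
    rng = np.random.default_rng(seed)
    delta = np.zeros_like(theta0)
    for _ in range(n_iters):
        noise = rng.standard_normal((n_samples, theta0.size))
        rewards = np.array([reward_fn(theta0 + delta + sigma * n)
                            for n in noise])
        adv = rewards - rewards.mean()          # baseline cuts variance
        grad = (adv[:, None] * noise).mean(axis=0) / sigma
        delta += lr * grad                      # ascend expected reward
    return theta0 + delta

# Toy reward peaked at [0.6, 0.4]; starting from the "distilled"
# init [0.5, 0.5], the learned residual should close most of the gap.
target = np.array([0.6, 0.4])
theta = residual_policy_opt(np.array([0.5, 0.5]),
                            lambda th: -np.sum((th - target) ** 2))
```

Because only the low-dimensional residual is trained while the distilled path stays fixed, large destructive updates are structurally discouraged, which is one plausible reading of why the approach resists reward hacking.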
