GARDO Framework
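Say No to Reward Hacking! HKUST and Kuaishou Kling Propose a New, Efficient Paradigm for Reinforcement Learning Post-Training of Diffusion Models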
Jiqizhixin (机器之心) · 2026-01-25 02:35
Core Insights
- The article discusses the challenges of using Reinforcement Learning (RL) to fine-tune diffusion models such as Stable Diffusion, in particular "Reward Hacking", where the model drives up a proxy reward while actual image quality degrades [2][5]
- It introduces GARDO (Gated and Adaptive Regularization with Diversity-aware Optimization), a framework designed to prevent Reward Hacking while improving sample exploration and the diversity of generated images [2][12]

Background and Motivation
- RL has shown promising results on visual tasks, but an ideal reward function is hard to define, so proxy rewards are commonly used instead, and optimizing them can lead to Reward Hacking [4][5]
- The article also highlights other pitfalls of RL post-training: low sample efficiency, and exploration that is hindered when the model is regularized toward a static reference model [9][10]

GARDO Framework
- GARDO addresses Reward Hacking through three core ideas:
  1. Gated KL Mechanism: KL regularization is applied only when the model generates samples in regions where the reward is unreliable [14][15]
  2. Adaptive Regularization Target: the reference model is updated during training so that optimization does not stagnate [17]
  3. Diversity-Aware Advantage Shaping: diversity among generated samples is encouraged to avoid mode collapse [18][19]

Experimental Results
- GARDO was evaluated on several base models (SD3.5-Medium, Flux.1-dev) and showed significant advantages over baseline methods such as Flow-GRPO [20][21]
- It prevents Reward Hacking while maintaining high image quality and sample efficiency, reaching better performance in fewer training steps [22][23]

Emergent Behavior
- On challenging tasks, GARDO was able to generate a higher number of objects, suggesting that it can unlock new capabilities in visual generation [24][25]

Conclusion
- The work argues that, when applying RL to visual generation, precise control matters more than strict constraints, which makes GARDO a valuable framework for researchers and developers who want to use RL with diffusion models [27]
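The three GARDO mechanisms (gated KL, adaptive reference, diversity-aware advantage shaping) can be illustrated with a minimal Python sketch. The article gives no formulas, so everything below is an assumption rather than the authors' implementation. Reward "unreliability" is approximated by the disagreement (standard deviation) among several hypothetical reward models. The reference model is refreshed on a fixed, assumed schedule. Diversity is measured as each sample's nearest-neighbour distance in some feature space. GRPO-style group normalization is used as the baseline advantage. All function and class names (`group_advantages`, `reward_uncertainty`, `gated_kl_penalty`, `diversity_shaped`, `AdaptiveReference`) are hypothetical.

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style baseline (assumed): normalize rewards within one prompt's group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def reward_uncertainty(ensemble_scores: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical proxy for an 'unreliable reward region': the std of several
    reward models' scores per sample. ensemble_scores[k][i] = model k, sample i."""
    n_models = len(ensemble_scores)
    result = []
    for i in range(len(ensemble_scores[0])):
        scores = [ensemble_scores[k][i] for k in range(n_models)]
        m = sum(scores) / n_models
        result.append(math.sqrt(sum((s - m) ** 2 for s in scores) / n_models))
    return result


def gated_kl_penalty(kl: Sequence[float], uncertainty: Sequence[float],
                     threshold: float, beta: float) -> List[float]:
    """Gated KL: penalize divergence from the reference only for samples whose
    reward uncertainty exceeds an assumed threshold; other samples are free."""
    return [beta * k if u > threshold else 0.0 for k, u in zip(kl, uncertainty)]


def diversity_shaped(advantages: Sequence[float],
                     features: Sequence[Sequence[float]]) -> List[float]:
    """Diversity-aware shaping (assumed form): scale positive advantages by each
    sample's nearest-neighbour distance, normalized by the group mean distance.
    Negative advantages are left unchanged. Requires a group of >= 2 samples."""
    nn_dist = [
        min(math.dist(fi, fj) for j, fj in enumerate(features) if j != i)
        for i, fi in enumerate(features)
    ]
    mean_d = sum(nn_dist) / len(nn_dist)
    return [a * (d / mean_d) if a > 0 and mean_d > 0 else a
            for a, d in zip(advantages, nn_dist)]


class AdaptiveReference:
    """Adaptive regularization target (assumed schedule): every `update_every`
    steps, snapshot the current policy as the new KL reference so the anchor
    does not keep pulling the model back to its starting point."""

    def __init__(self, policy_params: dict, update_every: int):
        # Shallow copy is enough for this toy dict; real tensors need a deep copy.
        self.ref_params = dict(policy_params)
        self.update_every = update_every
        self.steps = 0

    def step(self, policy_params: dict) -> bool:
        """Advance one training step; return True if the reference was refreshed."""
        self.steps += 1
        if self.steps % self.update_every == 0:
            self.ref_params = dict(policy_params)
            return True
        return False


if __name__ == "__main__":
    rewards = [0.9, 0.2, 0.5, 0.8]  # toy rewards for one prompt's group of 4 samples
    adv = group_advantages(rewards)
    # Two hypothetical reward models that disagree strongly only on sample 0.
    unc = reward_uncertainty([[0.9, 0.2, 0.5, 0.8], [0.3, 0.25, 0.45, 0.75]])
    penalty = gated_kl_penalty([0.1] * 4, unc, threshold=0.1, beta=0.05)
    # Toy 2-D "features": sample 3 lies far from the other three.
    shaped = diversity_shaped(adv, [[0, 0], [1, 0], [0, 1], [5, 5]])
    print(penalty)
    print(shaped)
```

In the toy run, only sample 0 exceeds the uncertainty threshold, so only it receives a KL penalty. Sample 3, which is far from the others in feature space, ends up with a larger shaped advantage than sample 0 even though its raw reward is lower. This illustrates the kind of pressure against mode collapse the article describes; the real method's uncertainty signal, update rule, and diversity metric may differ.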