BAAI Open-Sources EditScore: Unlocking the Full Potential of Online Reinforcement Learning for Image Editing
机器之心·2025-10-22 03:30

Core Insights

- Instruction-guided image editing has advanced rapidly, and the article centers on the EditScore model series introduced by the VectorSpace Lab team at the Beijing Academy of Artificial Intelligence (BAAI) [2][3]
- EditScore aims to provide precise, reliable reward signals for instruction-guided image editing, addressing the difficulty existing models have in following complex text instructions [2][5]

Development of EditScore

- EditScore continues the OmniGen line of work toward more general and more controllable generative AI [3]
- The team also released EditReward-Bench, the first public benchmark built specifically for evaluating image-editing reward models; it covers 13 sub-tasks and 11 state-of-the-art editing models [6]
- The EditScore series ships in three sizes (7B, 32B, and 72B), all designed to provide high-fidelity feedback signals for instruction-based image editing [7]

Performance Metrics

- On EditReward-Bench, EditScore outperforms competing models, with the largest variant exceeding GPT-5 in accuracy [9]
- Across the benchmark's metrics, EditScore models consistently beat existing vision-language models at assigning accurate quality scores to image-editing results (the usual pairwise-accuracy formulation is sketched at the end of this piece) [8][9]

Applications of EditScore

- EditScore works as a reranker that lifts the output quality of mainstream editing models through a "Best-of-N" strategy (sketched below) [15]
- It also serves as a high-fidelity reward signal for reinforcement learning, enabling stable and efficient RL fine-tuning; notable gains were observed on the OmniGen2 model (a generic policy-gradient sketch appears below) [15][16]

Insights from Research

- A high benchmark score alone does not make a reward model a good RL coach: the distribution of the scores it produces is just as important for training effectiveness [16]
- A self-ensemble scaling strategy further boosts performance, showing that a well-designed 7B model can outperform much larger models on specific tasks (sketched below) [19]

Future Directions

- The team plans to continue exploring reward modeling and will release additional reinforcement-learning training code and inference scripts for the community [3][22]
- Continued development of EditScore is expected to improve the controllability and reliability of AIGC models, opening up new applications across fields [22]
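The article does not spell out how accuracy on EditReward-Bench is computed. Reward-model benchmarks are commonly scored as agreement with human preference pairs, and the sketch below shows that formulation; the function name and data layout are illustrative assumptions, not the benchmark's actual protocol.

```python
from typing import List, Tuple

def pairwise_accuracy(score_pairs: List[Tuple[float, float]]) -> float:
    """Fraction of (preferred, rejected) score pairs ranked correctly.

    Each tuple holds the reward model's score for the human-preferred edit
    and for the rejected edit; the model is correct when preferred > rejected.
    """
    correct = sum(1 for preferred, rejected in score_pairs if preferred > rejected)
    return correct / len(score_pairs)

# Example: the model orders 3 of 4 human-labeled pairs correctly -> 0.75
print(pairwise_accuracy([(0.9, 0.4), (0.7, 0.8), (0.6, 0.2), (0.8, 0.5)]))
```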
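The "Best-of-N" reranking described under Applications fits in a few lines: sample N candidate edits, score each with the reward model, keep the best. This is a minimal sketch; `generate_edit` and `reward_fn` are hypothetical stand-ins for the editing model and EditScore, not their actual APIs.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")  # stands in for whatever image type the pipeline uses

def best_of_n(
    instruction: str,
    source: Image,
    generate_edit: Callable[[str, Image], Image],    # stochastic editing model (hypothetical)
    reward_fn: Callable[[str, Image, Image], float],  # reward model, e.g. EditScore (hypothetical)
    n: int = 8,
) -> Tuple[Image, float]:
    """Sample n candidate edits and keep the one the reward model scores highest."""
    candidates: List[Tuple[Image, float]] = []
    for _ in range(n):
        edited = generate_edit(instruction, source)     # draw one candidate edit
        score = reward_fn(instruction, source, edited)  # reward-model judgment
        candidates.append((edited, score))
    return max(candidates, key=lambda pair: pair[1])
```

The trade-off is straightforward: n forward passes of the editor buy one reranked output, with no change to the editor's weights.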
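The article reports that EditScore's rewards enable stable RL fine-tuning of OmniGen2 but does not name the algorithm used. The sketch below is a generic REINFORCE-style update with a group-mean baseline, shown only to illustrate where a reward model's scores enter the training loop, not the team's actual method.

```python
import torch

def policy_gradient_step(
    log_probs: torch.Tensor,   # policy log-probabilities of sampled edits, shape (N,)
    rewards: torch.Tensor,     # reward-model scores for those edits, shape (N,)
    optimizer: torch.optim.Optimizer,
) -> float:
    """One REINFORCE-style update: raise the likelihood of high-reward edits."""
    advantages = rewards - rewards.mean()              # group-mean baseline
    loss = -(advantages.detach() * log_probs).mean()   # maximize reward-weighted log-lik.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

This also illustrates the article's point about score distributions: the update is driven by baseline-subtracted advantages, so a reward model that assigns nearly identical scores to every sampled edit yields near-zero advantages and a stalled update, however accurate its rankings are.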
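One plausible reading of the self-ensemble scaling strategy is to average several stochastic scoring passes of the same reward model (e.g., sampling at temperature > 0) instead of switching to a bigger model. The sketch below assumes that interpretation; `score_once` is a hypothetical closure over a single scoring call.

```python
import random
import statistics
from typing import Callable

def self_ensemble_score(score_once: Callable[[], float], k: int = 4) -> float:
    """Average k independent stochastic scoring passes of one reward model.

    Averaging damps the variance of any single noisy judgment, which is one
    way a well-designed 7B scorer could rival much larger models.
    """
    return statistics.mean(score_once() for _ in range(k))

def noisy_scorer() -> float:
    """Toy scorer: true quality 0.7 plus Gaussian noise."""
    return 0.7 + random.gauss(0.0, 0.1)

print(self_ensemble_score(noisy_scorer, k=16))  # concentrates near 0.7
```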