Nearly 5× faster! Peking University and ByteDance propose BranchGRPO, reshaping diffusion model alignment with "tree branching + pruning"
机器之心 · 2025-09-22 07:26
Core Insights
- The article introduces BranchGRPO, a tree-structured reinforcement learning method from Peking University and ByteDance that targets two challenges in human-preference alignment for diffusion and flow matching models: efficient sampling and stable optimization [2][9]

Group 1: Research Background and Challenges
- Diffusion and flow matching models have become mainstream in visual generation thanks to their fidelity, diversity, and controllability, yet their outputs often drift from human intent in aesthetics, semantics, or temporal consistency [5]
- Reinforcement learning from human feedback (RLHF) has been introduced to optimize generative models directly, so that outputs better match human preferences [6]
- The existing Group Relative Policy Optimization (GRPO) method is stable and scalable for image and video generation but hits two fundamental bottlenecks: inefficient sequential rollouts, and sparse terminal rewards that ignore the signal carried by intermediate denoising states [8]

Group 2: BranchGRPO Methodology
- BranchGRPO restructures sampling from a single chain into a tree, letting rollouts share prefixes so that exploration stays broad while redundant computation shrinks [11][14]
- Branching, reward fusion, and pruning mechanisms work together to improve both speed and stability, yielding large gains in training efficiency and reward attribution (a toy sketch of this tree rollout follows this summary) [13][14]
- In image alignment tests, BranchGRPO ran up to 4.7× faster than DanceGRPO, with per-iteration time falling from 698 seconds to as low as 148 seconds [15]

Group 3: Performance Metrics
- On image alignment (HPDv2.1), BranchGRPO scored 0.369 against DanceGRPO's 0.360 while also posting the highest image reward, 1.319 [15][17]
- On video generation (WanX-1.3B), BranchGRPO produced clearer, more stable frames than earlier models, with iteration time cut from roughly 20 minutes to about 8 minutes, effectively doubling training efficiency [18][19]

Group 4: Experimental Findings
- Ablation studies indicate that moderate branching correlation and dense early splits accelerate reward improvement, while path-weighted reward fusion stabilizes training [23]
- Sample diversity is preserved at MMD² ≈ 0.019, nearly identical to sequential sampling [24]
- BranchGRPO scales to larger branch widths without performance degradation, keeping iteration times low even at larger sample sizes [27]

Group 5: Conclusion and Future Outlook
- BranchGRPO combines efficiency with stability, turning the reward signal from a single endpoint score into continuous feedback along the trajectory, and improves speed, stability, and alignment quality across the board [30]
- Future work may add adaptive splitting and pruning strategies, potentially establishing BranchGRPO as a core RLHF method for diffusion and flow models [30]
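The branching-and-pruning loop is easy to picture in code. Below is a minimal Python sketch of a tree-structured rollout with mean-based reward fusion and top-fraction pruning; the names (`tree_rollout`, `denoise_step`, `reward_fn`), the fusion rule, and every default value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def tree_rollout(x_init, denoise_step, reward_fn,
                 num_steps=8, split_steps=(0, 2), width=2,
                 keep_frac=0.5, seed=0):
    """Toy BranchGRPO-style rollout: fork shared trajectories at a few
    early denoising steps, score the leaves, fuse leaf rewards back onto
    shared prefixes, and keep only the strongest leaves."""
    rng = np.random.default_rng(seed)
    branches = [(x_init, ())]  # (latent, tuple of fork indices taken)
    for t in range(num_steps):
        forks = width if t in split_steps else 1  # branch only at split steps
        branches = [
            (denoise_step(x, t, rng.normal(size=np.shape(x))), path + (k,))
            for (x, path) in branches
            for k in range(forks)
        ]
    rewards = np.array([reward_fn(x) for x, _ in branches])
    # reward fusion: each shared prefix receives the mean reward of its
    # descendant leaves -- a simple stand-in for path-weighted fusion
    fused = {}
    for (_, path), r in zip(branches, rewards):
        for depth in range(1, len(path) + 1):
            fused.setdefault(path[:depth], []).append(float(r))
    prefix_reward = {p: float(np.mean(rs)) for p, rs in fused.items()}
    # pruning: keep the top fraction of leaves for the policy update
    keep = max(1, int(len(branches) * keep_frac))
    top = np.argsort(-rewards)[:keep]
    return [branches[i] for i in top], prefix_reward

# toy usage with a stand-in denoiser (noisy drift) and reward
leaves, fused = tree_rollout(
    np.zeros(4),
    denoise_step=lambda x, t, eps: x + 0.1 * eps,
    reward_fn=lambda x: -float(np.sum(x ** 2)),
)
```

Because forked paths share every step up to their split point, the tree amortizes the cost of early denoising across many leaves, which is where the reported reduction in sampling redundancy comes from.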
A first! GRPO comes to flow matching models: near-perfect GenEval scores, with compositional image generation far beyond GPT-4o
机器之心 · 2025-05-13 07:08
Core Viewpoint
- The article presents Flow-GRPO, the first algorithm to integrate online reinforcement learning into flow matching models, substantially improving their performance on image and video generation tasks [2][22]

Group 1: Introduction and Background
- Flow matching models rest on a solid theoretical foundation and excel at generating high-quality images and videos, but they struggle with complex scenes involving multiple objects and relationships [1]
- Online reinforcement learning has made major strides in language models but remains in its early stages for image generation [1]

Group 2: Flow-GRPO Overview
- Flow-GRPO couples online reinforcement learning with flow matching models, lifting SD3.5 Medium's accuracy on the GenEval benchmark from 63% to 95% [2][14]
- This success opens new avenues for improving the controllability, composability, and reasoning capabilities of flow matching generation models [2][22]

Group 3: Key Strategies of Flow-GRPO
- Flow-GRPO rests on two key strategies (sketched in code after this summary):
  1. An ODE-to-SDE equivalence transformation that converts the deterministic sampler into a stochastic one with matching marginals, giving reinforcement learning the exploration noise it needs without changing what the model generates [6][8]
  2. Denoising reduction, which accelerates data collection by using fewer denoising steps during training rollouts while retaining the full schedule at inference for high-quality outputs [12][22]

Group 4: Experimental Results
- Flow-GRPO performs strongly across text-to-image generation tasks, markedly improving complex compositional generation and reaching near-perfect results in object counting, spatial-relation understanding, and attribute binding [14][19]
- Visual text rendering accuracy rose from 59% to 92%, showing the model can render text accurately inside images [19][21]
- Flow-GRPO also advances human preference alignment, curbing reward hacking while preserving image quality and diversity [21][22]

Group 5: Conclusion and Future Outlook
- Flow-GRPO demonstrates a viable path for continuously improving flow matching generation models through online reinforcement learning [22]
- Its success suggests strong potential for controllability, composability, and reasoning across multi-modal generation tasks, including images, videos, and 3D content [22]
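To make the two strategies concrete, here is a minimal Python sketch of a stochastic sampler in the spirit of the ODE-to-SDE idea, plus the training-versus-inference step split of denoising reduction. This is a toy under stated assumptions: the drift is an Euler step on a user-supplied `velocity` field, the score-correction term that makes the SDE marginals exactly match the ODE is omitted, and `sigma` and the step counts are placeholders.

```python
import numpy as np

def sde_rollout(x, velocity, num_steps=10, sigma=0.3, seed=0):
    """Euler-Maruyama rollout built on a flow matching velocity field.
    Injecting Gaussian noise turns the deterministic sampler into a
    stochastic policy, and the per-step log-probabilities feed a
    GRPO-style likelihood ratio."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / num_steps
    log_prob = 0.0
    for i in range(num_steps):
        t = i * dt
        mean = x + velocity(x, t) * dt        # deterministic ODE drift
        eps = rng.normal(size=np.shape(x))
        x = mean + sigma * np.sqrt(dt) * eps  # exploration noise
        # log N(x; mean, sigma^2 * dt), dropping the constant term
        log_prob += -0.5 * float(np.sum(eps ** 2))
    return x, log_prob

# denoising reduction: short rollouts while collecting training data,
# full schedule at inference (both counts are arbitrary placeholders)
TRAIN_STEPS, EVAL_STEPS = 10, 40
x_train, lp = sde_rollout(np.zeros(4), lambda x, t: -x, TRAIN_STEPS)
x_eval, _ = sde_rollout(np.zeros(4), lambda x, t: -x, EVAL_STEPS)
```

The accumulated log-probabilities are what an online RL update consumes: rollouts are scored by a reward model, and the policy ratio between the updated and old samplers is formed from these per-step Gaussian densities.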