Why Hasn't DeepSeek-R2 Been Released Yet?
猿大侠 · 2025-06-27 14:57
Core Viewpoint
The release of DeepSeek-R2 has been delayed due to CEO Liang Wenfeng's dissatisfaction with its performance and a shortage of Nvidia H20 chips, which are critical for its development [1][2][4].

Group 1: Development Timeline
- Anticipation for R2 began after the release of the DeepSeek-V3 model in December last year, which was considered a benchmark for cost-performance [5].
- Initial expectations suggested that R2 would launch in April, following the upgrade of V3 on March 24 [11].
- Despite the release of a paper on inference scaling in April, there has been no official update on R2's launch [12][16].

Group 2: Technical Specifications
- R1's training utilized 30,000 H20 chips, 10,000 H800 chips, and 10,000 H100 chips, indicating the significant computational resources R2 would require [3].
- Leaked parameters for R2 suggested it would have 1.2 trillion parameters and be trained on 5.2 petabytes of data, raising questions about its hardware requirements [17].

Group 3: Community Reactions
- Following news of the delay, community responses varied: some believed the delay would be worthwhile, while others speculated that R2 might wait for the release of V4 [26][28].
A First: GRPO Comes to Flow Matching Models — Near-Perfect GenEval Scores, Compositional Image Generation Far Beyond GPT-4o
机器之心 · 2025-05-13 07:08
This work was completed jointly by teams from The Chinese University of Hong Kong and Kuaishou Kling, among others. The first author, Jie Liu, is a PhD student at CUHK MMLab whose research focuses on reinforcement learning and generative models; he has received an ACL Outstanding Paper Award.

Thanks to their solid theoretical foundation and strong performance in generating high-quality images, flow matching models have become the training method behind state-of-the-art systems in image generation (Stable Diffusion, Flux) and video generation (Kling, WanX, Hunyuan). However, these state-of-the-art models still struggle with complex scenes involving multiple objects, attributes, and relationships, as well as with text rendering. Meanwhile, online reinforcement learning, with its efficient exploration and feedback mechanisms, has made notable progress in language models, but its application to image generation remains at an early stage.

To address this, teams from CUHK MMLab, Kuaishou Kling, Tsinghua University, and others jointly proposed Flow-GRPO, the first work to bring online reinforcement learning to flow matching models. With Flow-GRPO, SD3.5 Medium's accuracy on the GenEval benchmark rose from 63% to 95%, and its compositional image generation surpassed GPT-4o. This shows that flow matching models still have substantial headroom, and Flow-GRPO's success paves the way for using RL to further unlock and enhance various flow matching gen ...
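The core idea GRPO contributes, whether applied to language models or (as here) to flow matching image generators, is replacing a learned value critic with group-relative advantages: for each prompt, the policy samples a group of outputs, and each sample's reward is normalized by the group's mean and standard deviation. Below is a minimal sketch of that advantage computation, assuming scalar rewards from some external reward model (e.g. a GenEval-style compositional checker); it is an illustration of the general GRPO formula, not the Flow-GRPO authors' actual implementation.

```python
def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages for one prompt's sampled group.

    Each reward is standardized against the group's own statistics,
    so no separate value/critic network is needed.
    """
    n = len(group_rewards)
    mean = sum(group_rewards) / n
    var = sum((r - mean) ** 2 for r in group_rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in group_rewards]

# Example: four images sampled for one prompt, scored by a
# hypothetical reward model on correctness of object composition.
rewards = [1.0, 0.0, 1.0, 0.5]
advs = grpo_advantages(rewards)
```

Samples scoring above the group mean receive positive advantages and are reinforced; samples below the mean are pushed down. In Flow-GRPO these advantages would weight the policy-gradient update of the flow model's sampling trajectory, analogous to how GRPO weights token log-probabilities in language-model RLHF.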