With only 512 H200s, a 106B model breaks through via distributed RL, fully open-sourced
36Kr · 2025-12-10 06:55
[Overview] INTELLECT-3, released by Prime Intellect, achieves the strongest performance at its scale across benchmarks in math, code, and more. The release aims to open the stack for training frontier models to the community and to advance open, large-scale RL research. Prime Intellect has open-sourced the complete training pipeline, including the model weights, training framework, datasets, RL environments, and evaluation suite, in the hope of spurring more open research on large-scale reinforcement learning. The training software and infrastructure used for INTELLECT-3 are identical to what will soon be made available to everyone on the Prime Intellect platform. This means that, going forward, any individual or company can post-train state-of-the-art models. SOTA across multiple benchmarks: INTELLECT-3 is a 106B-parameter Mixture-of-Experts (MoE) model, built on GLM 4.5 Air with supervised fine-tuning (SFT) and reinforcement learning. Prime Intellect recently made the official release of INTELLECT-3, a 106B-parameter Mixture-of-Experts model trained with Prime Intellect's reinforcement learning (RL) stack. Across benchmarks in math, code, science, and reasoning, it achieves the strongest results among models of its size, ...
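Since the weights are described as fully open, here is a minimal sketch of what local inference could look like with Hugging Face transformers; the repo id "PrimeIntellect/INTELLECT-3" and the prompt are assumptions for illustration, not details confirmed by the article.

```python
# Minimal sketch: loading the open INTELLECT-3 weights for local inference.
# The repo id below is an assumption; check Prime Intellect's Hugging Face
# page for the actual model card before running this.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PrimeIntellect/INTELLECT-3"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the precision stored in the checkpoint
    device_map="auto",    # shard the 106B MoE across available GPUs
)

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```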
RL Environments at Scale – Will Brown, Prime Intellect
AI Engineer · 2025-12-09 15:53
Today we're talking about RL environments and how to scale them. But the title is a little bit of a red herring. We'll talk a bit about the engineering pieces, like running these with thousands of parallel rollouts and sandboxes on hundreds of GPUs, but I'm mostly going to focus on a different notion of scale. What I mean by scaling here is that there are a number of different ways we talk about scaling in the context of AI and research. We know about scaling laws, and we talk about how much d ...
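The "thousands of parallel rollouts and sandboxes" point is easier to picture with a toy sketch. The following async worker-pool loop is purely illustrative, assuming sandboxed episodes with a verifiable reward; the names run_episode, collect_rollouts, and the reward stub are hypothetical, not the speaker's actual stack.

```python
# Hypothetical sketch: collecting many rollouts in parallel against sandboxed
# environments, with bounded concurrency so the sandbox hosts aren't overwhelmed.
import asyncio
import random

async def run_episode(policy, task_id: int) -> dict:
    """Run one rollout in an isolated 'sandbox' and score it."""
    # In a real system this would launch a container, execute the model's
    # actions (e.g. code or tool calls), and compute a verifiable reward.
    await asyncio.sleep(random.uniform(0.01, 0.05))  # stand-in for environment latency
    reward = random.random()                          # stand-in for a verifier score
    return {"task_id": task_id, "reward": reward}

async def collect_rollouts(policy, num_tasks: int, max_parallel: int) -> list[dict]:
    """Launch num_tasks rollouts, at most max_parallel in flight at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(task_id: int) -> dict:
        async with semaphore:
            return await run_episode(policy, task_id)

    return await asyncio.gather(*(bounded(i) for i in range(num_tasks)))

if __name__ == "__main__":
    rollouts = asyncio.run(collect_rollouts(policy=None, num_tasks=1000, max_parallel=128))
    mean_reward = sum(r["reward"] for r in rollouts) / len(rollouts)
    print(f"collected {len(rollouts)} rollouts, mean reward {mean_reward:.3f}")
```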