With Just 512 H200s, a 106B Model Breaks Through via Distributed RL, Fully Open-Sourced
36Kr · 2025-12-10 06:55
Core Insights
- Prime Intellect has launched INTELLECT-3, a 106-billion-parameter Mixture-of-Experts model that outperforms other models of similar size across benchmarks in mathematics, code, science, and reasoning [1][2]
- The company aims to democratize access to advanced reinforcement learning (RL) by open-sourcing the entire training stack: model weights, training frameworks, datasets, RL environments, and evaluation systems [1][2]

Model Performance
- INTELLECT-3 achieved state-of-the-art (SOTA) results on multiple benchmarks, surpassing models such as GLM-4.5-Air and DeepSeek-R1-0528 [2][3]
- Reported scores include 90.8 on AIME 2024 and 14.6 on Humanity's Last Exam, indicating strong performance across a range of task types [3]

Training Framework
- INTELLECT-3 was trained end to end with the PRIME-RL framework, which integrates with the Verifiers environment library to cover the full pipeline from synthetic data generation to evaluation [4][5] (a minimal environment sketch follows this digest)
- The training system is fully distributed, addressing throughput bottlenecks and enabling large-scale training [7][8]

Infrastructure and Environment
- Training environments are hosted on the Environments Hub, which provides a modular, scalable way to build RL environments and evaluation tasks [10]
- Prime Intellect has developed Prime Sandboxes, a high-throughput, secure code-execution system for running untrusted external code safely [12] (see the sandbox sketch below)

Computational Resources
- Training ran on 64 interconnected nodes with 512 NVIDIA H200 GPUs, with particular attention to maintaining determinism and synchronization in the distributed system [13][14] (see the synchronization sketch below)
- The training run lasted two months and spanned diverse RL environments covering mathematics, code, software engineering, and other categories [14]

Future Directions
- Prime Intellect plans to expand its RL environments to cover more tasks and to raise the quality of community tasks on the Environments Hub [18]
- The company is also working on long-horizon agent capabilities, allowing models to manage context and maintain lightweight external memory for improved RL training [18]
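To make the Verifiers integration concrete, here is a minimal single-turn environment sketch in the style of the open-source `verifiers` library. The toy dataset, the `exact_answer_reward` function, and the exact constructor arguments are illustrative assumptions, not confirmed details of the INTELLECT-3 training setup.

```python
# A minimal sketch of a verifiable RL environment, assuming the open-source
# `verifiers` library's Rubric/SingleTurnEnv interface; names and arguments
# here are illustrative, not the exact INTELLECT-3 configuration.
import verifiers as vf
from datasets import Dataset

# Toy dataset: each row pairs a prompt with a checkable ground-truth answer.
dataset = Dataset.from_list([
    {"question": "What is 17 * 23?", "answer": "391"},
    {"question": "What is 2 ** 10?", "answer": "1024"},
])

def exact_answer_reward(completion, answer, **kwargs) -> float:
    """Return 1.0 if the model's output contains the exact answer, else 0.0."""
    # Chat-style completions arrive as message lists; plain ones as strings.
    text = completion if isinstance(completion, str) else completion[-1]["content"]
    return 1.0 if answer in text else 0.0

# A rubric bundles one or more reward functions with weights.
rubric = vf.Rubric(funcs=[exact_answer_reward], weights=[1.0])

# A single-turn environment: one prompt, one completion, one verified score.
env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
```

A trainer such as PRIME-RL can then sample rollouts from `env` and feed the rubric score directly back as the RL reward.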
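The article does not document the Prime Sandboxes API, so the following is only a generic illustration of the pattern it describes: executing untrusted, model-generated code in an isolated subprocess with a hard timeout and mapping the outcome to a pass/fail reward. All names here are hypothetical.

```python
# Generic sandboxed-execution sketch (NOT the Prime Sandboxes API): run
# untrusted code in a separate process with a timeout and a stripped
# environment, and map the outcome to a scalar reward for RL.
import os
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout_s: float = 5.0) -> float:
    """Execute `code` in an isolated interpreter; 1.0 on clean exit, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, "-I", path],  # -I: isolated mode, ignores user site
            capture_output=True,
            timeout=timeout_s,
            env={},  # strip inherited environment variables
        )
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # runaway code counts as failure
    finally:
        os.unlink(path)

print(run_untrusted("assert 17 * 23 == 391"))  # 1.0
print(run_untrusted("while True: pass"))       # 0.0 after the timeout
```

A production system at this throughput would rely on container- or VM-level isolation rather than a bare subprocess; the sketch only shows the reward-shaped interface.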
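For the determinism and synchronization concern, here is a small sketch of two standard checks in a PyTorch distributed setup: seeding every rank identically and asserting that all ranks hold the same parameters. This assumes `torch.distributed` is already initialized and is illustrative only, not INTELLECT-3's actual training code.

```python
# Determinism/synchronization sketch, assuming an already-initialized
# torch.distributed process group; illustrative, not the PRIME-RL internals.
import torch
import torch.distributed as dist

def seed_everything(seed: int) -> None:
    """Give every rank the same RNG state and forbid nondeterministic kernels."""
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.use_deterministic_algorithms(True)  # fail loudly on nondeterministic ops

def check_weights_in_sync(model: torch.nn.Module) -> None:
    """Assert ranks hold identical parameters via a checksum all-reduce."""
    local = torch.stack([p.detach().float().sum() for p in model.parameters()]).sum()
    world = local.clone()
    dist.all_reduce(world, op=dist.ReduceOp.SUM)
    expected = local * dist.get_world_size()
    assert torch.allclose(world, expected), "parameter divergence across ranks"
```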
RL Environments at Scale – Will Brown, Prime Intellect
AI Engineer · 2025-12-09 15:53
Today we're talking about RL environments and how to scale them. But the title is a little bit of a red herring. We'll talk a bit about the engineering pieces, running these with thousands of parallel rollouts and sandboxes on hundreds of GPUs, but I'm mostly going to focus on a different notion of scale. What I mean by scaling here is that there are a number of different ways we talk about scaling in the context of AI and research. We know about scaling laws and we talk about how much d ...
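The engineering framing in the talk, thousands of concurrent rollouts against sandboxes, is essentially a fan-out problem with bounded concurrency. Here is a minimal asyncio sketch of that pattern; `run_rollout` is a hypothetical placeholder for whatever environment or sandbox call a real trainer would make, not Prime Intellect's code.

```python
# Bounded-concurrency rollout fan-out: a minimal asyncio sketch. The
# `run_rollout` coroutine is a hypothetical stand-in for an environment
# or sandbox call; the real systems discussed in the talk are more involved.
import asyncio
import random

async def run_rollout(task_id: int) -> float:
    # Stand-in for: sample a prompt, generate a completion, score it.
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated env latency
    return random.random()                           # simulated reward

async def collect_rollouts(n_tasks: int, max_concurrent: int) -> list[float]:
    sem = asyncio.Semaphore(max_concurrent)  # cap in-flight rollouts

    async def bounded(task_id: int) -> float:
        async with sem:
            return await run_rollout(task_id)

    return await asyncio.gather(*(bounded(i) for i in range(n_tasks)))

if __name__ == "__main__":
    rewards = asyncio.run(collect_rollouts(n_tasks=1000, max_concurrent=64))
    print(f"mean reward over {len(rewards)} rollouts: {sum(rewards)/len(rewards):.3f}")
```

The semaphore is the whole trick: it lets you queue far more rollouts than the sandbox fleet can serve at once while keeping a fixed number in flight.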