Distributed RL

36Kr · 2025-12-10 06:55

Core Insights
- Prime Intellect has launched INTELLECT-3, a 106-billion-parameter Mixture-of-Experts model that outperforms other models of similar size on benchmarks covering mathematics, code, science, and reasoning [1][2]
- The company aims to democratize access to advanced reinforcement learning (RL) technology by open-sourcing the entire training stack, including the model weights, training framework, datasets, RL environments, and evaluation systems [1][2]

Model Performance
- INTELLECT-3 achieved state-of-the-art (SOTA) results on multiple benchmarks, surpassing models such as GLM-4.5-Air and DeepSeek-R1-0528 [2][3]
- Reported scores include 90.8 on AIME 2024 and 14.6 on Humanity's Last Exam [3]

Training Framework
- INTELLECT-3 was trained end to end with the PRIME-RL framework, which is integrated with the Verifiers environment and supports the whole pipeline from synthetic data generation to evaluation [4][5]
- The training system is fully distributed, which addresses speed bottlenecks and enables large-scale training [7][8]

Infrastructure and Environment
- The training environments are hosted on the Environments Hub, which provides a modular and scalable way to build RL environments and evaluation tasks [10]
- Prime Intellect has developed Prime Sandboxes, a high-throughput, secure code execution system that runs external code efficiently and safely [12]

Computational Resources
- Training ran on 64 interconnected nodes with 512 NVIDIA H200 GPUs, with a focus on maintaining determinism and synchronization across the distributed system [13][14]
- Training lasted two months and drew on diverse RL environments in categories such as mathematics, code, and software engineering [14]

Future Directions
- Prime Intellect plans to expand its RL environments to cover more tasks and to improve the quality of the community tasks available on the Environments Hub [18]
- The company is also working on long-sequence agent capabilities, so that models can manage their own context and maintain lightweight external memory during RL training [18]

Large-scale reinforcement learning

Artificial Intelligence

INTELLECT-3

PRIME-RL

Verifiers