Distributed Reinforcement Learning
A model trained on the world's idle compute rivals R1, and the sky is falling for Jensen Huang! Karpathy once invested in it
量子位· 2025-05-13 04:45
Core Viewpoint
- The article discusses the release of INTELLECT-2, the world's first distributed reinforcement learning (RL) training model, which significantly reduces training costs by harnessing idle, globally distributed computing resources [1][2].

Group 1: Model Features
- INTELLECT-2 is characterized as the first decentralized RL training model, balancing performance against parameter scale while preserving data privacy and enabling community-driven development [13][26].
- The model supports web-based interaction: only a simple registration is required, and it currently accepts text input [11][13].
- Its performance is comparable to DeepSeek-R1, signaling a potential shift away from dependence on centralized computing power [2][10].

Group 2: Training Framework
- The training framework, PRIME-RL, decouples inference-data generation from model training and executes them asynchronously, so different nodes can make progress independently [31][32].
- The system adopts a distributed asynchronous RL paradigm in which heterogeneous computing resources participate without interfering with one another [28][26].
- Key components include SHARDCAST for parameter distribution, TOPLOC for verifying data integrity, and the Protocol Testnet for managing decentralized computing resources [36][45][52].

Group 3: Performance and Results
- INTELLECT-2 improves on its predecessor, QwQ-32B, on mathematical and programming benchmarks, with notable scores on AIME and LiveCodeBench [64][65].
- Training sustained an average bandwidth throughput of roughly 590 Mb/s, demonstrating effective overlap of communication and computation [69].

Group 4: Team and Funding
- The team behind INTELLECT-2, Prime Intellect, is based in San Francisco and has raised over $20 million in funding, including a recent $15 million round backed by notable figures in the AI community [77][78].
- The team has a history of developing distributed training models and aims to expand decentralized training and collaborate with other leading open-source AI projects [81][80].
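The decoupled, asynchronous paradigm described under the training framework above (PRIME-RL, with SHARDCAST-style parameter broadcast) can be sketched in miniature as rollout workers that generate data from a possibly-stale policy snapshot while a trainer updates parameters independently. This is a toy illustration only, not Prime Intellect's actual implementation: every name here (`ParamStore`, `rollout_worker`, `trainer`) is hypothetical, the "policy" is a single scalar, and the real systems operate across untrusted machines over the network rather than threads in one process.

```python
import queue
import random
import threading

class ParamStore:
    """Versioned shared parameter store; a stand-in for SHARDCAST-style
    broadcast of updated weights to inference nodes (illustrative only)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._params = 0.0  # toy scalar "policy"

    def publish(self, params):
        with self._lock:
            self._version += 1
            self._params = params

    def snapshot(self):
        with self._lock:
            return self._version, self._params

def rollout_worker(store, out_q, n_rollouts):
    # Inference node: produces experience continuously, tagged with the
    # parameter version it used, without ever blocking on the trainer.
    for _ in range(n_rollouts):
        version, params = store.snapshot()  # may be stale; that is allowed
        reward = params + random.uniform(-0.1, 0.1)  # toy environment
        out_q.put((version, reward))

def trainer(store, in_q, n_updates, lr=0.1):
    # Training node: consumes whatever rollouts have arrived and publishes
    # new parameters; it never waits for any particular worker.
    params = 0.0
    for _ in range(n_updates):
        _version, reward = in_q.get()
        params += lr * (1.0 - reward)  # toy update rule, not the real one
        store.publish(params)
    return params

store = ParamStore()
q = queue.Queue()
workers = [threading.Thread(target=rollout_worker, args=(store, q, 50))
           for _ in range(4)]
for w in workers:
    w.start()
final_params = trainer(store, q, n_updates=200)
for w in workers:
    w.join()
```

The key property mirrored here is that generation and training proceed concurrently and tolerate staleness: workers read whatever parameter version is current when a rollout starts, so heterogeneous, intermittently available compute can contribute without synchronizing with the trainer.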