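Major open-source release! The first fully asynchronous reinforcement learning training system is here, speeding up RL training of SOTA reasoning LLMs by 2.77x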

Core Viewpoint
- AReaL-boba² is a significant upgrade to the AReaL asynchronous reinforcement learning (RL) training system. It improves efficiency, usability, and performance on coding RL tasks, and fully supports Agentic RL [2][3][39].

Group 1: Efficiency and Performance
- AReaL-boba² trains up to 2.77 times faster than the previous version while maintaining model performance [8].
- The system sets new state-of-the-art (SOTA) results on coding tasks: the AReaL-boba²-14B model scores 69.1 on LiveCodeBench and reaches a Codeforces rating of 2044 [5][4].
- The asynchronous RL framework lets data generation and model training run continuously and in parallel, which significantly improves GPU utilization and reduces idle time [14][15].

Group 2: User Accessibility
- The upgrade ships with comprehensive tutorials and documentation, so both beginners and experienced users can customize datasets, algorithms, and agent logic without modifying the underlying code [3][8].
- AReaL-boba² is designed to be user-friendly, with simplified environment setup and experiment launch [3][8].

Group 3: Technical Innovations
- The system uses fully asynchronous RL training: data generation is decoupled from model training, which removes the idle waiting that traditional synchronous RL systems incur when one stage must finish before the other starts [14][15].
- AReaL-boba² introduces two key algorithmic improvements: Staleness Control, which bounds how out of date training data may be, and a Decoupled PPO Objective, which mitigates the distribution mismatch between the older model versions that generated the data and the model being trained [24][28].

Group 4: Future Developments
- The AReaL team continues to update the Agentic RL capabilities, which let developers customize agents and environments for multi-turn interactions [39][40].
- The project builds on years of technical accumulation across several research teams and aims to make AI training accessible and customizable for everyone [41].
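
The two algorithmic ideas in Group 3 can be illustrated with a small sketch. The summary above does not give formulas or code, so everything here is an assumption made for illustration: the `Sample` record, the `max_staleness` threshold, the function names, and the exact form of the loss are hypothetical and are not taken from the AReaL codebase. The sketch assumes staleness is measured as the gap between the current policy version and the version that generated a sample, and that the decoupled objective clips the ratio against a recent "proximal" policy while reweighting by the ratio between that proximal policy and the stale generating policy.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One generated record (hypothetical structure, for illustration only)."""
    behavior_version: int  # policy version that generated this sample
    logp_behavior: float   # log-prob under the (possibly stale) generating policy
    logp_proximal: float   # log-prob under a recent "proximal" policy
    logp_current: float    # log-prob under the policy being updated
    advantage: float


def is_fresh(sample: Sample, current_version: int, max_staleness: int) -> bool:
    """Staleness control (assumed form): accept data only if the version lag is bounded."""
    return current_version - sample.behavior_version <= max_staleness


def decoupled_ppo_term(sample: Sample, clip_eps: float = 0.2) -> float:
    """Decoupled PPO surrogate for one sample (assumed form).

    The clipped ratio is taken against the proximal policy, and the result is
    reweighted by proximal/behavior to correct for stale generation.
    """
    ratio = math.exp(sample.logp_current - sample.logp_proximal)
    clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    surrogate = min(ratio * sample.advantage, clipped * sample.advantage)
    behavior_weight = math.exp(sample.logp_proximal - sample.logp_behavior)
    return behavior_weight * surrogate


def batch_objective(samples, current_version, max_staleness, clip_eps=0.2):
    """Average the surrogate over samples that pass the staleness filter.

    Returns (objective, number_of_samples_used).
    """
    fresh = [s for s in samples if is_fresh(s, current_version, max_staleness)]
    if not fresh:
        return 0.0, 0
    total = sum(decoupled_ppo_term(s, clip_eps) for s in fresh)
    return total / len(fresh), len(fresh)


if __name__ == "__main__":
    batch = [
        Sample(3, -1.0, -1.0, -1.0, 2.0),  # lag 2: kept
        Sample(0, -1.0, -1.0, -1.0, 2.0),  # lag 5: dropped when max_staleness=4
    ]
    print(batch_objective(batch, current_version=5, max_staleness=4))
```

When the generating policy and the proximal policy are identical, the behavior weight is 1 and the term reduces to the standard clipped PPO surrogate, which is why this form is a natural way to handle data produced by older model versions.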