The Overlooked Rollout Process: A Performance Bottleneck in Post-Training, or the ROI Breakthrough for RL?
机器之心· 2025-11-30 01:30
Group 1
- The Rollout process is a significant performance bottleneck in Reinforcement Learning (RL) post-training: it consumes over 70% of training time, so optimizing it is crucial to improving both training efficiency and training effectiveness [1][5][6]
- Studies cited in the article identify Rollout as the dominant resource consumer in RL post-training, accounting for about 70% of the time spent in the RL training process [6][8]
- The quality of Rollout trajectories directly affects the final result of RL training: poor trajectories can trap the model in local optima, while high-quality trajectories strengthen its exploration and reasoning capabilities [8][9]

Group 2
- The LLM field is shifting its focus from competing on pre-training scale to strengthening post-training capabilities, which raises the importance of optimizing the Rollout phase [6][7]
- Rollout and Inference share the same core technical logic but differ in objectives and computational patterns: Rollout aims to supply diverse, valuable trajectory samples for training [7][8]
- Recent industry efforts explore how to improve both the computational efficiency of Rollout and the quality of its trajectories in order to achieve better RL post-training outcomes [9]
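
To make the 70% figure concrete, the following is a minimal sketch that applies Amdahl's law to a training step. It assumes Rollout and the remaining stages run strictly one after another with no overlap, and it takes the 0.7 fraction from the summary above as an illustrative input. The function name `overall_speedup` and its parameters are hypothetical and do not come from the article.

```python
def overall_speedup(rollout_fraction: float, rollout_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the Rollout stage gets faster.

    rollout_fraction: share of total step time spent in Rollout (0..1).
    rollout_speedup:  factor by which Rollout alone is accelerated (> 0).
    Assumes the stages are sequential and do not overlap (an assumption,
    not something the article states).
    """
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be between 0 and 1")
    if rollout_speedup <= 0:
        raise ValueError("rollout_speedup must be positive")
    # Time left after the optimization, as a fraction of the original step time.
    remaining = (1.0 - rollout_fraction) + rollout_fraction / rollout_speedup
    return 1.0 / remaining


if __name__ == "__main__":
    # Doubling Rollout throughput when Rollout is 70% of the step.
    print(f"{overall_speedup(0.7, 2.0):.2f}x")
    # Upper bound: Rollout made arbitrarily fast.
    print(f"{overall_speedup(0.7, float('inf')):.2f}x")
```

Under these assumptions, doubling Rollout speed gives about a 1.54x end-to-end speedup, and even an infinitely fast Rollout caps the gain at about 3.33x, because the other 30% of the step is untouched. This is one way to read the article's framing of Rollout as both the bottleneck and the place where optimization effort pays off most.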