Intern born in 2001 reportedly behind ByteDance's core RL algorithm! A member of ByteDance's LLM task force
QbitAI (量子位) · 2025-03-20 10:56

Group 1
- DAPO, the core algorithm developed by ByteDance and the SIA Lab at Tsinghua University's Institute for AI Industry Research (AIR), outperforms DeepSeek's GRPO: it scores 50 on the AIME 2024 benchmark while using 50% fewer training steps [1][2].
- Qiying Yu, a key figure behind DAPO, is a PhD student at Tsinghua University and was part of ByteDance's Top Seed talent program [3][4][30].
- Yu's research focuses on improving the reasoning capabilities of large language models (LLMs), and Yu's work has shown significant gains in mathematical ability through reinforcement learning (RL) [14][16].

Group 2
- ByteDance's Top Seed program aims to nurture young AI research talent, providing resources and support for innovative projects [47][58].
- The industry is shifting toward valuing problem-solving ability over experience, which allows younger researchers to make significant contributions [50][53].
- Research toward AGI (artificial general intelligence) is increasingly open to newcomers, since established methodologies can hinder innovation [54][55].

Group 3
- ByteDance plans to continue the Top Seed program, led by Wu Yonghui, a former vice president at Google DeepMind, signaling ongoing investment in AI research and talent development [61][62].