ByteDance: 2025 Technical Report on the Thinking Model Seed-Thinking-v1.5

Core Insights
- ByteDance has introduced Seed1.5-Thinking, a state-of-the-art reasoning model. It is a Mixture-of-Experts (MoE) model with 20 billion activated parameters out of 200 billion total, and it performs strongly on a range of reasoning benchmarks [1][5][60].
- The model scores 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA. These results show strength in STEM and coding, and the model also generalizes well to non-reasoning tasks [1][5][49].

Model Performance
- On AIME 2024, Seed1.5-Thinking matches OpenAI's o3-mini-high, but it still trails on AIME 2025 and BeyondAIME [2][49].
- On GPQA, Seed1.5-Thinking scores 77.3%, close to o3-level performance [49].
- On non-reasoning tasks, the model outperforms DeepSeek R1 by 8% in overall user preference, which indicates broader applicability [1][5][51].

Development Aspects
- Development of Seed1.5-Thinking focuses on three areas: training data, reinforcement learning (RL) algorithms, and RL infrastructure [10][12][60].
- The training data mixes STEM problems, coding tasks, and logical reasoning problems. Supervised fine-tuning relies heavily on chain-of-thought data [10][15][23].
- RL training uses the VAPO and DAPO frameworks to counter training instability and keep training trajectories robust [12][10].

Infrastructure and Efficiency
- The model uses a hybrid engine architecture and a Streaming Rollout System (SRS) to improve training efficiency and scalability [2][42][44].
- SRS dynamically adjusts sample ratios and optimizes memory usage, which substantially speeds up training [43][44].

Future Directions
- The team plans to explore more efficient RL methods and to tackle more complex tasks, with the aim of further extending the model's capabilities [2][60].
- Upcoming releases will include internal benchmarks such as BeyondAIME and Codeforces to support further research in the field [2][5].
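The figures "20 billion activated" and "200 billion total" parameters describe a sparse Mixture-of-Experts (MoE) design. A router sends each token to only a few experts, so most weights stay idle for that token. The following is a minimal sketch of generic top-k expert routing under stated assumptions. The toy layer (8 scalar experts, k = 2) and all function names are hypothetical; the real expert count and top-k value of Seed1.5-Thinking are not given here.

```python
import math

# Figures reported for Seed1.5-Thinking.
TOTAL_PARAMS = 200e9
ACTIVE_PARAMS = 20e9

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))

def moe_layer(x, experts, router_logits, k):
    """Run only the k routed experts on token x and mix their outputs."""
    return sum(g * experts[i](x) for i, g in top_k_route(router_logits, k))

# Hypothetical toy layer: 8 scalar "experts", 2 active per token.
# These sizes are illustrative, not Seed1.5-Thinking's real configuration.
TOY_EXPERTS = [lambda x, s=s: s * x for s in range(1, 9)]
TOY_LOGITS = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

if __name__ == "__main__":
    print(top_k_route(TOY_LOGITS, k=2))        # experts 1 and 4 are selected
    print(moe_layer(1.0, TOY_EXPERTS, TOY_LOGITS, k=2))
    print(ACTIVE_PARAMS / TOTAL_PARAMS)        # fraction of weights used per token
    print(2 * ACTIVE_PARAMS)                   # rough forward FLOPs per token
```

In the toy layer, 6 of the 8 experts are never evaluated for this token. The same principle means only about 10% (20B / 200B) of Seed1.5-Thinking's parameters take part in each token's forward pass. In typical MoE designs that activated count also includes dense shared layers such as attention, so the expert-level sparsity cannot be read directly from the ratio. The "2 × active parameters" FLOPs figure is only the common rule-of-thumb estimate.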
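The report credits VAPO and DAPO with stabilizing RL training but does not spell out how. As a rough illustration, the code below is a sketch of the objective from the publicly described DAPO algorithm. It covers four parts:

- group-relative advantages
- decoupled "clip-higher" bounds (ε_low = 0.2, ε_high = 0.28, the defaults from the DAPO paper)
- dynamic sampling, which drops groups whose responses all earn the same reward
- token-level loss averaging

This is not Seed1.5-Thinking's training code. The function names are hypothetical, the 1.0/0.0 correctness reward is an assumption, and VAPO (a value-based method) is not covered.

```python
import math
from statistics import mean, pstdev

def group_advantages(rewards, eps=1e-6):
    """Group-relative advantage (as in GRPO/DAPO): (R_i - mean) / std over the group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def keep_group(rewards):
    """Dynamic sampling: a group where every response earns the same reward
    has all-zero advantages and contributes no gradient, so it is dropped."""
    return len(set(rewards)) > 1

def dapo_loss(logp_new, logp_old, rewards, eps_low=0.2, eps_high=0.28):
    """Negated DAPO-style surrogate for one prompt's group of responses.

    logp_new, logp_old: one list of per-token log-probs per response.
    rewards: one scalar reward per response (assumed 1.0 correct, 0.0 wrong).
    Returns None when dynamic sampling filters the group out.
    """
    if not keep_group(rewards):
        return None
    total, n_tokens = 0.0, 0
    for seq_new, seq_old, adv in zip(logp_new, logp_old, group_advantages(rewards)):
        for lp_new, lp_old in zip(seq_new, seq_old):
            ratio = math.exp(lp_new - lp_old)
            # Decoupled ("clip-higher") bounds: the upper limit 1 + eps_high is
            # looser than the lower 1 - eps_low, leaving room for low-probability
            # tokens to grow, which DAPO uses against entropy collapse.
            clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
            total += min(ratio * adv, clipped * adv)
            n_tokens += 1
    # Token-level averaging: each token counts equally, so tokens in long
    # responses are not down-weighted as they are with per-sequence means.
    return -total / n_tokens
```

In a full trainer, the token average would run over every kept group in the batch rather than over one prompt. The complete DAPO recipe also adds overlong reward shaping, which this sketch omits.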