A 1.5B Model Sets New Math and Code SOTA! Kuaishou & Tsinghua's Refined Token Management Sends LLM Reasoning Soaring
量子位· 2025-07-30 09:44
Core Insights
- The article discusses Archer, a new approach developed by a team from Kuaishou and Tsinghua University, which uses a small model with 1.5 billion parameters to outperform larger state-of-the-art (SOTA) models across various reasoning benchmarks [1][3][18]

Group 1: Methodology
- Archer's success is attributed to refined management of the model's learning process, allowing it to retain essential knowledge while remaining flexible in its reasoning [2][21]
- The method employs a "dual-token constraint" strategy: tokens are not split into separate training streams but are given customized training rules based on their characteristics [10][11]
- Tokens are categorized into low-entropy (knowledge-based) and high-entropy (reasoning-based) types, with different training constraints applied to each (see the illustrative sketch at the end of this summary) [17][21]

Group 2: Performance Metrics
- In mathematical reasoning tasks, Archer achieved significant improvements, with an 18.1% increase in accuracy on AIME24 and a 10.3% increase on AIME25 compared to the original model [18]
- Archer surpassed existing SOTA methods such as DAPO, solving 6.6% more problems on AIME24 and 5.2% more on AIME25 [18]
- For code generation, Archer also showed a 3.4% accuracy improvement on LiveCodeBench v5 and a 2.6% improvement on v6 compared to DAPO [19]

Group 3: Efficiency
- Archer's training efficiency is notable, requiring only 1,900 H800 GPU hours versus the 16,000 H100 hours needed by Nemotron, demonstrating a cost-effective approach to achieving high performance [20]

Group 4: Key Insights
- The core insight of Archer is the balance between knowledge stability and reasoning exploration, which is crucial for enhancing the model's capabilities [21][24]
- Experimental validation indicates that without proper constraints on low-entropy tokens the model's knowledge can deteriorate, while excessive constraints on high-entropy tokens can hinder reasoning flexibility [24]
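The summary does not spell out how the dual-token constraint is implemented. As a minimal sketch of the idea, assuming a PPO/GRPO-style RL fine-tuning loop, one could classify each generated token by the policy's per-token entropy and then apply a tighter trust-region constraint to low-entropy (knowledge) tokens and a looser one to high-entropy (reasoning) tokens. The function names, entropy threshold, and clip values below are hypothetical illustrations, not values from the paper.

```python
import torch
import torch.nn.functional as F

def entropy_aware_token_masks(logits, entropy_threshold=1.0):
    """Split tokens into low-entropy (knowledge-like) and high-entropy
    (reasoning-like) groups based on the policy's per-token entropy.
    The threshold is illustrative, not taken from the paper."""
    probs = F.softmax(logits, dim=-1)            # (batch, seq, vocab)
    log_probs = F.log_softmax(logits, dim=-1)
    entropy = -(probs * log_probs).sum(dim=-1)   # (batch, seq)
    high_entropy_mask = entropy >= entropy_threshold   # reasoning tokens
    low_entropy_mask = ~high_entropy_mask              # knowledge tokens
    return low_entropy_mask, high_entropy_mask

def dual_constraint_policy_loss(log_ratio, advantages, low_mask, high_mask,
                                clip_low=0.1, clip_high=0.3):
    """PPO-style clipped objective with a tighter clip range on low-entropy
    tokens (to keep factual knowledge stable) and a looser range on
    high-entropy tokens (to leave room for reasoning exploration).
    Clip values are made up for illustration."""
    ratio = log_ratio.exp()   # pi_new(token) / pi_old(token)

    def clipped_term(mask, eps):
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
        term = torch.minimum(unclipped, clipped)
        # Average over the tokens selected by this mask.
        return (term * mask).sum() / mask.sum().clamp(min=1)

    loss_low = clipped_term(low_mask.float(), clip_low)
    loss_high = clipped_term(high_mask.float(), clip_high)
    return -(loss_low + loss_high)   # negate to maximize the clipped objective
```

This sketch only captures the general "different constraints for different token types" pattern described in the article; the actual Archer training objective, thresholds, and constraint strengths are detailed in the original paper.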