Large Model Inference Optimization
No training needed, just a better decoding strategy: the DTS framework lifts LLM reasoning accuracy by 6% and shortens reasoning length by 23%
机器之心· 2025-11-21 02:04
Core Insights
- The article surveys recent progress in Large Reasoning Models (LRMs) and introduces DTS (Decoding Tree Sketching), a training-free inference framework that targets the "overthinking" problem, in which models produce longer reasoning paths that are more often incorrect [2][8][26].

Group 1: Problem Identification
- "Overthinking" yields longer reasoning chains that are more prone to errors and self-repetition, which lowers accuracy [8][11].
- Existing mitigations typically rely on additional training or aggressive pruning, both of which can be costly and unstable [8][11].

Group 2: DTS Framework
- DTS combines two strategies, branching at high-uncertainty tokens and stopping as soon as the first path completes, to approximate the shortest correct reasoning path [2][8][26].
- The framework requires no additional training and no changes to model weights, making it a plug-and-play solution [8][26].

Group 3: Empirical Results
- On AIME2024/2025, DTS improved average accuracy by 6%, cut average reasoning length by roughly 23%, and reduced the rate of endless repetition by about 10% [4][20].
- The experiments show a significant negative correlation between reasoning-chain length and accuracy: shorter chains tend to be correct more often [9][11].

Group 4: Methodology
- The reasoning process is modeled as a decoding tree, where nodes are generated tokens and root-to-leaf paths are complete chains of thought (CoT) [12][13].
- DTS branches only at "key tokens" where next-token uncertainty is high, keeping the decoding tree from growing combinatorially; a minimal sketch of this procedure follows below [15][16].

Group 5: Conclusion and Future Directions
- DTS offers a lightweight optimization route that lets reasoning models "think less but more accurately" [26][27].
- The approach is expected to combine with multi-step reasoning, calibration, and uncertainty estimation, pointing toward more efficient and reliable reasoning in LRMs [27].
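The summary does not spell out DTS's exact branching rule, so the following is a minimal sketch under stated assumptions: next-token uncertainty is measured as Shannon entropy against a threshold, a high-entropy step branches on the top-k candidate tokens, and a shortest-first frontier makes the first completed path approximate the shortest chain of thought. The interface next_token_probs and the parameters entropy_threshold and branch_width are illustrative names, not from the paper.

```python
import heapq
import math
from typing import Callable, List, Tuple

Prefix = Tuple[int, ...]


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def dts_decode(
    next_token_probs: Callable[[Prefix], List[Tuple[int, float]]],
    eos_id: int,
    entropy_threshold: float = 1.0,   # assumed knob, not from the paper
    branch_width: int = 2,            # assumed knob, not from the paper
    max_len: int = 256,
) -> Prefix:
    """Branch only at high-entropy "key tokens", decode greedily
    elsewhere, and return the first path that reaches EOS."""
    # Frontier of partial paths, explored shortest-first so the first
    # completed path approximates the shortest chain of thought.
    frontier: List[Tuple[int, Prefix]] = [(0, ())]
    while frontier:
        length, prefix = heapq.heappop(frontier)
        if length >= max_len:
            continue
        dist = next_token_probs(prefix)  # list of (token_id, prob)
        if entropy([p for _, p in dist]) > entropy_threshold:
            # High uncertainty: branch on the top-k candidate tokens.
            candidates = sorted(dist, key=lambda tp: -tp[1])[:branch_width]
        else:
            # Confident step: follow the single greedy token.
            candidates = [max(dist, key=lambda tp: tp[1])]
        for token, _ in candidates:
            path = prefix + (token,)
            if token == eos_id:
                return path  # early stop: first completed path wins
            heapq.heappush(frontier, (len(path), path))
    return ()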
NVIDIA helps you save money: making LLM reasoning "short and precise" and 5x faster
机器之心· 2025-11-04 04:22
Core Insights
- The article examines the challenges and recent advances in reasoning models, in particular the trade-off between reasoning length and accuracy [2][3].
- It introduces DLER, a new reinforcement learning method that sharply reduces reasoning length while preserving accuracy [7][10].

Group 1: DLER Methodology
- DLER tackles the training instabilities that length penalties introduce in reinforcement learning and proposes a simple yet effective training recipe [7].
- The resulting models cut reasoning length by over 70% with accuracy intact; DLER-Qwen-R1-7B reaches 55.6% accuracy on the AIME-24 benchmark using an average of only 3,230 tokens [7][10].

Group 2: Key Findings
- DLER works for large models as well as small ones, introducing magnitude-selective weight merging to mitigate the performance drops that arise during fine-tuning (see the sketch below) [12].
- The research indicates that gains in reasoning efficiency depend more on the choice of optimization algorithm than on the complexity of the penalty design [15].

Group 3: Future Implications
- The findings point to a shift in how reasoning models are built: toward smarter, more efficient thinking rather than ever-longer reasoning chains [14].
- DLER is positioned as a key technology for the practical deployment of reasoning models, improving both their speed and their utility [14].
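The summary names magnitude-selective weight merging but not its exact rule. The PyTorch sketch below assumes one plausible reading: keep only the top fraction of fine-tuning weight deltas by absolute magnitude within each tensor, and revert the rest to the base weights. The function name magnitude_selective_merge and the keep_ratio parameter are illustrative, not from the paper.

```python
import torch


def magnitude_selective_merge(
    base: dict,      # {name: tensor} base-model weights
    tuned: dict,     # {name: tensor} fine-tuned weights, same keys/shapes
    keep_ratio: float = 0.1,  # assumed knob: fraction of deltas to keep
) -> dict:
    """Keep only the largest-magnitude fine-tuning deltas per tensor;
    revert all other entries to the base weights."""
    merged = {}
    for name, w_base in base.items():
        delta = tuned[name] - w_base
        k = max(1, int(delta.numel() * keep_ratio))
        # The k-th largest |delta| is the (numel - k + 1)-th smallest.
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = (delta.abs() >= threshold).to(delta.dtype)
        merged[name] = w_base + mask * delta
    return merged
```

The intuition behind this kind of merge is that large-magnitude updates carry the behavior learned during fine-tuning, while the many small updates mostly add noise; dropping the small deltas lets the merged model keep the shortened reasoning style without the accuracy regressions the article attributes to fine-tuning large models.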