Early Stopping of Reasoning
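Tencent Releases the SpecExit Algorithm: Lossless Compression and a 2.5x End-to-End Speedup, Addressing the Efficiency Problem of Long Reasoning in Large Models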
机器之心 · 2025-10-24 03:40
Core Insights
- The article introduces SpecExit, a method that integrates early stopping with speculative sampling to make Large Reasoning Models (LRMs) more efficient. It shortens reasoning chains by 66% and achieves a 2.5x end-to-end speedup on vLLM [2][9][28].
Group 1: Challenges and Innovations
- Early stopping in reasoning models faces two obstacles: training-based methods carry high training costs and potential reliability issues, while training-free methods often incur additional computational overhead [5][10].
- SpecExit builds on speculative sampling, which keeps model outputs consistent, and extracts reasoning-progress signals from the draft model's hidden states [9][10].
- The framework enables dynamic, reliable early stopping without extra detection cost and delivers significant acceleration over baseline methods [9][22].
Group 2: SpecExit Methodology
- Training: data is constructed from the model's complete outputs and labeled with signals such as confidence, remaining reasoning length, and reasoning progress; multi-task learning then optimizes these signals alongside token classification [13][14][15].
- Decoding: an exponentially weighted moving average smooths the signals so that early stopping decisions stay robust [19][21].
Group 3: Experimental Results
- Across benchmarks, SpecExit significantly reduces reasoning length, by 54% on GSM8K and 53% on ARC-Challenge, while maintaining accuracy [23][24].
- Compared with other early stopping methods, SpecExit not only shortens reasoning but also substantially improves inference speed, making it more practical for real-world applications [25][28].
Group 4: Conclusion
- SpecExit generalizes well across diverse tasks and models, and it shows that hidden states can serve as efficient signals of reasoning progress, which may guide future research in this area [28].
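The smoothing step summarized under Group 2 can be illustrated with a minimal Python sketch. It is not SpecExit's actual implementation: the class `SignalSmoother`, the function `first_exit_step`, the smoothing factor `alpha`, both thresholds, and the rule that both smoothed signals must clear their thresholds are all assumptions made for illustration. The article only states that an exponentially weighted moving average smooths the predicted signals before the early stopping decision. For brevity the sketch uses two of the signals (reasoning progress and confidence) and omits remaining reasoning length.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SignalSmoother:
    """Exponentially weighted moving average of one scalar signal.

    Hypothetical helper for illustration; not part of SpecExit or vLLM.
    """
    alpha: float = 0.3              # assumed smoothing factor, 0 < alpha <= 1
    value: Optional[float] = None   # current smoothed value

    def update(self, x: float) -> float:
        # The first observation initializes the average; later ones blend in.
        if self.value is None:
            self.value = x
        else:
            self.value = self.alpha * x + (1.0 - self.alpha) * self.value
        return self.value


def first_exit_step(
    progress_signals: Iterable[float],
    confidence_signals: Iterable[float],
    alpha: float = 0.3,
    progress_threshold: float = 0.9,    # assumed threshold
    confidence_threshold: float = 0.8,  # assumed threshold
) -> Optional[int]:
    """Return the first step at which both smoothed signals reach their
    thresholds, or None if that never happens."""
    progress = SignalSmoother(alpha)
    confidence = SignalSmoother(alpha)
    for step, (p, c) in enumerate(zip(progress_signals, confidence_signals)):
        # Update both smoothers before testing so neither falls behind.
        smoothed_p = progress.update(p)
        smoothed_c = confidence.update(c)
        if smoothed_p >= progress_threshold and smoothed_c >= confidence_threshold:
            return step
    return None
```

The point of smoothing is that a single spiky prediction does not trigger an exit. With `alpha=0.5` and a progress threshold of 0.8, the raw sequence `[0.0, 1.0, 0.0, 0.0]` never fires, whereas thresholding the raw values would have stopped at the second step.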