Test-Time Scaling (TTS) Technology
Goodbye to Wasted Computation! New TTS Framework Rescues 19% of Buried Answers as Reasoning Accuracy Soars
机器之心 · 2025-09-02 06:32
Core Insights
- The article discusses the Stepwise Reasoning Checkpoint Analysis (SRCA) framework, which strengthens the reasoning of large language models (LLMs) through improved test-time scaling methods [2][3][25].

Group 1: SRCA Framework
- SRCA addresses two main issues in existing test-time scaling methods: path homogeneity and underutilization of intermediate results [2][6].
- SRCA integrates two core strategies: Answer-Clustered Search (ACS), which maintains path diversity, and Checkpoint Candidate Augmentation (CCA), which uses all intermediate answers in the final decision [2][10][19].

Group 2: Methodology
- Checkpoint Injection is the foundational technique in SRCA: it forces the model to pause after each reasoning step and output an intermediate answer [10][12].
- ACS prevents path homogeneity by grouping paths with similar checkpoint answers and ensuring that diverse reasoning paths are explored [14][17].
- CCA improves accuracy by salvaging intermediate answers that would otherwise be discarded during reasoning, which also improves resource utilization [19][20].

Group 3: Experimental Results
- With SRCA, a 1B-parameter model reached 65.2% accuracy on the MATH500 dataset, surpassing a 70B model's 65.0% [25].
- SRCA needs only 16 samples to match the accuracy that other TTS methods reach with 128 samples, an 8-fold increase in reasoning efficiency [25].
- CCA rescued 19.07% of correct answers from intermediate steps that had previously been discarded because the path later deviated [25].
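
The two strategies described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Path` structure, the per-path `score` (standing in for some verifier or reward-model score), exact-match clustering, round-robin selection across clusters, and picking the single highest-scoring candidate are all assumptions made for illustration, since the summary does not specify these details.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Path:
    """One partial reasoning path (hypothetical structure, not from the article)."""
    steps: list[str]          # reasoning steps generated so far
    checkpoint_answer: str    # intermediate answer emitted at the latest checkpoint
    score: float              # assumed verifier / reward-model score


def answer_clustered_search(paths: list[Path], keep: int) -> list[Path]:
    """Keep up to `keep` paths, drawing round-robin from answer clusters.

    Assumptions: clusters are formed by exact match on the checkpoint answer,
    and clusters are visited in descending order of their total score.
    """
    clusters: dict[str, list[Path]] = defaultdict(list)
    for p in paths:
        clusters[p.checkpoint_answer].append(p)

    # Best path first inside each cluster.
    for members in clusters.values():
        members.sort(key=lambda p: p.score, reverse=True)

    # Visit clusters from highest to lowest total score.
    ordered = sorted(
        clusters.values(),
        key=lambda members: sum(p.score for p in members),
        reverse=True,
    )

    selected: list[Path] = []
    rank = 0
    # Take the rank-th best path of every cluster before going one rank deeper,
    # so no single answer can fill the whole budget while others remain.
    while len(selected) < keep and any(rank < len(m) for m in ordered):
        for members in ordered:
            if rank < len(members) and len(selected) < keep:
                selected.append(members[rank])
        rank += 1
    return selected


def checkpoint_candidate_augmentation(candidates: list[tuple[str, float]]) -> str:
    """Choose the final answer from all checkpoint answers seen so far.

    Assumption: `candidates` holds (answer, score) pairs collected at every
    checkpoint, including those from paths that were later pruned, and the
    single highest-scoring candidate wins.
    """
    best_answer, _ = max(candidates, key=lambda pair: pair[1])
    return best_answer
```

With four paths whose checkpoint answers are `"42"`, `"42"`, `"7"`, and `"13"` (scores 0.9, 0.8, 0.5, 0.4) and `keep=3`, plain top-3 selection by score would keep both `"42"` paths, whereas this sketch keeps one path per answer. The CCA function shows why pruned paths still matter: a high-scoring intermediate answer stays in the candidate pool even if its path is dropped later.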