Reasoning Models Overthink
AI's Overthinking Problem: A New Challenge in Allocating Reasoning Resources
Sou Hu Cai Jing· 2026-01-22 15:42
I recently watched a state-of-the-art reasoning model spend 17 seconds thinking about a seemingly simple question: what is 1+1? When it finally answered "2," I was not frustrated; rather, I was fascinated by the fundamental inefficiency of reasoning models that this episode revealed. The model's ability to solve basic arithmetic was never in question. I was actually testing its ability to distinguish queries that require deep reasoning from queries that require instant recall. And this particular model performed exactly as trained: it thinks before every response.

Advanced reasoning models represent the frontier of AI, capable of multi-step logical reasoning, nuanced problem solving, and constraint satisfaction. These models can handle increasingly complex tasks by "reasoning" through them, for example decomposing a task into smaller steps and iteratively building a solution. When asked to plan a multi-city trip, a reasoning model can break the problem into subtasks (evaluating transportation options, checking budget constraints, optimizing the schedule) and then synthesize these components into a coherent plan. These models can also surface their step-by-step thinking, offering visibility into how they approach a problem, though the extent to which these explanations faithfully reflect internal processing remains an active area of research.

Powerful as these tools are, they are often deployed indiscriminately across tasks, including countless queries that may not need reasoning at all, and this inefficiency carries real consequences. Every unnecessary reasoning loop adds latency, drives up infrastructure costs, and consumes ...
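The capability the author is probing, deciding whether a query deserves deep reasoning or a fast direct answer, can be illustrated with a minimal routing sketch. The keyword heuristic, the word-count threshold, and the route names below are illustrative assumptions, not anything described in the article:

```python
# Minimal sketch of a query router that sends only "hard"-looking queries
# to a reasoning model. The cue list and threshold are illustrative
# assumptions; real routers typically use learned classifiers.

REASONING_CUES = ("plan", "optimize", "prove", "constraint", "step by step")

def needs_reasoning(query: str) -> bool:
    """Crude heuristic: reason only when the query looks multi-step."""
    q = query.lower()
    return any(cue in q for cue in REASONING_CUES) or len(q.split()) > 30

def route(query: str) -> str:
    return "reasoning-model" if needs_reasoning(query) else "fast-recall"

print(route("What is 1+1?"))                                   # fast path
print(route("Plan a multi-city trip under a budget constraint"))  # slow path
```

A router like this would have answered the 1+1 query instantly instead of spending 17 seconds deliberating; the hard part, as the article notes, is making that routing decision reliably.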
Reasoning That Can't Hit the Brakes? A New Framework Helps DeepSeek-R1 and Its Peers Stop Overthinking, Now Open Source
量子位· 2025-06-03 06:21
Core Viewpoint
- The article discusses the emergence of a new method called Self-Braking Tuning (SBT) aimed at improving the efficiency of reasoning models by preventing them from overthinking and generating unnecessary outputs [3][4][10].

Group 1: Overthinking in Models
- As reasoning models like DeepSeek-R1 and OpenAI o1 become more capable, they tend to overthink, which hurts efficiency and can lead to errors as small mistakes accumulate [2][3].
- A key challenge is to enable models to know when to stop reasoning, balancing thoroughness with efficiency [3][10].

Group 2: Self-Braking Tuning (SBT) Mechanism
- SBT is a lightweight, general-purpose tuning mechanism that integrates seamlessly into existing large models, focusing on reaching correct answers efficiently [4][5].
- The core design includes a braking-signal mechanism and multi-task fine-tuning, allowing models to learn when to terminate reasoning without external intervention [5][6][7].

Group 3: Identifying Overthinking
- The research team developed a reference-standard-answer evaluation system to identify redundant reasoning during the process [16].
- Reasoning is divided into two main phases: the Foundation Solution (the initial answer) and the Evolution Solution (further reflection and validation) [17][18].
- Two core metrics were proposed: a reasoning efficiency ratio and an overthinking marking ratio, which assess redundancy in reasoning from structural and linguistic perspectives, respectively [20][21][22].

Group 4: Data Construction Strategies
- The team created two complementary data construction strategies: SBT-E (Exact) and SBT-D (Dynamic) [24].
- SBT-E uses a uniform truncation strategy to structure reasoning paths, helping models distinguish necessary from unnecessary reasoning [25][27].
- SBT-D employs a stepwise adaptive strategy that dynamically adjusts reasoning length to match problem complexity, preventing overthinking [28][29].
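The split in Group 3 between a Foundation Solution and an Evolution Solution suggests one plausible formalization of the efficiency metric: the share of reasoning tokens spent before the first correct answer appears. The paper's exact definition may differ, so treat the following as an illustrative sketch only:

```python
# Illustrative sketch of a "reasoning efficiency ratio": the fraction of
# reasoning tokens spent up to and including the first step that reaches
# the reference answer (the Foundation Solution). Later steps are treated
# as Evolution Solution and counted as potentially redundant. The exact
# metric in the SBT work may be defined differently.

def efficiency_ratio(step_tokens, step_answers, reference):
    """step_tokens: token count per reasoning step;
    step_answers: answer produced at each step (None if none);
    reference: the reference standard answer."""
    total = sum(step_tokens)
    used = 0
    for n_tokens, ans in zip(step_tokens, step_answers):
        used += n_tokens
        if ans == reference:          # Foundation Solution reached here
            return used / total
    return 1.0                        # never reached the correct answer

# A trace that finds "2" on step 2, then re-verifies it twice more:
ratio = efficiency_ratio([40, 60, 80, 120], [None, "2", "2", "2"], "2")
print(ratio)  # 100 tokens of 300 were necessary
```

A low ratio flags traces where most tokens were spent re-deriving an answer the model already had, which is exactly the redundancy SBT's data construction strategies truncate.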
Group 5: Self-Regulating Brake Strategy
- A self-regulating brake strategy was introduced to enhance the model's self-control during reasoning [31].
- Redundant parts of the reasoning process are masked during training, allowing models to focus on key steps without being penalized for the masked content [32][35].
- Natural-language prompts are also used to guide models to recognize when they have enough information, reducing unnecessary output [36][37].

Group 6: Performance Evaluation
- Extensive experiments on mathematical reasoning benchmarks (AIME, AMC, MATH500, GSM8K) showed significant gains with the SBT framework, particularly in reasoning efficiency [38][39].
- For instance, the Llama-3.1-8B-Instruct model reduced token generation by 62.8% while maintaining an accuracy of 94.1% after applying the SBT-E strategy [40].
- The method demonstrated robustness and generalizability across model architectures and scales, confirming its value in eliminating redundant reasoning without compromising understanding [41][42].
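The loss masking described in Group 5 can be sketched in a few lines: tokens tagged as redundant are excluded from the training loss so the model is neither rewarded nor penalized for them. The tagging scheme below is an illustrative assumption; the -100 value follows the common "ignore index" convention used by PyTorch's cross-entropy loss and Hugging Face trainers:

```python
# Sketch of the Group 5 masking idea: labels for redundant reasoning
# tokens are replaced with an ignore index so they contribute nothing to
# the cross-entropy loss. How tokens get flagged as redundant (here, a
# boolean list from some upstream evaluator) is an assumption.

IGNORE = -100  # conventional ignore-index for cross-entropy loss

def mask_redundant(token_ids, redundant_flags):
    """Return training labels: keep necessary tokens, mask redundant ones."""
    return [IGNORE if red else tok
            for tok, red in zip(token_ids, redundant_flags)]

# Tokens 3 and 4 were flagged as redundant by the reference-answer check:
labels = mask_redundant([11, 12, 13, 14, 15],
                        [False, False, True, True, False])
print(labels)  # [11, 12, -100, -100, 15]
```

The appeal of this design is that the redundant text stays in the input context (so the model still sees realistic traces) while gradients only flow through the steps worth imitating.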