Chain-of-Thought (CoT) Techniques
AAAI 2026 Oral | Large Models "Love You in Their Hearts but Can't Say It"? Deep Hidden Cognition Makes Reasoning More Reliable
机器之心· 2026-01-09 02:53
In recent years, large language models have made striking progress on arithmetic, logic, and multimodal understanding tasks, and much of that progress rests on chain-of-thought (CoT) techniques. CoT has the model generate a series of intermediate reasoning steps, akin to showing its work, before producing the final answer. This approach markedly improves performance on complex reasoning tasks and has become the dominant method for reasoning enhancement.

In actual use and in research results, however, CoT is not consistently stable, and its failures are plainly observable on some tasks. A natural and important question follows: could a large model "realize it is making a mistake"? And when token probabilities are unreliable, is there some other signal that can guide more reliable generation?

Against this backdrop, a research team at Hefei University of Technology argues that large models internally carry a "hidden true-or-false cognition". The state can be pictured as "loving you in my heart but unable to say it": in its internal activations the model already holds an implicit judgment of whether its reasoning is correct, but that judgment gets expressed incorrectly by the token-probability-based generation process. Even when the model "says the wrong thing", its internal representations still retain the means to correct it. The core of the paper is to teach the model to use this hidden cognition to "score" each of its reasoning steps, filter out faulty reasoning chains, and thereby make CoT more reliable. The work has been accepted to AAAI 2026 as an Oral paper. ...
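The summary above does not give the paper's method in detail, but the scoring idea can be sketched. Below is a minimal illustration, assuming a small linear probe trained on hidden activations to predict step correctness; the names (`StepProbe`, `filter_chains`, `step_hidden_states`) are hypothetical stand-ins, not the authors' code.

```python
import torch

class StepProbe(torch.nn.Module):
    """Linear probe over hidden activations: estimates, per reasoning
    step, the probability that the step is correct."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = torch.nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (num_steps, hidden_dim) -- activation at each step's last token
        return torch.sigmoid(self.linear(h)).squeeze(-1)

def filter_chains(chains, probe, threshold=0.5):
    """Keep only reasoning chains whose every step the probe judges correct.

    `chains` is a list of (text, step_hidden_states) pairs, where
    step_hidden_states is a (num_steps, hidden_dim) tensor taken from
    the model's internal activations (an assumption of this sketch).
    """
    kept = []
    for text, h in chains:
        scores = probe(h)  # per-step correctness estimate
        if scores.min().item() >= threshold:
            kept.append((text, scores))
    return kept
```

At inference time, chains rejected by such a probe could be resampled, trading extra generation for reliability; how the paper actually trains and applies its scorer is not specified in this digest.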
Fitting Large-Model Thinking with a "Falcon Heavy" Engine: Tencent Hunyuan's SEAT Reshapes Deep Thinking
AI科技大本营· 2025-07-15 11:30
Core Viewpoint
- Tencent's Hunyuan team has introduced the SEAT adaptive parallel reasoning framework, transforming complex reasoning from a "single-engine airship" into a "multi-engine rocket" and enhancing large models' ability to handle intricate reasoning challenges [7][44].

Group 1: SEAT Framework Overview
- The SEAT framework integrates the sequential and parallel scaling paradigms, allowing both broad exploration and deep refinement of reasoning processes [15][43].
- It employs multi-round parallel reasoning, generating multiple independent reasoning paths simultaneously to significantly enhance the model's exploration capabilities [16][20].
- The framework is designed to be plug-and-play, integrating with existing large language models without additional training [29][44].

Group 2: Performance Enhancements
- Initial experiments show that even a minimal parallel setup (N=2) yields accuracy improvements of +14.1% for a 32B model and +24.5% for a 7B model [28].
- As the number of parallel paths increases (up to N=8), performance continues to improve, demonstrating the framework's exploration capabilities [23].

Group 3: Semantic Entropy as Navigation
- SEAT introduces semantic entropy as a self-supervised metric of the consistency of reasoning outputs, acting as a "navigation sensor" that decides when to stop computation (a minimal sketch of this stop rule appears after this summary) [27][32].
- Two navigation strategies are implemented: a predefined-threshold approach and an adaptive, threshold-free mechanism, both aimed at optimizing the reasoning process [35][36].

Group 4: Safety Mechanisms
- SEAT includes a safety mechanism against "semantic entropy collapse", which can drive smaller models toward overconfident, erroneous outputs [38][40].
- By monitoring semantic entropy, the framework issues stop commands before performance deteriorates, ensuring stable reasoning outcomes [40][44].
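The digest describes the stop rule but not SEAT's exact algorithm. The following is a minimal sketch of semantic-entropy-guided parallel sampling under stated assumptions: `sample_paths` stands in for the model's sampler, and semantic clustering is reduced to canonical string matching (a real system would use an entailment or answer-equivalence model).

```python
import math
from collections import Counter

def semantic_entropy(answers, canonicalize=str.strip):
    """Entropy over semantic clusters of sampled answers.

    Clustering here is simple canonical string matching, an assumption
    of this sketch; lower entropy means the answers agree more.
    """
    clusters = Counter(canonicalize(a) for a in answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

def seat_style_loop(sample_paths, n_parallel=4, max_rounds=8, threshold=0.5):
    """Multi-round parallel reasoning with an entropy-based stop rule.

    `sample_paths(k)` is assumed to return k independently sampled
    final answers from the model; all names here are hypothetical.
    """
    history = []
    for _ in range(max_rounds):
        history.extend(sample_paths(n_parallel))  # explore in parallel
        if semantic_entropy(history) <= threshold:
            break  # answers have converged semantically: stop early
    # return the most common semantic cluster as the final answer
    return Counter(a.strip() for a in history).most_common(1)[0][0]
```

With a real sampler plugged in, easy questions terminate after one round while ambiguous ones keep exploring; the adaptive, threshold-free variant mentioned in the digest would replace the fixed `threshold` with a learned or self-calibrated rule.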
With Just 27 Million Parameters, This Reasoning Model Beats DeepSeek and Claude
机器之心· 2025-06-30 10:23
Core Insights
- The article argues that the architecture of large language models (LLMs) needs to change, focusing on the limitations of current chain-of-thought (CoT) techniques: ceilings on task complexity, high data requirements, and latency [2][4].

Group 1: Hierarchical Reasoning Model (HRM)
- HRM is a novel recurrent architecture inspired by the human brain's layered, multi-timescale processing, achieving high computational depth while maintaining training stability and efficiency [3][6].
- HRM operates through two interdependent recurrent modules, a high-level module for slow, abstract planning and a low-level module for fast, detailed computation, and reaches remarkable performance on complex reasoning tasks with only 27 million parameters and 1,000 training samples (see the sketch after this summary) [4][5].
- HRM requires neither pre-training nor CoT data, yet performs nearly perfectly on challenging tasks such as complex Sudoku puzzles and optimal pathfinding in large mazes, outperforming larger models with longer context windows [5][6].

Group 2: Design and Mechanisms
- HRM's core design rests on hierarchical processing and timescale separation: high-level regions integrate information over longer timescales while low-level regions handle immediate sensory detail [12][13].
- HRM incorporates feedback loops analogous to the brain's dense recurrent connectivity, improving representation accuracy and contextual adaptability while avoiding the pitfalls of backpropagation through time (BPTT) [14][19].
- The model introduces approximate gradients and deep supervision, enabling memory-efficient training, in contrast with traditional methods that demand extensive memory and time [20][23].

Group 3: Performance and Adaptability
- HRM exhibits hierarchical convergence: the high-level module stabilizes while the low-level module converges repeatedly, yielding rapid convergence and minimal residuals compared with conventional deep networks [17][36].
- Adaptive computation time (ACT) lets the model adjust computational resources to task complexity, optimizing performance without significant extra cost [25][27].
- Inference-time computation can be extended simply by adjusting parameters, with no retraining or architectural changes, showing the model's flexibility on complex reasoning tasks [28][36].

Group 4: Experimental Results
- Experimental results show HRM excelling on complex reasoning tasks, raising questions about which reasoning algorithms it internally employs, a point that matters for interpretability [31][39].
- Visualizations of HRM's reasoning reveal its strategies on maze and Sudoku tasks: a mix of exploration and refinement resembling depth-first search [31][38].
- HRM's hierarchical structure emerges naturally as it learns complex reasoning tasks, rather than being hard-wired into the architecture [34].
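The digest names but does not detail HRM's two-timescale recurrence. Below is a minimal PyTorch-style sketch under those assumptions; module names, dimensions, and the update rule are illustrative, not the authors' code, and the `detach` stands in crudely for the approximate-gradient idea (avoiding full BPTT) the digest mentions.

```python
import torch
import torch.nn as nn

class TwoTimescaleSketch(nn.Module):
    """Illustrative two-module recurrence: a slow high-level planner
    and a fast low-level worker, per the digest's description of HRM."""

    def __init__(self, dim: int, low_steps: int = 4, high_steps: int = 3):
        super().__init__()
        self.low = nn.GRUCell(dim * 2, dim)   # fast, detailed computation
        self.high = nn.GRUCell(dim, dim)      # slow, abstract planning
        self.low_steps, self.high_steps = low_steps, high_steps
        self.readout = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim); both recurrent states start at zero
        z_h = torch.zeros_like(x)  # high-level (slow) state
        z_l = torch.zeros_like(x)  # low-level (fast) state
        for _ in range(self.high_steps):
            for _ in range(self.low_steps):
                # the low-level module iterates quickly, conditioned on
                # the input and the current high-level plan
                z_l = self.low(torch.cat([x, z_h], dim=-1), z_l)
            # the high-level module updates once per inner cycle,
            # integrating the low-level result on a slower timescale;
            # detaching approximates truncating the gradient path
            z_h = self.high(z_l.detach(), z_h)
        return self.readout(z_h)
```

The paper's actual one-step gradient, deep-supervision scheme, and ACT mechanism differ in detail; this sketch only shows how nested fast/slow recurrence yields depth without unrolling one long chain.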