AAAI 2026 Oral | Do large models "know it in their hearts but struggle to say it"? Deep hidden cognition makes reasoning more reliable
机器之心·2026-01-09 02:53

Core Insights

- The article discusses advances in large language models (LLMs) on reasoning tasks, with particular emphasis on the Chain-of-Thought (CoT) technique, which improves performance by generating intermediate reasoning steps before producing a final answer [2][6]
- A research team from Hefei University of Technology argues that LLMs possess a "hidden cognition": the model internally assesses the correctness of its own reasoning, even when this assessment is not reflected in the token probabilities produced during generation [2][10]
- The paper introduces a framework that lets the model score its own reasoning steps based on this hidden cognition, thereby improving the reliability of CoT [2][10]

Summary by Sections

Introduction

- The article highlights the growing use of LLMs in a wide range of reasoning tasks and the importance of keeping reasoning quality stable and reliable throughout generation [6][8]
- It identifies factors that degrade the reliability of reasoning chains, such as subtle biases in understanding, expression noise, and error accumulation over long chains [6][8]

Research Motivation

- The research asks whether internal signals within the model reflect the reliability of the current reasoning step and could guide the model toward more reliable continuations [7][15]
- The study focuses on two key questions: whether discernible signals exist in the model's internal activations, and whether a practical mechanism can be built to exploit those signals [8][15]

Methodology and Innovations

- The proposed method probes "truth sensitivity" across multiple attention heads, training a simple probe on internal representations to determine which layers and heads are most sensitive to reasoning correctness [10][11] (a minimal probing sketch follows this summary)
- A confidence predictor is then built on the most sensitive attention heads and outputs a reliability score for each reasoning step, based on deep internal representations rather than token probabilities [12][21] (also sketched below)
- A confidence-guided search strategy combines the model's generation probabilities with the confidence scores to select the most reliable reasoning paths [13][16] (see the path-selection sketch at the end of this article)

Experimental Results

- The study evaluates the confidence predictor and the guided search strategy across a range of benchmarks covering both single-modal and multi-modal reasoning tasks [22][24]
- The proposed method consistently outperforms baseline models, with notable gains in reasoning accuracy across the different datasets [23][24]
- Ablation studies confirm the critical role of the confidence predictor: replacing it with random selection of reasoning steps leads to a clear drop in performance [25][27]
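To make the probing step concrete, below is a minimal sketch, assuming per-step activations have already been extracted from the model as an array of shape [num_steps, num_layers, num_heads, head_dim] together with binary correctness labels. The probe choice (logistic regression) and ranking heads by cross-validated accuracy are illustrative assumptions, not the authors' exact design.

```python
# Minimal probing sketch (not the paper's code): rank attention heads by how well a
# simple linear probe on their activations predicts whether a reasoning step is correct.
# Assumes per-step activations of shape [num_steps, num_layers, num_heads, head_dim]
# and binary correctness labels have already been extracted from the model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def rank_heads_by_probe(activations: np.ndarray, labels: np.ndarray, cv: int = 5):
    """Return (layer, head, accuracy) triples sorted by probe accuracy."""
    num_steps, num_layers, num_heads, _ = activations.shape
    scores = []
    for layer in range(num_layers):
        for head in range(num_heads):
            feats = activations[:, layer, head, :]  # [num_steps, head_dim]
            probe = LogisticRegression(max_iter=1000)
            acc = cross_val_score(probe, feats, labels, cv=cv, scoring="accuracy").mean()
            scores.append((layer, head, acc))
    return sorted(scores, key=lambda t: t[2], reverse=True)

if __name__ == "__main__":
    # Synthetic placeholder data standing in for real model activations.
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(200, 4, 8, 64)).astype(np.float32)
    y = rng.integers(0, 2, size=200)
    print("most 'truth-sensitive' heads (layer, head, cv accuracy):",
          rank_heads_by_probe(acts, y)[:5])
```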
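Building on the ranked heads, a confidence predictor could be assembled as below. This is a hedged sketch: averaging per-head probe probabilities into a single step-level score is one plausible aggregation, not necessarily the one used in the paper, and `fit_confidence_predictor` / `step_confidence` are hypothetical helper names.

```python
# Illustrative confidence predictor (an assumption, not the paper's exact design):
# average the "correct" probabilities of linear probes fitted on the top-K
# truth-sensitive heads to score a single reasoning step.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_confidence_predictor(activations, labels, top_heads):
    """Fit one probe per selected (layer, head) pair; return the fitted probes."""
    probes = {}
    for layer, head in top_heads:
        feats = activations[:, layer, head, :]  # [num_steps, head_dim]
        probes[(layer, head)] = LogisticRegression(max_iter=1000).fit(feats, labels)
    return probes

def step_confidence(step_activations, probes) -> float:
    """Score one step's activations [num_layers, num_heads, head_dim] in [0, 1]."""
    probs = [
        probe.predict_proba(step_activations[layer, head][None, :])[0, 1]
        for (layer, head), probe in probes.items()
    ]
    return float(np.mean(probs))
```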
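Finally, the confidence-guided search can be illustrated as a simple re-ranking over candidate next reasoning steps. The linear combination of average token log-probability and confidence score, and the weight `alpha`, are assumptions for illustration; the paper's actual search strategy may differ.

```python
# Hedged sketch of confidence-guided path selection (not the authors' implementation):
# re-rank candidate reasoning steps by a weighted combination of the generator's own
# log-probability and the probe-based confidence score.
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    text: str          # candidate next reasoning step
    logprob: float     # average token log-probability from the generator
    confidence: float  # reliability score in [0, 1] from the confidence predictor

def select_step(candidates: List[Candidate], alpha: float = 0.5) -> Candidate:
    """Pick the candidate with the best combined generation/confidence score."""
    return max(candidates, key=lambda c: (1 - alpha) * c.logprob + alpha * c.confidence)

if __name__ == "__main__":
    pool = [
        Candidate("Step A: ...", logprob=-0.35, confidence=0.42),
        Candidate("Step B: ...", logprob=-0.60, confidence=0.91),
    ]
    print("chosen:", select_step(pool, alpha=0.7).text)
```

Weighting the confidence score more heavily (larger `alpha`) favors steps the internal probes judge reliable even when the generator's token probabilities prefer another continuation, which mirrors the article's point that hidden cognition is not always reflected in those probabilities.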