Tian Yuandong: Continuous chains of thought are more efficient, encoding multiple paths simultaneously for "superposition"-style parallel search
量子位· 2025-06-19 06:25
Core Viewpoint
- The article covers a new research result from a team led by AI researcher Tian Yuandong: a continuous chain-of-thought model whose reasoning behaves analogously to quantum superposition, making it markedly more efficient than traditional discrete chains of thought on complex tasks [2][4].

Group 1: Research Findings
- Traditional large language models (LLMs) reason over discrete tokens, which is inefficient for complex tasks: such models can require O(n^2) decoding steps and often get stuck in local optima [4].
- Recent studies show that reasoning with continuous hidden vectors can significantly improve performance, but a theoretical explanation had been lacking [5].
- The team demonstrated that a two-layer Transformer with D steps of continuous chain of thought (CoT) can solve directed-graph reachability problems, outperforming discrete-CoT models that require O(n^2) decoding steps [7].

Group 2: Methodology
- A continuous thought vector can encode multiple candidate graph paths simultaneously, behaving like a breadth-first search (BFS); this gives it a significant advantage over a discrete chain of thought, which must commit to one path at a time, much like depth-first search (DFS) [8]. (A toy sketch of this superposition-style search is given at the end of this summary.)
- A designed attention selector mechanism lets the model attend to specific positions conditioned on the current token, ensuring the relevant edge information is reliably extracted [11][12].
- The first Transformer layer organizes the edge information, and the second layer then explores all possible paths in parallel [21][22].

Group 3: Experimental Results
- Experiments used a subset of the ProsQA dataset requiring 3-4 reasoning steps to solve, with each node represented by a dedicated token [26].
- The COCONUT model, using only a two-layer Transformer, reached close to 100% accuracy on these ProsQA problems, while a 12-layer discrete-CoT model reached only 83% and a baseline model solved roughly 75% of the tasks [27][28].
- Analyses of attention patterns and of the continuous thought representations further confirmed the hypothesized superposition-style search behavior [30].
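To make the BFS-style "superposition" intuition above concrete, here is a minimal toy sketch in NumPy. It is not the paper's Transformer construction: the one-hot node embeddings, the adjacency-matrix update, the function name superposition_reachability, and the example graph are all assumptions introduced purely for illustration. The point it demonstrates is that a single continuous "thought" vector can carry weight on every node reached so far, so one update step expands all candidate paths at once, the way one BFS layer does.

```python
import numpy as np

def superposition_reachability(edges, n_nodes, source, target, n_steps):
    """Toy illustration: decide whether `target` is reachable from `source`
    within `n_steps` hops by carrying one vector that superposes the
    embeddings of every node on the current search frontier."""
    emb = np.eye(n_nodes)                       # one-hot node embeddings
    adj = np.zeros((n_nodes, n_nodes))
    for u, v in edges:                          # directed edge list
        adj[u, v] = 1.0

    thought = emb[source].copy()                # initial "continuous thought"
    for _ in range(n_steps):
        # Expand every candidate path by one edge in a single update,
        # then renormalize so the vector stays a bounded superposition.
        thought = thought + thought @ adj
        thought = thought / np.linalg.norm(thought)

    # The target node carries nonzero weight iff some path reached it.
    return thought[target] > 1e-9


# Tiny directed graph: 0 -> 1 -> 3 and 0 -> 2.
edges = [(0, 1), (0, 2), (1, 3)]
print(superposition_reachability(edges, 4, source=0, target=3, n_steps=2))  # True
print(superposition_reachability(edges, 4, source=0, target=2, n_steps=2))  # True
print(superposition_reachability(edges, 4, source=2, target=3, n_steps=2))  # False
```

A discrete chain of thought would instead have to emit one concrete node per step, committing to a single path and backtracking on failure; the single superposed vector above is what lets all frontier nodes advance in the same step.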