Chain of Continuous Thought (Coconut)
Tian Yuandong and Stuart Russell's teams join forces to prove that Transformers naturally learn superposition reasoning during training
机器之心· 2025-10-07 03:57
Core Insights
- The article discusses a new reasoning paradigm called "Chain of Continuous Thought" (Coconut), which lets large language models (LLMs) keep their reasoning trajectory in a continuous latent space rather than in the discrete token space, yielding significant performance improvements [1][2].

Group 1: Continuous Thought Mechanism
- Coconut lets a model reason in a superposition state, keeping multiple potential reasoning paths alive in parallel, which is more efficient than decoding a single discrete chain of thought; a minimal decoding sketch and a toy superposition example follow this summary [3][4].
- A key result is that a two-layer Transformer using O(n) continuous thought steps can solve directed graph reachability, where n is the number of nodes in the graph [5].

Group 2: Training Dynamics
- Recent work by the teams of Tian Yuandong and Stuart Russell proves theoretically that gradient descent training naturally converges to this structure, i.e., superposition emerges during training rather than having to be hand-built [6][8].
- The training dynamics show that superposition emerges spontaneously even when each training sample contains only a single demonstrated path, while the index-matching logits stay bounded, which is crucial for preserving the model's local search capability [9][10].

Group 3: Experimental Results
- The experimental setup used a GPT-2 style decoder with two Transformer layers, trained for 350 epochs, and reached 96.2% accuracy on the test set [13][15].
- During the reasoning generation phase, the model's attention concentrated on frontier edges, producing a stable logit gap that matches the theoretical predictions [19][20].

Group 4: Prediction Phase
- In the prediction phase, the model relies on two signals, residual carryover and candidate lift, which together raise the logits of the correct candidates [24][27].
- Both signals rise rapidly and stabilize within roughly five epochs, ensuring that the correct candidate receives the maximal logit [29][30].

Group 5: Summary of Findings
- The study systematically analyzes how superposition states emerge spontaneously during continuous thought chain training, and highlights that bounded logits allow the model to balance exploration and exploitation during reasoning [32][33][34].
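
To make the "reasoning in a continuous latent space" idea concrete, below is a minimal PyTorch sketch of Coconut-style decoding: instead of sampling a token and re-embedding it, the final-layer hidden state of the last position is fed back directly as the next input vector. The module name, layer sizes, the six latent steps, and the omission of positional encodings are illustrative assumptions chosen for brevity, not the authors' actual implementation.

```python
# Sketch: continuous-thought ("Coconut"-style) decoding with a tiny decoder.
# Assumption: all sizes and the step count are toy values for illustration.
import torch
import torch.nn as nn

class TinyDecoder(nn.Module):
    def __init__(self, vocab_size=64, d_model=32, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, x):                       # x: (batch, seq, d_model)
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.blocks(x, mask=mask)           # causal self-attention
        return h, self.lm_head(h)

model = TinyDecoder()
model.eval()
prompt = torch.randint(0, 64, (1, 5))           # toy 5-token prompt
x = model.embed(prompt)

# Continuous-thought phase: latent steps, no discrete tokens are emitted.
with torch.no_grad():
    for _ in range(6):                          # toy number of latent steps
        h, _ = model(x)
        thought = h[:, -1:, :]                  # last hidden state becomes the
        x = torch.cat([x, thought], dim=1)      # next input "embedding"

    # The final answer is read off the logits of the last position.
    _, logits = model(x)
    answer = logits[:, -1].argmax(-1)
print(answer)
```

Because the fed-back hidden state is a dense vector rather than a single token, it can encode a weighted mixture of several candidate continuations at once, which is the superposition the summary refers to.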
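
The graph-reachability claim in Group 1 can also be illustrated with a toy NumPy example in which the "thought" vector is an explicit superposition (normalized sum) of node embeddings, and each latent step expands the whole frontier in parallel, like breadth-first search. One-hot node embeddings are used here purely so the superposition is exact; the paper analyzes learned, approximately orthogonal embeddings inside a two-layer Transformer, not this explicit Python loop.

```python
# Toy illustration (not the paper's construction): a superposed "thought"
# vector tracks every node reached so far and expands them all in parallel.
import numpy as np

n = 6
edges = [(0, 1), (0, 2), (1, 3), (2, 4), (4, 5)]  # toy directed graph
E = np.eye(n)                  # one-hot node embeddings: exactly orthogonal

reached = {0}                  # start node
thought = E[0].copy()          # initial thought = root embedding

for _ in range(n):             # O(n) latent steps suffice for reachability
    # A node is "present" in the thought if its embedding overlaps with it.
    active = {i for i in range(n) if thought @ E[i] > 1e-6}
    frontier = {v for (u, v) in edges if u in active}
    reached |= frontier
    # New thought: superposition (normalized sum) over all reached nodes.
    thought = E[list(reached)].sum(axis=0)
    thought /= np.linalg.norm(thought)

print(sorted(reached))         # -> [0, 1, 2, 3, 4, 5]
```

Since every currently reachable node is expanded at each step, n latent steps are always enough, which is the O(n) continuous-thought decoding bound quoted above.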