Compute Costs Plunge: The Markovian Thinker Arrives, Cutting LLM Reasoning Cost to Linear
36Kr · 2025-10-10 07:27

Core Insights
- The article discusses how effective, and how costly, reinforcement learning is as a way to improve reasoning in large language models (LLMs) [1]
- It introduces a new paradigm, the Markovian Thinker, which bounds the computational cost of LLM reasoning by keeping the state size fixed [4][20]

Group 1: Markovian Thinker Concept
- The core idea of the Markovian Thinker is to restructure the reinforcement learning setup so that the effective state size stays bounded regardless of the total thinking length [4]
- As a result, longer reasoning requires only linear compute and constant memory, which decouples how long the model thinks from how much context it must process [4][20]

Group 2: Delethink Implementation
- Delethink is a reinforcement learning environment that organizes reasoning into fixed-size chunks and resets the context at each chunk boundary [4][9]
- With Delethink, both the generation and backpropagation phases scale linearly, in contrast to the quadratic scaling of traditional LongCoT environments [6][15]

Group 3: Experimental Results
- Experiments show that even with an 8K chunk size, the DeepSeek R1-Distill 1.5B model trained with Delethink can reason for up to 24K tokens and outperforms LongCoT-RL on math benchmarks [9][12]
- The model reached 49% accuracy on a 96K-token reasoning task with only a few additional training steps, a significant efficiency gain [14][15]

Group 4: Implications for Future Models
- The success of the Markovian Thinker indicates that decoupling thinking length from context size could let next-generation reasoning models think for millions of tokens [20]
- The findings suggest that sequence architectures with non-quadratic complexity may particularly benefit reasoning models, because the thinking process can be made effectively Markovian [20]
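
The linear-versus-quadratic claim can be illustrated with a rough cost model. The sketch below is not taken from the article or from the Delethink code; it simply counts attended token pairs as a stand-in for attention compute. It assumes that a LongCoT trace lets every new token attend to all earlier thinking tokens, while a chunked trace resets the context at every chunk boundary. The function names are hypothetical, "K" is assumed to mean 1024 tokens, and the prompt tokens and any state carried across chunk boundaries are ignored for simplicity.

```python
def longcot_attention_cost(n_tokens: int) -> int:
    """Attended token pairs when each new token sees all previous tokens.

    The t-th token attends to t positions, so the total is 1 + 2 + ... + n,
    which grows quadratically with the thinking length.
    """
    return n_tokens * (n_tokens + 1) // 2


def chunked_attention_cost(n_tokens: int, chunk_size: int) -> int:
    """Attended token pairs when the context is reset every chunk_size tokens.

    Simplifying assumption: nothing is carried across a chunk boundary,
    so each chunk is costed as an independent short trace.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks, remainder = divmod(n_tokens, chunk_size)
    per_chunk = chunk_size * (chunk_size + 1) // 2
    return full_chunks * per_chunk + remainder * (remainder + 1) // 2


def peak_context(n_tokens: int, chunk_size: int) -> int:
    """Largest context held at once under chunking (a proxy for memory)."""
    return min(n_tokens, chunk_size)


if __name__ == "__main__":
    chunk = 8 * 1024  # the 8K chunk size mentioned above, assuming K = 1024
    for n in (24 * 1024, 96 * 1024):
        full = longcot_attention_cost(n)
        chunked = chunked_attention_cost(n, chunk)
        print(
            f"{n} thinking tokens: longcot={full:,} chunked={chunked:,} "
            f"ratio={full / chunked:.1f}x peak_context={peak_context(n, chunk)}"
        )
```

Under this model, doubling the thinking length roughly quadruples the LongCoT cost but only doubles the chunked cost, and the peak context never exceeds the chunk size. Real training cost also depends on factors the model leaves out, such as the prompt, the carried-over state, and non-attention compute.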