EMNLP 2025 | LightThinker, a new method for dynamically compressing CoT reasoning
机器之心·2025-08-28 04:33

Core Viewpoint
- The article discusses LightThinker, a method that improves the efficiency of large language models (LLMs) by compressing intermediate reasoning steps, reducing memory usage and computational cost while maintaining accuracy [6][27].

Group 1: LightThinker Overview
- LightThinker mimics human cognition by dynamically compressing lengthy reasoning steps into concise representations, significantly reducing the number of tokens kept in the context window [6][27].
- The approach follows a cycle of generating, compressing, and discarding information, which keeps the context small and addresses memory overload and slow computation [14][27].

Group 2: Methodology
- The first step is data reconstruction: training data is modified to include "compression instructions" that tell the model when to compress information [10].
- The second step is attention modification: a "Thought-based Attention Mask" controls what the model can attend to during reasoning, ensuring it focuses on essential information [12].
- The third step is dynamic reasoning: the model learns to rely on its compact summaries, rather than the lengthy original thoughts, to reason coherently [14][17].
- (Illustrative sketches of these three steps appear at the end of this summary.)

Group 3: Experimental Results
- LightThinker was evaluated on four datasets with two different models, reducing peak memory usage by 70% and reasoning time by 26% while maintaining accuracy [21][27].
- The results indicate that LightThinker strikes a balance between accuracy and efficiency compared with conventional models [24][27].

Group 4: Limitations
- The current method underperforms on mathematical tasks because its data reconstruction relies on rules rather than semantic understanding, which can cause information loss during compression [33].
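To make the data-reconstruction step concrete, here is a minimal Python sketch of rebuilding a CoT training example with compression instructions. The special token names (`<compress>`, `[c1]`, `[c2]`, `<o>`) and the rule-based splitting into thought steps are illustrative assumptions, not the paper's actual tokens.

```python
# A minimal sketch of the "data reconstruction" step, assuming hypothetical
# special tokens; the paper's actual token names and splitting rule may differ.

COMPRESS_TOKEN = "<compress>"      # signals "summarize what you just thought"
CACHE_TOKENS = ["[c1]", "[c2]"]    # placeholder slots that will hold the gist
OUTPUT_TOKEN = "<o>"               # resume normal generation after compressing

def insert_compression_instructions(question: str, thought_steps: list[str], answer: str) -> str:
    """Rebuild a CoT training example so the model is periodically told to compress.

    Thought steps are split by a simple rule (here, the caller pre-splits them),
    and after every step we append the compression instruction plus cache slots.
    """
    parts = [question]
    for step in thought_steps:
        parts.append(step)
        parts.append(COMPRESS_TOKEN + "".join(CACHE_TOKENS) + OUTPUT_TOKEN)
    parts.append(answer)
    return " ".join(parts)

if __name__ == "__main__":
    example = insert_compression_instructions(
        "Q: What is 17 * 24?",
        ["17 * 24 = 17 * 20 + 17 * 4.", "That is 340 + 68 = 408."],
        "A: 408",
    )
    print(example)
```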
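The "Thought-based Attention Mask" can be pictured as a segment-aware causal mask. The sketch below assumes a simple prompt / thought / cache segment layout and hypothetical visibility rules (later tokens see the prompt and the compressed caches, but not earlier raw thoughts); the paper's exact rules may differ.

```python
# A minimal sketch of a thought-based attention mask over a segmented sequence.
import torch

def build_thought_mask(segments: list[tuple[str, int]]) -> torch.Tensor:
    """Return a [T, T] boolean mask (True = may attend) for segments given as
    (kind, length) pairs, where kind is "prompt", "thought", or "cache".

    Rules sketched here (assumptions, not the paper's exact design):
      * every token is causal (it can only look backwards),
      * cache tokens see the prompt, earlier caches, and the thought they compress,
      * later thoughts see the prompt and earlier caches, but NOT earlier raw thoughts.
    """
    kinds, seg_ids = [], []
    for seg_id, (kind, length) in enumerate(segments):
        kinds.extend([kind] * length)
        seg_ids.extend([seg_id] * length)
    T = len(kinds)
    mask = torch.zeros(T, T, dtype=torch.bool)
    for i in range(T):
        for j in range(i + 1):  # causal: only positions j <= i
            if kinds[j] in ("prompt", "cache"):
                visible = True   # the prompt and compressed gists stay visible
            else:                # kinds[j] == "thought"
                same_segment = seg_ids[i] == seg_ids[j]
                # visible to the cache segment that immediately follows it,
                # i.e. the compression step that summarizes this thought
                compressing = kinds[i] == "cache" and seg_ids[i] == seg_ids[j] + 1
                visible = same_segment or compressing
            mask[i, j] = visible
    return mask

if __name__ == "__main__":
    layout = [("prompt", 4), ("thought", 5), ("cache", 2), ("thought", 5), ("cache", 2)]
    print(build_thought_mask(layout).int())
```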
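Finally, a toy sketch of the generate-compress-discard cycle at inference time. The real system evicts raw-thought entries from the transformer's KV cache; this sketch stands in a plain Python list for the cache, and the callback names (`generate_step`, `compress_thought`) and the answer-token check are illustrative assumptions.

```python
# A toy sketch of dynamic reasoning: generate a thought, compress it into a
# short gist, and discard the raw thought so the context stays small.

def dynamic_reasoning(question_tokens, generate_step, compress_thought, max_steps=50):
    """Alternate generation and compression, tracking the peak context size."""
    context = list(question_tokens)            # stands in for the KV cache
    peak = len(context)
    for _ in range(max_steps):
        thought = generate_step(context)       # produce the next reasoning chunk
        peak = max(peak, len(context) + len(thought))
        if thought and thought[-1] == "<answer>":
            return context + thought, peak
        gist = compress_thought(context, thought)  # a few summary tokens
        context = context + gist               # keep the gist, drop the raw thought
    return context, peak

if __name__ == "__main__":
    steps = iter([["think1a", "think1b", "think1c"], ["think2a", "think2b"], ["<answer>"]])
    out, peak = dynamic_reasoning(
        ["Q"],
        generate_step=lambda ctx: next(steps),
        compress_thought=lambda ctx, th: [f"[gist:{len(th)}]"],
    )
    print(out, "| peak context:", peak)
```

Because only the short gists accumulate in the context, the peak context size stays close to the question length plus one thought, which is the mechanism behind the reported memory savings.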