The Next Paradigm After o1? An Implicit CoT Breakthrough Stops Reasoning From "Rambling"

Core Viewpoint - The article introduces SIM-CoT (Supervised Implicit Chain-of-Thought), a new advance in implicit reasoning that tackles its core failure: when the number of implicit tokens is scaled up, latent states collapse and lose their reasoning semantics [2][9].

Group 1: SIM-CoT Overview
- SIM-CoT adds a plug-and-play, step-level supervision module that aligns each latent token with its corresponding reasoning step during training, which stabilizes optimization and prevents collapse [2][10].
- The method also makes implicit reasoning interpretable: latent tokens can be decoded into human-readable intermediate reasoning steps [2][10].

Group 2: Performance Improvements
- SIM-CoT adds zero overhead at inference, yet delivers significant gains: +2.1% over supervised CoT and +8.2% over Coconut on GPT-2, with stable gains of +1.5% to +9.0% on larger LLaMA models [3][18].
- On GSM8k-Aug, SIM-CoT raised accuracy from 36.6% to 44.8% (+8.2) while using fewer tokens, achieving 2.3× token efficiency [18].
- On the out-of-domain datasets GSM-Hard, MultiArith, and SVAMP, average accuracy rose from 42.6% to 46.9% (+4.3), indicating robust reasoning in latent space [19].

Group 3: Stability and Efficiency
- SIM-CoT stays stable as the number of implicit tokens grows, avoiding the latent instability and semantic homogenization that typically afflict implicit CoT methods [9][14].
- The auxiliary decoder used during training is removed at inference, so SIM-CoT's inference efficiency stays comparable to other implicit methods while retaining a speed advantage over explicit CoT [21].

Group 4: Experimental Validation
- The authors systematically evaluated SIM-CoT and found it more accurate, more stable, and more token-efficient than existing methods [17].
- The framework was validated on several models, including GPT-2 and LLaMA 1B/3B/8B, and consistently improved performance [22].
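The training/inference split described above (step-level supervision on each latent token during training, auxiliary decoder dropped at inference) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation. It assumes a Coconut-style loop in which each hidden state is fed back as the next latent input, and it assumes an interface for the auxiliary decoder that maps (latent, step prefix) to next-token logits. The names `rollout_latents`, `step_supervision_loss`, `training_loss`, `infer`, and the loss weight `lam` are hypothetical and chosen for illustration only.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical interfaces (assumptions, not the SIM-CoT code):
StepFn = Callable[[Vector], Vector]                       # hidden state -> next latent state
Decoder = Callable[[Vector, Sequence[int]], List[float]]  # (latent, step prefix) -> next-token logits
AnswerHead = Callable[[Vector], List[float]]              # final latent -> answer logits


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def rollout_latents(step_fn: StepFn, h0: Vector, num_latents: int) -> List[Vector]:
    """Coconut-style continuous thought (assumed): each state is fed back as the next input."""
    latents, h = [], h0
    for _ in range(num_latents):
        h = step_fn(h)
        latents.append(h)
    return latents


def step_nll(decoder: Decoder, latent: Vector, step_tokens: Sequence[int]) -> float:
    """Teacher-forced mean negative log-likelihood of one reasoning step given one latent."""
    nll = 0.0
    for t, target in enumerate(step_tokens):
        nll -= log_softmax(decoder(latent, step_tokens[:t]))[target]
    return nll / len(step_tokens)


def step_supervision_loss(decoder: Decoder, latents: Sequence[Vector],
                          steps: Sequence[Sequence[int]]) -> float:
    """Align latent k with reasoning step k and average the per-step losses."""
    if len(latents) != len(steps):
        raise ValueError("expected exactly one latent token per reasoning step")
    return sum(step_nll(decoder, z, s) for z, s in zip(latents, steps)) / len(steps)


def training_loss(step_fn: StepFn, decoder: Decoder, answer_head: AnswerHead,
                  h0: Vector, steps: Sequence[Sequence[int]], answer_id: int,
                  lam: float = 1.0) -> float:
    """Answer loss plus a weighted step-level loss; the decoder is only called here."""
    latents = rollout_latents(step_fn, h0, len(steps))
    answer_nll = -log_softmax(answer_head(latents[-1]))[answer_id]
    return answer_nll + lam * step_supervision_loss(decoder, latents, steps)


def infer(step_fn: StepFn, answer_head: AnswerHead, h0: Vector, num_latents: int) -> int:
    """Inference never touches the decoder, so it costs the same as plain latent reasoning."""
    final = rollout_latents(step_fn, h0, num_latents)[-1]
    logits = answer_head(final)
    return max(range(len(logits)), key=logits.__getitem__)
```

With a toy decoder that returns all-zero logits over a 4-token vocabulary, every token costs log 4 nats, so `step_supervision_loss` returns `math.log(4)` regardless of the latents. Keeping the decoder out of `infer` entirely is the design choice that mirrors the zero-overhead claim: the supervision shapes the latents during training but leaves no extra computation behind.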