NeurIPS 2025 | USTC, CUHK-Shenzhen, and Tongyi Qianwen jointly release CoRT: teaching large models efficient reasoning with only 30 samples, cutting token consumption by 50%
机器之心·2025-11-12 13:23

Core Insights
- Large reasoning models (LRMs) such as OpenAI-o1, Qwen3, and DeepSeek-R1 excel at complex reasoning tasks but struggle with precise mathematical calculation [2]
- CoRT (Code-Optimized Reasoning Training) is a new framework that teaches large language models to use code tools effectively during reasoning, improving both accuracy and efficiency [3][8]

Group 1: Challenges in Current Models
- Current models face a cognitive conflict between their probabilistic reasoning and the deterministic knowledge returned by external tools, which leads to inefficiency [4]
- Models often carry out lengthy natural-language reasoning before verifying results with code, delaying calculation and showing unnecessary distrust of code outputs [4]
- High-quality training data for the new "model-tool" collaborative reasoning paradigm is scarce, which poses a significant challenge [4]

Group 2: CoRT Framework Overview
- CoRT reshapes how models interact with tools, shifting them from inefficient verification toward efficient computation [8][16]
- The framework follows a three-step approach: data cold start, agent fine-tuning, and a progressive training pipeline [8]

Group 3: Hint-Engineering Strategy
- Hint-Engineering is a novel data synthesis strategy that generates high-quality interaction data by correcting inefficient model behavior at critical decision points [9]
- By strategically injecting guiding hints, the model can be steered to offload calculation to code rather than reasoning it out in natural language, improving efficiency (see the first sketch after this summary) [10][11]

Group 4: Multi-Stage Training Process
- CoRT uses a comprehensive training pipeline consisting of Supervised Fine-Tuning (SFT), Rejection Sampling Fine-Tuning (RFT), and Reinforcement Learning (RL) [13]
- Initial fine-tuning on a small set of high-quality samples teaches the model efficient interaction patterns, while RFT filters out poor trajectories to reinforce good behavior (see the second sketch after this summary) [13]
- The RL stage lets the model learn optimal tool-usage strategies autonomously through interaction with the code interpreter [13]

Group 5: Performance and Efficiency Gains
- CoRT was evaluated on five challenging mathematical reasoning benchmarks and shows significant performance improvements [14]
- It delivers a 4% absolute accuracy gain on the DeepSeek-R1-32B model and up to an 8% gain on the 1.5B model, outperforming many more data-intensive baselines [20]
- Token consumption drops by roughly 30% for the 32B model and by about 50% for the 1.5B model relative to baseline models [20]

Group 6: Implications and Future Directions
- CoRT offers a new path for addressing large language models' weaknesses in precise reasoning tasks, pointing toward more powerful and reliable AI systems [16][17]
- Future research will extend the framework to a wider variety of tools and more complex task scenarios [17]
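
For concreteness, here is a minimal sketch of the hint-injection idea from Group 3. The `generate` callback, the trigger phrases, and the hint wording below are illustrative assumptions, not CoRT's actual implementation.

```python
# Minimal sketch of hint injection during trajectory synthesis.
# Assumptions: `generate(prompt)` wraps any LLM completion API; the trigger
# phrases and hint text are illustrative, not the paper's exact strings.

CODE_HINT = (
    "\nWait, rather than continuing to derive this by hand, "
    "I should write a short Python snippet to compute it exactly.\n"
)

# Phrases that often precede long manual arithmetic; purely illustrative.
MANUAL_CALC_TRIGGERS = ("let me compute", "calculating step by step", "multiplying out")


def synthesize_with_hints(problem: str, generate, max_steps: int = 20) -> str:
    """Roll out a reasoning trajectory, injecting a code-use hint whenever
    the model starts drifting into lengthy natural-language calculation."""
    trajectory = problem
    for _ in range(max_steps):
        step = generate(trajectory)          # next reasoning chunk from the model
        lowered = step.lower()
        if any(t in lowered for t in MANUAL_CALC_TRIGGERS):
            # Critical decision point: redirect the model toward the code tool.
            trajectory += step + CODE_HINT
        else:
            trajectory += step
        if "final answer" in lowered:
            break
    return trajectory
```

Trajectories collected this way can then serve as the small cold-start set that the article says is enough to teach efficient model-tool interaction.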
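The second sketch illustrates the rejection-sampling filter described in Group 4. The `Trajectory` fields, the exact-match correctness check, and the token budget are assumptions made for illustration, not the paper's actual criteria.

```python
# Minimal sketch of a rejection-sampling filter for RFT-style data selection.
from dataclasses import dataclass


@dataclass
class Trajectory:
    text: str            # full model-tool interaction trace
    final_answer: str    # answer extracted from the trace
    num_tokens: int      # total tokens consumed


def rft_filter(samples: list[Trajectory], reference: str,
               token_budget: int = 4096) -> list[Trajectory]:
    """Keep only trajectories that reach the reference answer within a token
    budget, so fine-tuning reinforces correct and efficient tool-use behavior."""
    return [
        s for s in samples
        if s.final_answer.strip() == reference.strip() and s.num_tokens <= token_budget
    ]
```

Filtering on both correctness and token cost reflects the article's point that CoRT optimizes for efficient computation, not just verified answers.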