Model-Adaptive Difficulty-Graded Distillation

Pushing the ceiling of large-model reasoning: "adaptive difficulty distillation" surpasses R1 distillation, sharply raising long-CoT corpus quality
机器之心 · 2025-05-04 04:57
Core Viewpoint
- The article discusses the development of a novel method for generating high-quality Chain of Thought (CoT) data, focusing on adaptive difficulty grading of questions for large language models (LLMs) to enhance the reasoning capabilities of smaller models [2][6][41].

Group 1: Research Motivation and Challenges
- The emergence of large models like DeepSeek-R1 (671 billion parameters) has highlighted the challenges of deploying such models in real-time systems and edge devices [6].
- There is a pressing need for research on smaller models with fewer than 7 billion parameters, particularly in complex reasoning tasks such as mathematical problem-solving and code generation [7].
- Current CoT data generation methods face challenges, including high computational and annotation costs associated with large-scale data-driven approaches and limited performance gains from high-quality sample-driven methods [8][9].

Group 2: Proposed Methodology
- The article introduces a new method called "LLM Adaptive Question Difficulty Grading," which aims to improve the quality of CoT data by dynamically matching model capabilities with data difficulty [12][13].
- The method includes four key innovations: establishing a question difficulty grading system based on the model's inherent reasoning capabilities, creating an adaptive question bank, designing a difficulty distribution sampling strategy, and generating high-quality CoT data using DeepSeek-R1 [15][18].

Group 3: Experimental Results
- The proposed method has shown significant improvements in reasoning performance across various model sizes, with accuracy increases ranging from 6.66% to 26.7% on the AIME24 mathematics competition dataset compared to traditional non-adaptive strategies [18][20].
- Detailed experimental results indicate that models trained with the adaptive CoT data outperform baseline models on multiple mathematical reasoning benchmarks, achieving up to 94.6% accuracy on MATH500 [37].
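The pipeline described in Group 2 — grade questions by the student model's own success rate, pool them into an adaptive question bank, then sample by a target difficulty distribution before distilling CoT answers from the teacher — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `solve_fn` callback, the pass-rate binning rule, and the `weights` distribution are all hypothetical stand-ins for the article's grading and sampling details.

```python
import random
from collections import defaultdict

def grade_questions(questions, solve_fn, attempts=4):
    """Build an adaptive question bank keyed by difficulty level.

    Difficulty is inferred from the student model's own behavior
    (hypothetical rule): attempt each question several times and
    count failures, so level 0 = always solved (easy) and
    level == attempts = never solved (hard).
    """
    bank = defaultdict(list)
    for q in questions:
        passes = sum(bool(solve_fn(q)) for _ in range(attempts))
        bank[attempts - passes].append(q)
    return bank

def sample_by_distribution(bank, weights, k, seed=0):
    """Draw k questions following a target difficulty distribution.

    `weights` maps difficulty level -> probability mass; levels with
    an empty bank are skipped and the remaining mass is renormalized.
    The sampled questions would then be sent to the teacher model
    (e.g., DeepSeek-R1) to generate CoT training data.
    """
    rng = random.Random(seed)
    levels = [lvl for lvl in weights if bank.get(lvl)]
    total = sum(weights[lvl] for lvl in levels)
    probs = [weights[lvl] / total for lvl in levels]
    picked = []
    for _ in range(k):
        lvl = rng.choices(levels, probs)[0]
        picked.append(rng.choice(bank[lvl]))
    return picked
```

For example, with a toy `solve_fn` that succeeds only on even-numbered questions, the bank splits into an "easy" bin (level 0) and a "hard" bin (level 4), and `sample_by_distribution(bank, {0: 0.3, 4: 0.7}, k=2000)` would yield a hard-skewed training set — mirroring the idea that the data difficulty is matched to what the student model currently cannot do.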
- The ZCode-32B model demonstrated superior performance across difficulty levels, indicating that smaller models can achieve competitive results through adaptive data training [38].

Group 4: Conclusion and Future Work
- The article concludes that the proposed framework for generating high-quality CoT data is efficient and effective, requiring only about 2,000 high-quality samples to significantly enhance model performance while reducing data and computational costs [41].
- Future work will focus on further integrating reinforcement learning to explore deeper reasoning capabilities and on extending the approach to more complex cross-domain tasks such as communication fault diagnosis [42].