Structured Sparse Optimization

A Single Round of Instruction Fine-Tuning Turns a Large Model into an All-Round Expert Team; the 8B Model Outperforms the Fully Fine-Tuned Baseline | ACL25 Oral
量子位 · 2025-07-28 06:42
Core Insights
- The article discusses the limitations of current methods for upgrading large language models (LLMs) and introduces a new framework, Sparse Interpolation Mixture of Experts (SIMoE), that enables efficient and effective model adaptation with minimal fine-tuning cost [1][4].

Group 1: Limitations of Current Methods
- Existing upgrade methods for LLMs face two main limitations: reliance on manual experience for selecting upgrade locations, and the lack of a systematic mechanism to balance expert specialization and collaboration [4][7].
- The first limitation is a static upgrade strategy that ignores dynamic differences between model layers and task-specific requirements, leading to suboptimal performance [7][8].
- The second limitation is inefficient expert collaboration: traditional methods either force experts to collaborate or train them independently, resulting in knowledge redundancy and poor generalization [9][10].

Group 2: Introduction of SIMoE
- SIMoE offers a novel solution by automatically upgrading a standard LLM into a high-performance sparse expert model through a single-stage instruction fine-tuning process [4][6].
- The framework uses structured sparse optimization to identify neuron-level expert parameters, combining shared incremental parameters with orthogonality penalties to improve both performance and efficiency [4][14] (a toy sketch of this structure follows this summary).

Group 3: Performance Metrics
- An 8B SIMoE model outperforms the fully fine-tuned baseline by 1.6% in ROUGE-L, improves safety metrics by 10%, and reduces inference memory by 30% [6][24].
- Across benchmark tests, SIMoE shows significant accuracy gains on multiple tasks, including a 2.8% improvement in zero-shot settings and 75.02% accuracy in few-shot scenarios [24][27].

Group 4: Innovations in SIMoE
- The framework introduces a structured sparse upgrade mechanism that turns the selection of upgrade locations into a learnable sparse optimization problem, improving global optimization [15][16] (see the regularizer sketches below).
- SIMoE also enforces a "non-involution protocol" within the expert team to balance collaboration and specialization, ensuring efficient knowledge transfer while minimizing parameter redundancy [20][22].

Group 5: Experimental Validation
- SIMoE has been validated through extensive experiments on both vision and natural language models, demonstrating effectiveness in few-shot learning and cross-task generalization [22][25].
- The results show that SIMoE consistently outperforms baseline models across datasets and tasks, reinforcing its potential as a leading framework for LLM adaptation [24][27].
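The sparse-interpolation idea summarized above can be pictured as adding, on top of a frozen pretrained weight, one shared incremental matrix plus several expert-specific increments that are masked at neuron granularity and mixed by router weights. The NumPy sketch below illustrates only that structure; the toy dimensions, the fixed random masks, and the hand-set router weights are assumptions made for readability, not the paper's implementation.

```python
import numpy as np

# Toy dimensions for illustration only (not from the paper)
d_in, d_out, n_experts = 16, 8, 4
rng = np.random.default_rng(0)

W_base = rng.normal(size=(d_out, d_in))                          # frozen pretrained weight
delta_shared = 0.01 * rng.normal(size=(d_out, d_in))             # shared incremental parameters
delta_expert = 0.01 * rng.normal(size=(n_experts, d_out, d_in))  # expert-specific increments

# Neuron-level masks: one gate per output neuron per expert. In SIMoE these are
# learned via structured sparse optimization; here they are fixed at random.
masks = (rng.random((n_experts, d_out)) > 0.7).astype(float)

def sparse_interpolation_forward(x, router_weights):
    """Interpolate the base weight with the shared and masked expert increments."""
    W = W_base + delta_shared
    for e in range(n_experts):
        # masks[e][:, None] zeroes the output neurons this expert was not granted
        W = W + router_weights[e] * masks[e][:, None] * delta_expert[e]
    return W @ x

x = rng.normal(size=d_in)
router = np.array([0.6, 0.4, 0.0, 0.0])                # toy routing weights
print(sparse_interpolation_forward(x, router).shape)   # (8,)
```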
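Turning "where to upgrade" into a learnable sparse optimization problem is commonly done by attaching a relaxed 0/1 gate to each candidate neuron and penalizing the total gate mass, so that training itself decides which neurons receive expert increments. The snippet below sketches that general recipe with sigmoid-relaxed gates and an L1-style penalty; the paper's exact structured-sparsity formulation may differ, and the shapes and coefficient are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparsity_penalty(mask_logits, lam=1e-2):
    """Relaxed neuron-level gates with an L1-style sparsity pressure.

    mask_logits: array of shape (n_experts, d_out), one learnable logit per
    output neuron per expert. Gates near 0 mean "do not upgrade this neuron".
    """
    gates = sigmoid(mask_logits)       # relaxed 0/1 gate per neuron
    return lam * gates.sum()           # added to the task loss during fine-tuning

# Toy check: pushing all logits strongly negative drives the penalty toward 0
logits = np.full((4, 8), -6.0)
print(round(float(sparsity_penalty(logits)), 4))
```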
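The "non-involution protocol" (the shared increment carries common knowledge while experts are discouraged from duplicating one another) can be expressed as an orthogonality penalty on the expert-specific increments: flatten each expert's delta and penalize pairwise overlap. This is a minimal sketch of that kind of regularizer, not the paper's exact loss term.

```python
import numpy as np

def orthogonality_penalty(delta_expert):
    """Penalize overlap between expert-specific increments.

    delta_expert: array of shape (n_experts, d_out, d_in). Each expert's delta
    is flattened and normalized; the squared off-diagonal entries of the
    resulting Gram matrix measure how much experts duplicate one another.
    """
    flat = delta_expert.reshape(delta_expert.shape[0], -1)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    gram = flat @ flat.T                         # pairwise cosine similarities
    off_diag = gram - np.diag(np.diag(gram))     # drop self-similarity
    return float(np.sum(off_diag ** 2))

# Toy check: experts that touch disjoint parameters incur (near) zero penalty
deltas = np.zeros((2, 3, 3))
deltas[0, 0, 0] = 1.0   # expert 0 only touches one entry
deltas[1, 1, 1] = 1.0   # expert 1 touches a disjoint entry
print(orthogonality_penalty(deltas))   # ~0.0
```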