LoRA

ICML 2025 | CoTo: Letting LoRA Training "Get Better as It Goes" — Proficient at Both Model Fusion and Pruning
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article introduces CoTo, a progressive training strategy designed to enhance the robustness and effectiveness of Low-Rank Adaptation (LoRA) models, addressing issues such as training instability and performance drop after pruning [1][4][23].

Summary by Sections

Conventional LoRA Training Issues
- LoRA faces challenges including "lazy training," where optimization gets stuck near suboptimal solutions, limiting generalization [7]
- There is a hierarchical imbalance in training, with gradient updates concentrated on top layers, leading to undertraining of lower layers [7]
- These issues complicate downstream operations like model fusion and pruning, often resulting in unsatisfactory outcomes [7]

CoTo Strategy
- CoTo employs a simple yet effective progressive activation strategy, initially deactivating a portion of LoRA adapters to encourage uniform gradient flow across all layers [5][8]
- The activation probability of adapters is gradually increased during training, returning to standard fine-tuning mode in later stages (see the sketch after this summary) [8]

Experimental Results
- CoTo significantly improves the fusion and pruning capabilities of LoRA models, enhancing single-task generalization performance and training efficiency [12][23]
- In linear interpolation tasks, CoTo models maintain smooth performance transitions, unlike standard LoRA, which experiences sharp declines [13]
- CoTo outperforms standard LoRA in both structured and unstructured pruning scenarios, demonstrating enhanced fault tolerance [17]

Performance and Efficiency Improvements
- CoTo consistently boosts performance across various benchmarks, including visual and language tasks, and achieves over 24% training acceleration when applied to HiRA [23][24]

Ablation Studies
- Rigorous ablation studies validate the design choices of CoTo and provide insights into effective regularization of LoRA [21]

Conclusion
- CoTo effectively resolves hierarchical imbalance and lazy optimization issues in LoRA training, enhancing model robustness and simplifying downstream operations like fusion and pruning [23]
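
The progressive activation idea summarized above can be illustrated with a short PyTorch sketch. The class and function names, the linear ramp schedule, and the 0.75 ramp fraction are assumptions made for illustration, not the paper's exact formulation: each LoRA adapter is stochastically skipped with probability 1 − p(t), and p(t) rises from 0 to 1 over training so that the final stage matches standard LoRA fine-tuning.

```python
import torch
import torch.nn as nn


class CoToLoRALinear(nn.Module):
    """LoRA linear layer with a CoTo-style stochastic activation gate (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base                      # frozen pretrained projection
        self.scaling = alpha / rank
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.activation_prob = 1.0            # overwritten each step by the schedule

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        # During training the adapter branch is skipped with probability
        # 1 - activation_prob; at evaluation time it is always active.
        if (not self.training) or torch.rand(()).item() < self.activation_prob:
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out


def coto_activation_prob(step: int, total_steps: int, ramp_frac: float = 0.75) -> float:
    """Linearly ramp the activation probability from 0 to 1 over the first
    ramp_frac of training, then keep it at 1 (standard LoRA fine-tuning)."""
    ramp_steps = max(1, int(total_steps * ramp_frac))
    return min(1.0, step / ramp_steps)
```

In a training loop, `layer.activation_prob = coto_activation_prob(step, total_steps)` would be set before each forward pass; sampling the gate independently per layer is what lets gradients reach lower layers more evenly early in training.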
Fully Unlocking Modality Collaboration: MokA Tailors a New Fine-Tuning Paradigm for MLLMs
机器之心· 2025-06-29 02:21
Core Viewpoint
- The article discusses the limitations of current multimodal large language model (MLLM) fine-tuning methods, which often replicate strategies from unimodal language models without considering the unique characteristics of multimodal learning [2][9][23].

Summary by Sections

Introduction to MLLMs
- Recent MLLMs have made significant advances on visual-language and audio-language tasks [2].
- Current fine-tuning methods primarily adapt strategies from unimodal language models, such as LoRA, which may not be suitable for multimodal contexts [2][8].

Limitations of Current Fine-Tuning Methods
- Many efficient multimodal fine-tuning methods overlook the essential differences between modalities, leading to inadequate utilization of multimodal information [9][11].
- The article emphasizes that effective multimodal fine-tuning requires both unimodal adaptation and cross-modal adaptation [9][12].

Introduction of MokA Method
- The research team proposes a new method called MokA (Multimodal low-rank Adaptation), which balances independent modeling of unimodal information with interaction modeling between modalities [3][12][23].
- MokA retains the efficiency of LoRA while redefining the roles of the projection matrices in a multimodal context [14][23].

Key Components of MokA
- MokA includes three critical modules (see the sketch after this summary):
  1. **Modality-specific A matrix**: Ensures independent modeling of unimodal information [15].
  2. **Cross-modal attention mechanism**: Enhances interaction between different modalities during instruction tuning [16].
  3. **Shared B matrix**: Facilitates implicit cross-modal alignment by projecting all modalities into a shared space [17].

Experimental Results
- MokA was evaluated across three representative multimodal task scenarios: audio-visual-text, visual-text, and speech-text [19].
- The method demonstrated significant performance improvements on various benchmark datasets, showcasing its adaptability and effectiveness [19][23].

Conclusion
- MokA addresses the oversight of modality differences in current fine-tuning paradigms, providing a new direction for multimodal large model fine-tuning [23].
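
As a rough illustration of how MokA divides the roles of the LoRA matrices, here is a minimal PyTorch sketch. It covers only the modality-specific A matrices and the shared B matrix; the cross-modal attention module is omitted, and the class name, parameter names, and default modality set are hypothetical rather than taken from the paper.

```python
import torch
import torch.nn as nn


class MokAStyleAdapter(nn.Module):
    """Sketch of a MokA-style adapter: one A matrix per modality, one shared B matrix.
    The cross-modal attention step described in the article is not modeled here."""

    def __init__(self, in_features: int, out_features: int, rank: int = 8,
                 modalities=("audio", "visual", "text")):
        super().__init__()
        # Modality-specific down-projections: independent unimodal modeling.
        self.A = nn.ModuleDict({m: nn.Linear(in_features, rank, bias=False)
                                for m in modalities})
        # Shared up-projection: every modality is mapped through the same matrix,
        # which is where the implicit cross-modal alignment takes place.
        self.B = nn.Linear(rank, out_features, bias=False)
        nn.init.zeros_(self.B.weight)  # adapter starts as a no-op, as in LoRA

    def forward(self, inputs):
        # inputs: dict mapping modality name -> features of shape (batch, seq, in_features)
        return {m: self.B(self.A[m](x)) for m, x in inputs.items()}
```

The returned per-modality deltas would be added to the corresponding token features of the base model; keeping B shared while A stays modality-specific mirrors the unimodal-plus-cross-modal split the article argues for.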
How Much Parameter Redundancy Is There in LoRA? New Research: Cut 95% and Still Keep High Performance
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article introduces LoRI, which demonstrates that the trainable parameters of LoRA can be reduced dramatically while maintaining strong model performance, achieving results comparable or superior to full fine-tuning and other methods while using only 5% of LoRA's parameters [1][9].

Summary by Sections

LoRA and Its Limitations
- LoRA is widely adopted for parameter-efficient fine-tuning (PEFT) but still incurs significant memory overhead, especially in large models [3][4].
- Recent research indicates substantial redundancy in the incremental parameters, prompting the development of LoRI, which reduces the number of trainable parameters while preserving model knowledge [4].

LoRI Methodology
- LoRI keeps the low-rank matrix A fixed as a random projection and trains matrix B under a task-specific sparse mask, allowing a significant parameter reduction (see the sketch after this summary) [4][13].
- Even with 90% sparsity in B, LoRI maintains good performance, indicating that the adaptation process does not require updating A [4][17].

Multi-Task Learning and Adapter Merging
- Multi-task learning is essential for creating versatile models, but training on mixed datasets is costly. LoRI allows merging of existing models without retraining, effectively combining LoRA adapters for multi-task capabilities [7].
- Directly merging heterogeneous LoRA adapters can lead to parameter interference, but LoRI mitigates this by mapping task-specific adapters to nearly orthogonal subspaces [7][20].

Continual Learning and Safety
- LoRI provides a lightweight continual learning method that maintains safety while adapting to new tasks, addressing the challenge of catastrophic forgetting [8][22].
- The two-phase training process for safety adapters shows that LoRI-S outperforms other methods in retaining safety alignment, even under aggressive sparsity [22][23].

Performance Evaluation
- Extensive experiments on various benchmarks show that LoRI matches or exceeds the performance of full fine-tuning and other PEFT methods while using 95% fewer trainable parameters [9][19].
- In single-task settings, LoRI variants deliver competitive results across natural language understanding, mathematics, programming, and safety tasks [19][20].

Conclusion
- Overall, LoRI presents an effective and lightweight approach to building safe adapters that support downstream task adaptation while maintaining alignment [23].
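
The core mechanism, a frozen random A plus a sparsely masked trainable B, can be sketched in a few lines of PyTorch. The class name, the random stand-in mask, and the 90% sparsity default are assumptions for illustration; LoRI itself calibrates a task-specific mask rather than drawing it at random.

```python
import torch
import torch.nn as nn


class LoRIStyleLinear(nn.Module):
    """Sketch of a LoRI-style adapter: A is a frozen random projection and
    B is trained under a fixed sparse mask."""

    def __init__(self, base: nn.Linear, rank: int = 8, sparsity: float = 0.9):
        super().__init__()
        self.base = base
        # Frozen random down-projection: registered as a buffer, never updated.
        self.register_buffer("lora_A", torch.randn(rank, base.in_features) * 0.02)
        # Trainable up-projection, pruned by a fixed binary mask.
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        # LoRI derives a task-specific mask (e.g. from weight magnitudes after a
        # short calibration phase); a random mask stands in here for illustration.
        self.register_buffer("mask",
                             (torch.rand(base.out_features, rank) > sparsity).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        masked_B = self.lora_B * self.mask   # only unmasked entries of B receive gradients
        return self.base(x) + x @ self.lora_A.T @ masked_B.T
```

Because A never changes and only a sparse subset of B is trained per task, adapters for different tasks occupy nearly orthogonal subspaces, which is consistent with the article's explanation of why LoRI adapters merge with little interference.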