NeurIPS 2025 Spotlight | RobustMerge: A New Paradigm for Merging Parameter-Efficiently Fine-Tuned Multimodal Large Models
机器之心· 2025-11-10 04:40
Core Insights
- The article examines the challenge of merging multiple specialized models into a single general model as AI systems scale, identifying "direction robustness" as the key factor behind the failure of parameter-efficient fine-tuning (PEFT) module merging [2][7][10].
- A new method, RobustMerge, is proposed: a simple and efficient approach to model merging at no additional cost, of direct value to developers and researchers working on multimodal large models [2][8].

Problem Definition
- The rise of multimodal large models has driven up computational demands, making full fine-tuning (FFT) costly and impractical for many users. Parameter-efficient fine-tuning (PEFT), especially LoRA, has therefore become mainstream: it adapts a model to downstream tasks quickly by updating only a small fraction of its parameters [7][8].
- Traditional routes to a multi-task model, such as multi-task learning, face high training costs and limited data availability, motivating model merging as a more efficient alternative [8][10].

Key Contributions
- RobustMerge addresses the shortcomings of existing PEFT merging methods by identifying direction instability, rather than parameter sign conflicts, as the core issue, opening a new paradigm for LoRA merging [10][41].
- The method uses a two-stage merging strategy: per-task pruning with complementary scaling, followed by cross-task normalization, to stabilize low-rank directions during merging [16][19][23].

Experimental Design and Results
- RobustMerge was evaluated on multiple benchmarks, including a newly constructed one, MM-MergeBench, which measures performance on both seen and unseen tasks; the method shows clear gains in multi-task performance and generalization [28][31].
- RobustMerge outperforms traditional methods, improving average accuracy by 3.4% on seen tasks and 4.5% on unseen tasks, demonstrating reduced task interference and stronger multi-task performance [31][32].

Practical Applications
- RobustMerge suits scenarios such as rapid deployment of multi-task models, federated learning, and model editing or style transfer, making it a practical tool for enterprises building complex AI applications efficiently [44][45].
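For context on why PEFT is attractive: LoRA freezes the pretrained weight matrix and learns a low-rank update in its place, so only a small fraction of parameters is trained. A minimal NumPy sketch (the dimensions and rank below are illustrative, not taken from the paper):

```python
import numpy as np

# Hypothetical dimensions for one projection matrix in a large model.
d, k, r = 4096, 4096, 16  # r is the LoRA rank (assumed value)

# Full fine-tuning updates every entry of the d x k weight matrix W.
full_params = d * k

# LoRA freezes W and learns a low-rank update delta_W = B @ A,
# with B of shape (d, r) and A of shape (r, k).
lora_params = d * r + r * k

B = np.zeros((d, r))              # B starts at zero in standard LoRA
A = np.random.randn(r, k) * 0.01  # A gets a small random init
delta_W = B @ A                   # rank-r update applied on top of frozen W

print(full_params, lora_params, lora_params / full_params)
```

With these numbers the LoRA update trains under 1% of the parameters that full fine-tuning would touch, which is why merging many cheap task-specific LoRA modules is so appealing.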
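The two-stage strategy summarized above (pruning with complementary scaling, then cross-task normalization) can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the magnitude-based pruning threshold, the norm-preserving rescaling, and the inverse-norm task weights are all assumptions made for the example.

```python
import numpy as np

def prune_and_scale(A, B, keep_ratio=0.5):
    """Illustrative pruning + complementary scaling on one LoRA pair (A, B).

    Small-magnitude entries of each low-rank factor are pruned, and the
    survivors are rescaled so the factor keeps its original Frobenius norm
    (a stand-in for 'complementary scaling'; the exact rule is an assumption).
    """
    out = []
    for M in (A, B):
        thresh = np.quantile(np.abs(M), 1.0 - keep_ratio)
        kept = M * (np.abs(M) >= thresh)
        scale = np.linalg.norm(M) / (np.linalg.norm(kept) + 1e-12)
        out.append(kept * scale)
    return out

def robust_merge(lora_pairs):
    """Merge per-task LoRA updates with cross-task normalization (sketch)."""
    deltas = [B @ A for A, B in (prune_and_scale(A, B) for A, B in lora_pairs)]
    # Cross-task normalization: weight each task update by the inverse of
    # its norm so no single task dominates the merged direction.
    weights = np.array([1.0 / np.linalg.norm(d) for d in deltas])
    weights /= weights.sum()
    return sum(w * d for w, d in zip(weights, deltas))

rng = np.random.default_rng(0)
pairs = [(rng.standard_normal((4, 32)), rng.standard_normal((16, 4)))
         for _ in range(3)]  # three task-specific rank-4 LoRA pairs
merged = robust_merge(pairs)
print(merged.shape)
```

The merged update has the same shape as any single task's `B @ A` and can be added onto the frozen base weight; the design intent the article describes is that pruning removes unstable directions while the normalization balances the surviving task directions against each other.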