Model Merging
FDA Dual Anchors: A New Perspective on Model Knowledge Transfer, from Parameter Space to Input Space
机器之心· 2025-11-14 01:33
Core Insights
- The article introduces FDA (Model Merging with Functional Dual Anchors), a framework for merging expert models obtained by task-specific fine-tuning of a shared foundation model, integrating their capabilities into a single model without access to the original training data [2][4]

Group 1: FDA Framework Overview
- FDA represents the task knowledge embedded in an expert's parameters with a set of synthetic input points (the dual anchors), so that knowledge can be integrated efficiently through the gradients these points induce [4][10]
- Unlike traditional methods that rely on arithmetic operations in parameter space, FDA shifts knowledge integration into the input space, offering a new perspective on model merging [4][9]
- The framework scales to large neural networks and demonstrates strong performance and scalability on both vision and natural language models [4][12]

Group 2: Performance and Robustness
- Experimental results indicate that FDA substantially outperforms the task-vector baseline in multi-task scenarios, reaching an average score of 87.26 versus 73.94, an improvement of nearly 18% [14]
- FDA exhibits flexible knowledge modeling, raising average performance by approximately 5.10% on ViT-B/16 and about 13% on RoBERTa-Large, showing adaptability across architectures [15]

Group 3: Algorithm Implementation
- The FDA algorithm has two main phases: constructing FDA samples for each downstream task, then updating the parameters of the merged model based on the constructed FDA (a minimal sketch of this pipeline appears after this summary) [17][19]
- Two practical initialization strategies are proposed for FDA construction, linear weight sampling and scaled Gaussian sampling, which provide effective starting points for the anchor optimization [18]

Group 4: Knowledge Encoding and Mechanisms
- FDA captures the dominant task-related representation directions while suppressing redundant or noisy components, consistent with the low-rank structure typically observed in task-specific knowledge in parameter space [22]
- During optimization, the high-energy subspace of the FDA anchors aligns with the high-energy subspace of real task data, indicating a connection between the knowledge encoded in FDA and the actual task data [23]
- The parameter updates induced by FDA gradually align with those induced by real data, supporting its robustness and effectiveness in capturing task-related knowledge (a diagnostic sketch for these checks follows the code below) [24]
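The summary above describes the two phases only at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline, not the paper's reference implementation: it assumes the anchors are optimized so that the gradient they induce on the pretrained model aligns with the task vector (expert minus pretrained weights), uses a simple MSE distillation loss as the functional matching objective, and interprets "scaled Gaussian sampling" as rescaled i.i.d. Gaussian noise. Names such as `build_fda_anchors` and `merge_with_fda` are invented for illustration.

```python
import torch
import torch.nn.functional as F


def scaled_gaussian_init(num_anchors, input_dim, scale=1.0):
    # "Scaled Gaussian sampling" initialisation (our interpretation): rescaled
    # i.i.d. Gaussian anchors; the paper's exact scaling rule may differ.
    return scale * torch.randn(num_anchors, input_dim)


def build_fda_anchors(base_model, expert_model, init, steps=200, lr=1e-2):
    # Phase 1 (sketch): optimise synthetic inputs ("functional dual anchors") so
    # that the parameter gradient they induce on the base model points along the
    # task vector tau = theta_expert - theta_base.
    anchors = init.clone().requires_grad_(True)
    base_params = list(base_model.parameters())
    task_vector = torch.cat(
        [(pe - pb).reshape(-1) for pe, pb in zip(expert_model.parameters(), base_params)]
    ).detach()
    opt = torch.optim.Adam([anchors], lr=lr)
    for _ in range(steps):
        # Distillation-style functional loss: the base model should match the
        # expert on the current anchors (expert outputs are fixed soft targets).
        loss = F.mse_loss(base_model(anchors), expert_model(anchors).detach())
        grads = torch.autograd.grad(loss, base_params, create_graph=True)
        induced = torch.cat([g.reshape(-1) for g in grads])
        # Maximise cosine alignment between the induced descent direction and tau.
        align = F.cosine_similarity(-induced, task_vector, dim=0)
        anchors.grad = torch.autograd.grad(-align, anchors)[0]
        opt.step()
    return anchors.detach()


def merge_with_fda(base_model, experts, anchor_sets, steps=100, lr=1e-4):
    # Phase 2 (sketch): integrate all tasks by updating the base model with the
    # gradients induced by each task's anchors, matching each expert's outputs.
    opt = torch.optim.Adam(base_model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            F.mse_loss(base_model(x), expert(x).detach())
            for expert, x in zip(experts, anchor_sets)
        )
        loss.backward()
        opt.step()
    return base_model
```

In this reading, the merged model never touches task data: `merge_with_fda` consumes only the synthetic anchors and the experts' outputs on them, which matches the data-free setting described above.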
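The claims in Group 4 about subspace and gradient alignment are empirical diagnostics. The snippet below shows one way such checks could be run, again as an assumption-laden sketch rather than the paper's measurement protocol: `subspace_overlap` compares the top-k ("high-energy") right singular subspaces of FDA-anchor features and real-data features, and `induced_update_cosine` compares the parameter gradient induced by the anchors with the gradient induced by labelled real data. Both function names and the choice of losses are hypothetical.

```python
import torch
import torch.nn.functional as F


def subspace_overlap(feats_fda, feats_real, k=10):
    # Overlap between the top-k right singular subspaces of FDA-anchor features
    # and real-data features: mean squared cosine of the principal angles,
    # 1.0 for identical subspaces and roughly k/dim for unrelated ones.
    U = torch.linalg.svd(feats_fda, full_matrices=False).Vh[:k]
    V = torch.linalg.svd(feats_real, full_matrices=False).Vh[:k]
    cosines = torch.linalg.svdvals(U @ V.T)
    return (cosines ** 2).mean().item()


def induced_update_cosine(model, anchors, expert, real_x, real_y):
    # Cosine similarity between the parameter gradient induced by the FDA
    # anchors (matching the expert's soft outputs) and the gradient induced by
    # real labelled task data, both measured at the same base model.
    params = list(model.parameters())
    g_fda = torch.autograd.grad(
        F.mse_loss(model(anchors), expert(anchors).detach()), params
    )
    g_real = torch.autograd.grad(F.cross_entropy(model(real_x), real_y), params)
    g_fda = torch.cat([g.reshape(-1) for g in g_fda])
    g_real = torch.cat([g.reshape(-1) for g in g_real])
    return F.cosine_similarity(g_fda, g_real, dim=0).item()
```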