Parameter-Efficient Fine-Tuning (PEFT)
ChatGPT architect just published his latest research findings
量子位· 2025-09-30 12:22
Wen Le, reporting from Aofeisi | QbitAI (公众号 QbitAI)
Just three days after its second research post, Thinking Machines has published its third research blog. The core author is OpenAI co-founder John Schulman, and Thinking Machines founder and former OpenAI CTO Mira Murati once again reposted it in support. This third post concerns parameter-efficient fine-tuning with LoRA. Titled "LoRA Without Regret", it examines the conditions under which LoRA matches the efficiency of full fine-tuning (FullFT) and offers a simplified recipe that greatly reduces the difficulty of hyperparameter tuning. Today's mainstream large models routinely carry trillions of parameters and are pretrained on tens of trillions of tokens, yet downstream tasks usually involve small, domain-specific datasets, so updating every parameter with FullFT wastes substantial resources. LoRA, the core parameter-efficient fine-tuning (PEFT) method, captures the fine-tuning signal through low-rank matrices A and B (whose combined parameter count is far smaller than that of the original weights), but it has long faced one question: can it really catch up to FullFT's performance? John Schulman and the Thinking Machines team answer yes: with the key details handled correctly, LoRA can match FullFT's sample efficiency and reach the same final performance. Details below. LoRA's optimal learning rate ...
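To make the low-rank update concrete, below is a minimal sketch of how a LoRA adapter wraps a frozen linear layer; it is not the Thinking Machines implementation, and the rank, scaling factor, and initialization choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = W0 x + (alpha / r) * B(A(x)).

    W0 stays frozen; only the low-rank factors A and B are trained,
    so the number of trainable parameters is far below FullFT.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pretrained weight
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.02)    # small random A
        nn.init.zeros_(self.B.weight)               # zero B: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.B(self.A(x))

# Usage: wrap a single projection of a pretrained model.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
out = layer(torch.randn(2, 4096))
```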
Thinking Machines publishes another high-quality blog post: championing LoRA as no worse than full fine-tuning
机器之心· 2025-09-30 10:38
Core Insights
- The article emphasizes the advantages of LoRA (Low-Rank Adaptation) over full fine-tuning (FullFT) in terms of cost-effectiveness and performance across various training scenarios [2][7][18].

Group 1: Importance of LoRA
- LoRA is a popular parameter-efficient fine-tuning method that updates a low-dimensional adapter instead of the entire model weights, leading to lower memory requirements and faster loading [11][13].
- The research indicates that LoRA can achieve performance comparable to FullFT on small to medium-sized datasets, while it may struggle on large datasets due to capacity limitations [14][22].

Group 2: Key Findings
- LoRA's performance is closely tied to the training conditions, including the size of the training dataset and the rank of the LoRA adapter [16][25].
- In reinforcement learning tasks, even with a very low rank (rank=1), LoRA can perform similarly to FullFT, indicating that reinforcement learning places lower demands on capacity [29].

Group 3: Experimental Methodology
- The research used models such as LLaMA 3 and Qwen3, adjusting LoRA ranks from 1 to 512 and sweeping learning rates to find optimal training conditions [20][21].
- High-rank LoRA performed almost identically to FullFT on certain datasets, but performance varied across tasks due to differences in training dynamics [22][24].

Group 4: Practical Implications
- LoRA's optimal learning rate is typically about 10 times that of FullFT, allowing it to accept higher learning rates under otherwise identical conditions [35].
- Applying LoRA across all layers, especially the MLP and MoE layers, is crucial for achieving performance close to FullFT [37].
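As a rough illustration of the two practical takeaways above (LoRA on every layer including the MLP projections, and a learning rate roughly 10x FullFT's), here is a hedged sketch using the Hugging Face PEFT library; the model name, rank, base learning rate, and module names are assumptions, not the blog's exact setup.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Model name is illustrative; any decoder-only checkpoint works the same way.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

# Target attention *and* MLP projections, per the finding that covering all
# layers (especially MLP/MoE) is what brings LoRA close to FullFT.
config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumes a LLaMA/Qwen-style block
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# The blog reports LoRA's optimal LR is roughly 10x FullFT's;
# e.g. if FullFT would use 2e-5 (an assumed baseline):
lora_lr = 10 * 2e-5
```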
How much parameter redundancy is there in LoRA? New research: cut 95% and still keep high performance
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article introduces the LoRI technique, which demonstrates that drastically reducing LoRA's trainable parameters can still maintain strong model performance, achieving results comparable or superior to full fine-tuning and other methods while using only 5% of LoRA's parameters [1][9].

Summary by Sections

LoRA and Its Limitations
- LoRA is widely adopted for parameter-efficient fine-tuning (PEFT) but still incurs significant memory overhead, especially for large models [3][4].
- Recent research indicates substantial redundancy in the incremental parameters, prompting the development of LoRI, which reduces the number of trainable parameters while preserving model knowledge [4].

LoRI Methodology
- LoRI keeps the low-rank matrix A fixed as a random projection and trains matrix B under a task-specific sparse mask, allowing a significant reduction in trainable parameters [4][13].
- Even with 90% sparsity in B, LoRI maintains good performance, indicating that adaptation does not require updating A [4][17].

Multi-Task Learning and Adapter Merging
- Multi-task learning is essential for building versatile models, but training on mixed datasets is costly; LoRI allows existing models to be merged without retraining, effectively combining LoRA adapters for multi-task capability [7].
- Directly merging heterogeneous LoRA adapters can cause parameter interference; LoRI mitigates this by mapping task-specific adapters into nearly orthogonal subspaces [7][20].

Continual Learning and Safety
- LoRI provides a lightweight continual-learning method that maintains safety while adapting to new tasks, addressing catastrophic forgetting [8][22].
- A two-phase training process for safety adapters shows that LoRI-S outperforms other methods in retaining safety alignment, even under aggressive sparsity [22][23].

Performance Evaluation
- Extensive experiments on various benchmarks show that LoRI matches or exceeds full fine-tuning and other PEFT methods while using 95% fewer trainable parameters [9][19].
- On single tasks, LoRI variants deliver competitive results across natural language understanding, mathematics, programming, and safety tasks [19][20].

Conclusion
- Overall, LoRI offers an effective, lightweight approach to building safe adapters that support downstream task adaptation while maintaining alignment [23].
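Below is a minimal sketch of the LoRI idea as summarized above: A is a frozen random projection and only a sparse-masked subset of B's entries is trained. This is an illustrative reconstruction, not the authors' released code; the class name, the random choice of mask, and the initialization are all assumptions (the paper selects the mask per task).

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """LoRI-style adapter sketch: frozen random projection A,
    trainable B restricted to a fixed sparse mask."""
    def __init__(self, base: nn.Linear, r: int = 32, sparsity: float = 0.9):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pretrained weight
        # A is a fixed random projection, stored as a buffer so it is never updated.
        self.register_buffer("A", torch.randn(r, base.in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        # Placeholder mask chosen at random here; the paper derives it per task.
        mask = (torch.rand(base.out_features, r) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (self.B * self.mask) @ self.A            # sparse-masked low-rank update
        return self.base(x) + x @ delta.t()

layer = LoRILinear(nn.Linear(1024, 1024), r=32, sparsity=0.9)
# Only B is trainable, and ~90% of its gradient entries are zeroed by the mask.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```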