Parameter-Efficient Fine-Tuning (PEFT)
ChatGPT architect just published his latest research findings
量子位· 2025-09-30 12:22
Wen Le, reporting from Aofeisi | QbitAI (公众号 QbitAI)
Just three days after its second research post, Thinking Machines has published its third research blog. The core author is OpenAI co-founder John Schulman, and Thinking Machines founder and former OpenAI CTO Mira Murati once again reposted it in support. This third post concerns parameter-efficient fine-tuning with LoRA. Titled "LoRA Without Regret", it examines the conditions under which LoRA matches the efficiency of full fine-tuning (FullFT) and offers a simplified recipe that greatly reduces the difficulty of hyperparameter tuning. Today's mainstream large models routinely carry trillions of parameters and are pretrained on tens of trillions of tokens, yet downstream tasks usually involve small, domain-specific datasets, so updating every parameter with FullFT wastes substantial resources. LoRA, the core parameter-efficient fine-tuning (PEFT) method, captures the fine-tuning signal through low-rank matrices A and B (whose combined parameter count is far smaller than that of the original weights), but it has long faced one question: can it really catch up to FullFT's performance? John Schulman and the Thinking Machines team answer yes: with the key details handled correctly, LoRA can match FullFT's sample efficiency and reach the same final performance. Details below. LoRA's optimal learning rate ...
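To make the low-rank update concrete, below is a minimal sketch of how a LoRA adapter wraps a frozen linear layer; it is not the Thinking Machines implementation, and the rank, scaling factor, and initialization choices are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: y = W0 x + (alpha / r) * B(A(x)).

    W0 stays frozen; only the low-rank factors A and B are trained,
    so the number of trainable parameters is far below FullFT.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                 # freeze the pretrained weight
        self.A = nn.Linear(base.in_features, r, bias=False)
        self.B = nn.Linear(r, base.out_features, bias=False)
        nn.init.normal_(self.A.weight, std=0.02)    # small random A
        nn.init.zeros_(self.B.weight)               # zero B: adapter starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * self.B(self.A(x))

# Usage: wrap a single projection of a pretrained model.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
out = layer(torch.randn(2, 4096))
```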
Thinking Machines publishes another high-quality blog post: championing LoRA as no worse than full fine-tuning
机器之心· 2025-09-30 10:38
Core Insights
- The article emphasizes the advantages of LoRA (Low-Rank Adaptation) over full fine-tuning (FullFT) in terms of cost-effectiveness and performance across various training scenarios [2][7][18].

Group 1: Importance of LoRA
- LoRA is a popular parameter-efficient fine-tuning method that updates a low-dimensional adapter instead of the entire model weights, leading to lower memory requirements and faster loading [11][13].
- The research indicates that LoRA can achieve performance comparable to FullFT on small to medium-sized datasets, while it may struggle on large datasets due to capacity limitations [14][22].

Group 2: Key Findings
- LoRA's performance is closely tied to the training conditions, including the size of the training dataset and the rank of the LoRA adapter [16][25].
- In reinforcement learning tasks, even with a very low rank (rank=1), LoRA can perform similarly to FullFT, indicating that reinforcement learning places lower demands on capacity [29].

Group 3: Experimental Methodology
- The research used models such as LLaMA 3 and Qwen3, adjusting LoRA ranks from 1 to 512 and sweeping learning rates to find optimal training conditions [20][21].
- High-rank LoRA performed almost identically to FullFT on certain datasets, but performance varied across tasks due to differences in training dynamics [22][24].

Group 4: Practical Implications
- LoRA's optimal learning rate is typically about 10 times that of FullFT, allowing it to accept higher learning rates under otherwise identical conditions [35].
- Applying LoRA across all layers, especially the MLP and MoE layers, is crucial for achieving performance close to FullFT [37].
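As a rough illustration of the two practical takeaways above (LoRA on every layer including the MLP projections, and a learning rate roughly 10x FullFT's), here is a hedged sketch using the Hugging Face PEFT library; the model name, rank, base learning rate, and module names are assumptions, not the blog's exact setup.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Model name is illustrative; any decoder-only checkpoint works the same way.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B")

# Target attention *and* MLP projections, per the finding that covering all
# layers (especially MLP/MoE) is what brings LoRA close to FullFT.
config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumes a LLaMA/Qwen-style block
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# The blog reports LoRA's optimal LR is roughly 10x FullFT's;
# e.g. if FullFT would use 2e-5 (an assumed baseline):
lora_lr = 10 * 2e-5
```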
How much parameter redundancy is there in LoRA? New research: cut 95% and still keep high performance
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article introduces the LoRI technique, which demonstrates that drastically reducing LoRA's trainable parameters can still maintain strong model performance, achieving results comparable or superior to full fine-tuning and other methods while using only 5% of LoRA's parameters [1][9].

Summary by Sections

LoRA and Its Limitations
- LoRA is widely adopted for parameter-efficient fine-tuning (PEFT) but still incurs significant memory overhead, especially for large models [3][4].
- Recent research indicates substantial redundancy in the incremental parameters, prompting the development of LoRI, which reduces the number of trainable parameters while preserving model knowledge [4].

LoRI Methodology
- LoRI keeps the low-rank matrix A fixed as a random projection and trains matrix B under a task-specific sparse mask, allowing a significant reduction in trainable parameters [4][13].
- Even with 90% sparsity in B, LoRI maintains good performance, indicating that adaptation does not require updating A [4][17].

Multi-Task Learning and Adapter Merging
- Multi-task learning is essential for building versatile models, but training on mixed datasets is costly; LoRI allows existing models to be merged without retraining, effectively combining LoRA adapters for multi-task capability [7].
- Directly merging heterogeneous LoRA adapters can cause parameter interference; LoRI mitigates this by mapping task-specific adapters into nearly orthogonal subspaces [7][20].

Continual Learning and Safety
- LoRI provides a lightweight continual-learning method that maintains safety while adapting to new tasks, addressing catastrophic forgetting [8][22].
- A two-phase training process for safety adapters shows that LoRI-S outperforms other methods in retaining safety alignment, even under aggressive sparsity [22][23].

Performance Evaluation
- Extensive experiments on various benchmarks show that LoRI matches or exceeds full fine-tuning and other PEFT methods while using 95% fewer trainable parameters [9][19].
- On single tasks, LoRI variants deliver competitive results across natural language understanding, mathematics, programming, and safety tasks [19][20].

Conclusion
- Overall, LoRI offers an effective, lightweight approach to building safe adapters that support downstream task adaptation while maintaining alignment [23].
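Below is a minimal sketch of the LoRI idea as summarized above: A is a frozen random projection and only a sparse-masked subset of B's entries is trained. This is an illustrative reconstruction, not the authors' released code; the class name, the random choice of mask, and the initialization are all assumptions (the paper selects the mask per task).

```python
import torch
import torch.nn as nn

class LoRILinear(nn.Module):
    """LoRI-style adapter sketch: frozen random projection A,
    trainable B restricted to a fixed sparse mask."""
    def __init__(self, base: nn.Linear, r: int = 32, sparsity: float = 0.9):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # frozen pretrained weight
        # A is a fixed random projection, stored as a buffer so it is never updated.
        self.register_buffer("A", torch.randn(r, base.in_features) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        # Placeholder mask chosen at random here; the paper derives it per task.
        mask = (torch.rand(base.out_features, r) > sparsity).float()
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (self.B * self.mask) @ self.A            # sparse-masked low-rank update
        return self.base(x) + x @ delta.t()

layer = LoRILinear(nn.Linear(1024, 1024), r=32, sparsity=0.9)
# Only B is trainable, and ~90% of its gradient entries are zeroed by the mask.
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```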