LoRA
Faster and stronger than LoRA: the new LoFA framework launches, adapting large models in seconds
具身智能之心· 2025-12-19 00:05
Core Insights
- The article discusses the limitations of existing visual generative models in meeting personalized user demands, particularly in generating precise outputs based on fine-grained instructions [5][6]
- It introduces a new framework called LoFA, which allows for rapid adaptation of large models to personalized tasks without lengthy optimization processes, achieving results comparable to traditional methods [24]

Group 1: Background and Challenges
- The demand for creative media and visual content has led to the development of powerful visual generative models trained on large datasets, but these models struggle with specific user instructions [5][6]
- Traditional methods like parameter-efficient fine-tuning (PEFT) require extensive optimization for each personalized task, making them impractical for real-time applications [6][10]

Group 2: LoFA Framework
- LoFA is designed to predict personalized LoRA parameters directly from diverse user instructions, enabling fast adaptation of visual generative models (a minimal LoRA sketch follows this summary) [8][10]
- The framework incorporates a novel guiding mechanism within a hypernetwork to predict complete, uncompressed LoRA weights, avoiding information loss [11][12]

Group 3: Methodology
- The learning process in LoFA is divided into two phases: first predicting a simplified response map and then using this knowledge to guide the final LoRA weight prediction [10][11]
- This structured approach allows the network to focus on key adaptation areas, enhancing stability and efficiency [11]

Group 4: Experimental Analysis
- The effectiveness of the LoFA framework was evaluated through systematic experiments in video and image generation tasks, demonstrating superior performance compared to baseline methods [13][14]
- In video generation, LoFA was tested on personalized human action video generation and style transfer tasks, while in image generation it focused on ID personalization [13][14]

Group 5: Conclusion and Future Outlook
- LoFA overcomes key limitations of existing personalization techniques by eliminating lengthy optimization processes while achieving comparable or superior performance to individually optimized models [24]
- Future developments aim to create a unified hypernetwork capable of zero-shot learning for various specific instructions, expanding the framework's applicability [24]
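The summaries in this digest take the LoRA mechanism itself for granted. As background, here is a minimal sketch (not from the article) of the standard low-rank update, where a frozen linear layer W is augmented with trainable factors B and A scaled by alpha/r; all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: W' = W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # the pretrained weights stay frozen
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the scaled low-rank correction
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(512, 512), r=8)
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```

Only the two small factors are trained, which is what makes predicting or swapping "personalized LoRA parameters" per task cheap compared with touching the full weight matrix.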
Faster and stronger than LoRA: the new LoFA framework launches, adapting large models in seconds
机器之心· 2025-12-18 00:03
Core Insights
- The article discusses the limitations of traditional visual generative models in meeting personalized user demands, particularly in generating precise outputs based on fine-grained instructions [6][7]
- It introduces a new framework called LoFA, which allows for rapid adaptation of large models to personalized tasks without lengthy optimization processes, achieving results comparable to or better than traditional methods like LoRA [2][24]

Group 1: Problem Statement
- There is a growing demand for creative media and visual content, leading to the development of powerful visual generative models trained on large datasets [6]
- Existing methods for personalizing these models, such as parameter-efficient fine-tuning (PEFT), require extensive optimization time and specific task data, making them impractical for real-time applications [6][7]

Group 2: Proposed Solution
- LoFA is designed to predict personalized LoRA parameters directly from user instructions, enabling fast adaptation of visual generative models [9][12]
- The framework incorporates a novel guiding mechanism within a hypernetwork to predict complete, uncompressed LoRA weights, avoiding the information loss associated with compression techniques (a schematic sketch follows this summary) [9][12]

Group 3: Methodology
- The learning process in LoFA is divided into two phases: first predicting a simplified response map and then using this knowledge to guide the final LoRA weight prediction [11][12]
- This structured approach allows the model to focus on key adaptation areas, enhancing the stability and efficiency of the learning process [12]

Group 4: Experimental Results
- The effectiveness of the LoFA framework was evaluated through systematic experiments in both video and image generation tasks, demonstrating its ability to handle diverse instruction conditions [14][15]
- LoFA outperformed baseline methods and achieved performance comparable to independently optimized LoRA models, reducing adaptation time from hours to seconds [15][24]

Group 5: Conclusion and Future Directions
- LoFA addresses critical limitations in existing personalization techniques by eliminating lengthy optimization while maintaining high-quality generation results [24]
- Future work aims to develop a unified hypernetwork with strong zero-shot capabilities to handle various specific instructions across different domains [24]
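LoFA's central claim above is that a hypernetwork can map a user instruction straight to complete, uncompressed LoRA weights, guided by a first-phase "response map". The summary does not spell out the architecture, so the following is only a schematic sketch of that two-output idea; every module, dimension, and the gating step are hypothetical placeholders rather than LoFA's actual design.

```python
import torch
import torch.nn as nn

class LoRAHyperNet(nn.Module):
    """Schematic hypernetwork: instruction embedding -> full LoRA A and B for one target layer."""

    def __init__(self, instr_dim=768, hidden=1024, d_in=512, d_out=512, rank=8):
        super().__init__()
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        self.trunk = nn.Sequential(nn.Linear(instr_dim, hidden), nn.GELU())
        # Phase-1-style head: a coarse "response map" over the target layer (illustrative).
        self.response_head = nn.Linear(hidden, d_out)
        # Phase-2-style heads: the full, uncompressed LoRA factors.
        self.head_A = nn.Linear(hidden, rank * d_in)
        self.head_B = nn.Linear(hidden, d_out * rank)

    def forward(self, instr_emb: torch.Tensor):
        h = self.trunk(instr_emb)
        response = torch.sigmoid(self.response_head(h))    # coarse guidance signal
        A = self.head_A(h).view(-1, self.rank, self.d_in)
        B = self.head_B(h).view(-1, self.d_out, self.rank)
        B = B * response.unsqueeze(-1)                      # let the response map gate the update
        delta_W = B @ A                                     # (batch, d_out, d_in)
        return response, delta_W

hypernet = LoRAHyperNet()
instr = torch.randn(1, 768)                                 # stand-in instruction embedding
_, delta_W = hypernet(instr)
print(delta_W.shape)  # torch.Size([1, 512, 512])
```

In a real system the predicted factors (or delta_W) would be attached to the corresponding frozen layer of the generative model at inference time, which is where the "hours to seconds" adaptation claim comes from.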
Over 1,100 models take different paths to the same destination, pointing to a "universal subspace": has Plato won again?
机器之心· 2025-12-14 04:53
Core Insights
- The importance of model architecture may exceed previous understanding, as a study from Johns Hopkins University reveals that over 1,100 different neural networks converge to a shared low-dimensional subspace, suggesting a "prior" mathematical structure that all neural networks approach [1][2][14].

Group 1: Findings and Implications
- This discovery helps explain several phenomena, such as why over-parameterized models can generalize, why different initializations lead to similar representations, and the effectiveness of techniques like LoRA and weight sharing [2][14].
- The research provides empirical evidence for a universal weight subspace hypothesis, indicating that all models may converge to a common subspace, which could limit diversity and introduce inherent biases [8][14][33].
- The study suggests that shared subspaces could enable large-scale model compression, rapid adaptation to new tasks, and insights into generalization boundaries and optimization landscapes [14][15].

Group 2: Methodology and Results
- The authors focused on LoRA adapters and observed the emergence of a universal subspace in the Mistral-7B model, extending the analysis to 500 Vision Transformers and 50 LLaMA3-8B models, all trained on different datasets and initializations [11][15].
- The analysis revealed that a unique shared low-rank structure exists across various tasks, with most information concentrated in 16 or fewer subspace directions, supporting the practical utility of the universal subspace (a sanity-check sketch follows this summary) [19][22].
- The universal subspace model demonstrated a 19-fold improvement in memory efficiency, as it eliminated the need to store all individual LoRA models [23].

Group 3: Theoretical Considerations
- The authors propose several theoretical factors contributing to the emergence of universal subspaces, including neural networks' preference for low-frequency functions, strong inductive biases imposed by modern architectures, and the universal nature of gradient-based optimization methods [36][37].
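The "universal subspace" result above is, in effect, a claim that many independently trained adapters can be re-expressed in a small shared basis. A rough way to probe this on one's own collection of LoRA updates is to stack the flattened deltas and check how much variance the top shared singular directions capture; the snippet below is that kind of sanity check under this interpretation (not the authors' code), with synthetic data standing in for real adapters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for N flattened LoRA updates that secretly share a 16-dim basis.
n_models, dim, shared_rank = 200, 4096, 16
basis = rng.standard_normal((shared_rank, dim))
updates = rng.standard_normal((n_models, shared_rank)) @ basis
updates += 0.01 * rng.standard_normal(updates.shape)       # small idiosyncratic noise per model

# Shared basis = top right-singular vectors of the stacked updates.
_, s, vt = np.linalg.svd(updates, full_matrices=False)
energy = np.cumsum(s**2) / np.sum(s**2)
print(f"variance captured by the top 16 directions: {energy[15]:.4f}")

# Reconstruct every update from only the top-16 shared directions.
proj = updates @ vt[:16].T @ vt[:16]
rel_err = np.linalg.norm(updates - proj) / np.linalg.norm(updates)
print(f"relative reconstruction error: {rel_err:.4f}")
```

If real adapters behaved like the synthetic ones here, storing only the shared directions plus small per-model coefficients is where the reported memory savings would come from.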
Today, it feels like we witnessed the end of the SD era.
数字生命卡兹克· 2025-10-13 01:33
Core Viewpoint
- The article reflects on the evolution of the AI drawing community, particularly the transition from the early days of Stable Diffusion (SD) to the current state marked by the launch of liblib 2.0, indicating a significant shift in the landscape of AI tools and user engagement [2][55].

Group 1: Historical Context
- The article reminisces about the peak of the SD open-source community, highlighting its rapid growth and the excitement it generated among users [11][31].
- It mentions the initial struggles and learning curves faced by users in understanding the complex parameters and prompts necessary for generating images [50][51].
- The community was characterized by a sense of exploration and innovation, with users actively engaging in discussions and sharing techniques [47][41].

Group 2: Transition to Liblib 2.0
- Liblib has announced an upgrade to version 2.0, introducing a new brand, logo, interface, and features aimed at simplifying the user experience and expanding its user base [3][67].
- The upgrade signifies a shift towards a more integrated platform that combines various AI drawing and video models, aiming to lower the entry barrier for new users [60][65].
- The article suggests that this transition is a natural progression in the industry, akin to technological advancements that replace older methods [56][57].

Group 3: Community and User Engagement
- The article notes a decline in user engagement and interest in the original SD models, as newer, simpler tools have emerged that cater to a broader audience [9][54].
- Despite the changes, the community remains vibrant, with a focus on creativity and the enduring presence of talented creators [75][76].
- The narrative emphasizes that while tools may evolve or disappear, the essence of creativity and the community's spirit will persist [75][76].
The architect of ChatGPT just released new research results
量子位· 2025-09-30 12:22
Core Insights
- The article discusses the latest research from Thinking Machines on the efficient fine-tuning method LoRA; the work is co-authored by John Schulman, a co-founder of OpenAI [1][3][27].

Group 1: Research Findings
- The research, titled "LoRA Without Regret", explores the conditions under which LoRA can match the efficiency of full fine-tuning (FullFT) and provides a simplified approach that reduces the difficulty of hyperparameter tuning [3][7].
- Current large models often have trillions of parameters and are trained on vast datasets, but downstream tasks typically require only small datasets focused on specific domains [6].
- LoRA, as a parameter-efficient fine-tuning method, captures fine-tuning information through low-rank matrices, and the research confirms that LoRA can achieve performance similar to FullFT when key details are handled correctly [7][12].

Group 2: Performance Comparisons
- The optimal learning rate for LoRA is found to be ten times that of FullFT, demonstrating its capability to compete effectively in fine-tuning scenarios with medium to small datasets [9][12].
- Experiments using Llama 3 and Qwen3 models on specific datasets showed that high-rank LoRA's learning curves closely align with FullFT, with both exhibiting logarithmic decreases in loss values during training [10][11].
- In mathematical reasoning tasks, even with a rank of 1, LoRA's performance remains comparable to FullFT, highlighting its efficiency in information absorption during training [13][14].

Group 3: Application Insights
- The research emphasizes that applying LoRA across all layers of a model, rather than just focusing on attention layers, is crucial for maximizing its performance (see the parameter-count sketch after this summary) [15][19].
- Previous studies often limited LoRA's application to attention matrices, but this research indicates that broader application leads to significant performance improvements [16][19].
- The findings suggest that the dominant gradient contribution comes from the layers with the most parameters, so full-layer coverage is necessary for LoRA to approach FullFT performance [21].

Group 4: Hyperparameter Tuning
- The research team proposes a simplified approach to reduce the complexity of tuning LoRA's hyperparameters, identifying that the optimal learning rate consistently follows a specific pattern [22][25].
- Of four potential hyperparameters, two are deemed redundant, allowing users to focus on the "initial update scale" and the "steps of deviation from the initial state" to streamline tuning [25][26].
- This simplification effectively halves the tuning difficulty of LoRA, making it more accessible for users [26].
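The all-layer recommendation above follows from where the parameters (and hence the dominant gradients) actually live. The back-of-the-envelope script below uses approximate Llama-3-8B-style layer shapes (assumed here for illustration) to show that the attention projections hold only about a fifth of each block's linear weights, and what a rank-16 LoRA costs in either case.

```python
# Approximate per-layer linear shapes for a Llama-3-8B-style block (assumed, for illustration).
d_model, d_ffn, n_heads, n_kv_heads, head_dim = 4096, 14336, 32, 8, 128

attn_shapes = {
    "q_proj": (d_model, n_heads * head_dim),
    "k_proj": (d_model, n_kv_heads * head_dim),
    "v_proj": (d_model, n_kv_heads * head_dim),
    "o_proj": (n_heads * head_dim, d_model),
}
mlp_shapes = {
    "gate_proj": (d_model, d_ffn),
    "up_proj": (d_model, d_ffn),
    "down_proj": (d_ffn, d_model),
}

attn_params = sum(i * o for i, o in attn_shapes.values())
mlp_params = sum(i * o for i, o in mlp_shapes.values())
print(f"attention share of per-layer linear weights: {attn_params / (attn_params + mlp_params):.1%}")
# ~19%: an attention-only LoRA leaves the parameter-heavy MLP weights untouched,
# which is the "dominant gradient" argument for covering every layer.

r = 16  # illustrative LoRA rank
lora_count = lambda shapes: sum(r * (i + o) for i, o in shapes.values())
print(f"rank-{r} LoRA params per layer, attention-only: {lora_count(attn_shapes):,}")
print(f"rank-{r} LoRA params per layer, all layers:     {lora_count(attn_shapes) + lora_count(mlp_shapes):,}")
```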
Thinking Machines publishes another high-quality blog post: championing LoRA as a match for full fine-tuning
机器之心· 2025-09-30 10:38
Core Insights
- The article emphasizes the advantages of LoRA (Low-Rank Adaptation) over full fine-tuning (FullFT) in terms of cost-effectiveness and performance across various training scenarios [2][7][18].

Group 1: Importance of LoRA
- LoRA is a popular parameter-efficient fine-tuning method that updates a low-dimensional adapter instead of the entire model weights, leading to lower memory requirements and faster loading [11][13].
- The research indicates that LoRA can achieve performance comparable to FullFT on small to medium-sized datasets, while it may struggle on large datasets due to capacity limitations [14][22].

Group 2: Key Findings
- The study found that LoRA's performance is closely tied to the training conditions, including the size of the training dataset and the rank of the LoRA parameters [16][25].
- In reinforcement learning tasks, even with a very low rank (rank=1), LoRA can perform similarly to FullFT, indicating that reinforcement learning has lower capacity demands [29].

Group 3: Experimental Methodology
- The research utilized models such as Llama 3 and Qwen3, adjusting LoRA ranks from 1 to 512 and sweeping learning rates to find optimal training conditions [20][21].
- Results showed that high-rank LoRA performed almost identically to FullFT on certain datasets, but performance varied across tasks due to differing training dynamics [22][24].

Group 4: Practical Implications
- LoRA's optimal learning rate is typically about 10 times that of FullFT, meaning it tolerates higher learning rates under the same conditions [35].
- The study suggests that applying LoRA across all layers, especially MLP and MoE layers, is crucial for achieving performance close to FullFT (see the configuration sketch after this summary) [37].
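Both practical takeaways, covering every linear layer and running LoRA at roughly 10x the full fine-tuning learning rate, come down to configuration. The sketch below shows one way to express them with the Hugging Face peft library on a tiny, randomly initialized Llama-style model; the module list, rank, and the 1e-5 baseline learning rate are illustrative assumptions, not values from the blog post.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM
from peft import LoraConfig, get_peft_model

# Tiny random Llama-style model so the example runs without downloading any weights.
base = LlamaForCausalLM(LlamaConfig(hidden_size=256, intermediate_size=688,
                                    num_hidden_layers=4, num_attention_heads=8,
                                    num_key_value_heads=8, vocab_size=1000))

lora_cfg = LoraConfig(
    r=16, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",    # attention projections...
                    "gate_proj", "up_proj", "down_proj"],      # ...and the MLP layers too
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()

full_ft_lr = 1e-5                  # assumed full fine-tuning baseline
lora_lr = 10 * full_ft_lr          # the roughly-10x rule of thumb reported in the post
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=lora_lr)
```

Dropping the MLP entries from target_modules reproduces the attention-only setup that the post argues against.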
ICML 2025 | CoTo: letting LoRA training "get better as it goes", excelling at model merging and pruning alike
机器之心· 2025-07-26 12:17
Core Viewpoint
- The article introduces CoTo, a progressive training strategy designed to enhance the robustness and effectiveness of Low-Rank Adaptation (LoRA) models, addressing issues such as training instability and performance drops after pruning [1][4][23].

Summary by Sections

Conventional LoRA Training Issues
- LoRA faces challenges including "lazy training," where optimization gets stuck near suboptimal solutions, limiting generalization [7]
- There is a hierarchical imbalance in training, with gradient updates concentrated on the top layers, leaving lower layers undertrained [7]
- These issues complicate downstream operations such as model fusion and pruning, often resulting in unsatisfactory outcomes [7]

CoTo Strategy
- CoTo employs a simple yet effective progressive activation strategy, initially deactivating a portion of LoRA adapters to encourage uniform gradient flow across all layers [5][8]
- The activation probability of the adapters is gradually increased during training, returning to standard fine-tuning in the later stages (a schedule sketch follows this summary) [8]

Experimental Results
- CoTo significantly improves the fusion and pruning capabilities of LoRA models, enhancing single-task generalization performance and training efficiency [12][23]
- In linear interpolation tasks, CoTo models maintain smooth performance transitions, unlike standard LoRA, which experiences sharp declines [13]
- CoTo outperforms standard LoRA in both structured and unstructured pruning scenarios, demonstrating enhanced fault tolerance [17]

Performance and Efficiency Improvements
- CoTo consistently boosts performance across various benchmarks, including visual and language tasks, and achieves over 24% training acceleration when applied to HiRA [24][23]

Ablation Studies
- Rigorous ablation studies validate CoTo's design choices and provide insights into effective regularization of LoRA [21]

Conclusion
- CoTo effectively resolves the hierarchical imbalance and lazy optimization issues in LoRA training, enhancing model robustness and simplifying downstream operations such as fusion and pruning [23]
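CoTo's mechanism, as summarized, is to stochastically switch LoRA adapters off early in training and ramp their activation probability back up to 1. The summary does not give the exact schedule, so the snippet below only sketches the idea with a linear ramp and per-adapter Bernoulli gates; the function names, the starting probability, and the ramp length are assumptions.

```python
import torch

def activation_probability(step: int, total_steps: int, p_start: float = 0.5) -> float:
    """Linearly ramp the per-adapter keep probability from p_start to 1.0 over the first
    ~3/4 of training, then train all adapters (illustrative schedule, not the paper's)."""
    ramp_end = int(0.75 * total_steps)
    if step >= ramp_end:
        return 1.0
    return p_start + (1.0 - p_start) * step / ramp_end

def sample_adapter_gates(num_adapters: int, p: float) -> torch.Tensor:
    """One Bernoulli gate per LoRA adapter; a gated-off adapter contributes no update this step."""
    return torch.bernoulli(torch.full((num_adapters,), p))

total_steps, num_adapters = 1000, 32          # e.g. one adapter per adapted layer
for step in (0, 250, 500, 750, 999):
    p = activation_probability(step, total_steps)
    gates = sample_adapter_gates(num_adapters, p)
    # In a real training loop each adapter's output would be multiplied by its gate
    # (and rescaled by 1/p if an unbiased expectation is desired).
    print(f"step {step:4d}: keep prob {p:.2f}, active adapters {int(gates.sum())}/{num_adapters}")
```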
Fully unlocking modality collaboration: MokA tailors a new fine-tuning paradigm for MLLMs
机器之心· 2025-06-29 02:21
Core Viewpoint
- The article discusses the limitations of current multimodal large language model (MLLM) fine-tuning methods, which often replicate strategies from unimodal language models without considering the unique characteristics of multimodal learning [2][9][23].

Summary by Sections

Introduction to MLLMs
- Recent advancements in MLLMs have been significant in tasks involving visual-language and audio-language inputs [2].
- Current fine-tuning methods primarily adapt strategies from unimodal language models, such as LoRA, which may not be suitable for multimodal contexts [2][8].

Limitations of Current Fine-Tuning Methods
- Many efficient multimodal fine-tuning methods overlook the essential differences between modalities, leading to inadequate utilization of multimodal information [9][11].
- The article emphasizes that effective multimodal fine-tuning needs both unimodal adaptation and cross-modal adaptation [9][12].

Introduction of MokA Method
- The research team proposes a new method called MokA (Multimodal low-rank Adaptation), which balances independent modeling of unimodal information with interaction modeling between modalities [3][12][23].
- MokA retains the efficiency of LoRA while redefining the roles of the projection matrices in a multimodal context [14][23].

Key Components of MokA
- MokA includes three critical modules (see the schematic sketch after this summary):
  1. **Modality-specific A matrix**: ensures independent modeling of unimodal information [15].
  2. **Cross-modal attention mechanism**: enhances interaction between different modalities during instruction tuning [16].
  3. **Shared B matrix**: facilitates implicit cross-modal alignment by projecting modalities into a shared space [17].

Experimental Results
- MokA was evaluated across three representative multimodal task scenarios: audio-visual-text, visual-text, and speech-text [19].
- The method demonstrated significant performance improvements on various benchmark datasets, showcasing its adaptability and effectiveness [19][23].

Conclusion
- MokA addresses the oversight of modality differences in current fine-tuning paradigms, providing a new direction for multimodal large model fine-tuning [23].
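The three MokA modules listed above can be pictured as one small adapter block: a separate low-rank A per modality, a cross-modal attention step in the low-rank space, and a single shared B projecting back to the model dimension. The sketch below is a schematic reading of that description rather than the authors' implementation; the plain multi-head attention, the token layout, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class MokALikeAdapter(nn.Module):
    """Schematic adapter: per-modality A, cross-modal attention, shared B (illustrative only)."""

    def __init__(self, d_model=512, rank=8, modalities=("audio", "visual", "text")):
        super().__init__()
        self.A = nn.ModuleDict({m: nn.Linear(d_model, rank, bias=False) for m in modalities})
        self.cross_attn = nn.MultiheadAttention(rank, num_heads=1, batch_first=True)
        self.B = nn.Linear(rank, d_model, bias=False)        # shared across modalities
        nn.init.zeros_(self.B.weight)                        # zero init: no change at step 0

    def forward(self, tokens: dict) -> dict:
        # 1) modality-specific low-rank projection
        low = {m: self.A[m](x) for m, x in tokens.items()}
        # 2) text queries attend over the concatenated non-text modalities
        context = torch.cat([v for m, v in low.items() if m != "text"], dim=1)
        attended, _ = self.cross_attn(low["text"], context, context)
        low["text"] = low["text"] + attended
        # 3) shared up-projection back to the model dimension
        return {m: self.B(h) for m, h in low.items()}

adapter = MokALikeAdapter()
batch = {m: torch.randn(2, 10, 512) for m in ("audio", "visual", "text")}
print(adapter(batch)["text"].shape)  # torch.Size([2, 10, 512])
```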
How much parameter redundancy is there in LoRA? New research: cut 95% and performance still holds
机器之心· 2025-05-02 04:39
Core Viewpoint
- The article introduces the LoRI technique, which demonstrates that significantly reducing the trainable parameters of LoRA can still maintain strong model performance, achieving results comparable or superior to full fine-tuning and other methods while using only 5% of LoRA's parameters [1][9].

Summary by Sections

LoRA and Its Limitations
- LoRA is widely adopted for parameter-efficient fine-tuning (PEFT) but still incurs significant memory overhead, especially in large models [3][4].
- Recent research indicates substantial redundancy in the incremental parameters, prompting the development of LoRI, which reduces the number of trainable parameters while preserving model knowledge [4].

LoRI Methodology
- LoRI keeps the low-rank matrix A fixed as a random projection and uses a task-specific sparse mask to train matrix B, allowing for significant parameter reduction (sketched after this summary) [4][13].
- Even with 90% sparsity in B, LoRI maintains good performance, indicating that the adaptation process does not require updating A [4][17].

Multi-Task Learning and Adapter Merging
- Multi-task learning is essential for creating versatile models, but training on mixed datasets is costly. LoRI allows existing models to be merged without retraining, effectively combining LoRA adapters for multi-task capabilities [7].
- Directly merging heterogeneous LoRA adapters can lead to parameter interference, but LoRI mitigates this by mapping task-specific adapters to nearly orthogonal subspaces [7][20].

Continuous Learning and Safety
- LoRI provides a lightweight continual learning method that maintains safety while adapting to new tasks, addressing the challenge of catastrophic forgetting [8][22].
- The two-phase training process for safety adapters shows that LoRI-S outperforms other methods in retaining safety alignment, even under aggressive sparsity [22][23].

Performance Evaluation
- Extensive experiments on various benchmarks show that LoRI matches or exceeds the performance of full fine-tuning and other PEFT methods while using 95% fewer trainable parameters [9][19].
- In single-task settings, LoRI variants demonstrate competitive results across natural language understanding, mathematics, programming, and safety tasks [19][20].

Conclusion
- Overall, LoRI presents an effective and lightweight approach to building safe adapters that support downstream task adaptation while maintaining alignment [23].
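LoRI's recipe, as summarized, freezes A as a random projection and trains only a sparsely masked B. The class below renders that idea schematically (it is not the released LoRI code): the mask here is fixed at construction time from random scores, whereas LoRI derives task-specific masks, and at 90% sparsity only a tenth of B's entries ever receive gradient.

```python
import torch
import torch.nn as nn

class LoRILikeAdapter(nn.Module):
    """Frozen random A, trainable B under a fixed sparse mask (illustrative sketch)."""

    def __init__(self, d_in=512, d_out=512, rank=32, sparsity=0.9):
        super().__init__()
        # A is a fixed random projection: registered as a buffer, never updated.
        self.register_buffer("A", torch.randn(rank, d_in) / rank**0.5)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        # Fixed binary mask keeping only the top (1 - sparsity) fraction of positions;
        # random scores stand in for LoRI's task-specific importance scores.
        scores = torch.rand(d_out, rank)
        k = int((1 - sparsity) * scores.numel())
        threshold = scores.flatten().topk(k).values.min()
        self.register_buffer("mask", (scores >= threshold).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only masked-in entries of B contribute (and therefore receive nonzero gradients).
        return x @ self.A.T @ (self.B * self.mask).T

adapter = LoRILikeAdapter()
y = adapter(torch.randn(4, 512))
print(y.shape, f"trainable entries: {int(adapter.mask.sum())}/{adapter.mask.numel()}")
```

Because masks trained for different tasks tend to occupy mostly disjoint positions, merging such adapters is less prone to the parameter interference mentioned in the summary.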