Curriculum Learning
Is our understanding of the LLM fine-tuning paradigm being overturned again? New UIUC and Amazon research suggests the SFT catastrophic-forgetting problem may be misunderstood
机器之心· 2025-10-21 03:43
Core Insights
- The article examines the impact of supervised fine-tuning (SFT) on the general capabilities of large language models (LLMs), suggesting that SFT does not necessarily cause a significant decline in general performance when a smaller learning rate is used [2][34]
- The research challenges the long-held belief that domain-specific fine-tuning inevitably causes catastrophic forgetting of general capabilities, proposing that the choice of training strategy plays a crucial role [2][34]

Experiment Details
- The study used two domain-specific datasets, MedCalc and ESCI, which represent scenarios where open-source LLMs typically perform poorly, making them well suited for domain-specific SFT [5]
- Several open-source LLMs were selected for experimentation, including Qwen3-8B and Gemma3-4B, with a focus on controlling the learning rate during SFT [6]

Findings
- **Finding 1**: Using a smaller learning rate (e.g., 1e-6) allows models to maintain strong performance in target domains while significantly reducing the decline in general capabilities [11]
- **Finding 2**: For classification tasks, when the training objective includes only the final label, a wider range of learning rates can achieve ideal performance, as seen on the ESCI dataset [12][14]

Theoretical Analysis
- The research team concluded that smaller learning rates can effectively limit the decline in general performance, aligning with the experimental findings [17]
- The analysis also indicated that when training targets include only final labels, the number of "hard tokens" encountered decreases, allowing a broader acceptable learning-rate range [17]

Token Adaptive Loss Reweighting (TALR)
- TALR is introduced as a method that dynamically adjusts the loss weights of tokens based on their prediction probabilities, reducing the impact of hard tokens during training (a minimal sketch follows this summary) [20]
- The method updates token weights in real time, so the model's current confidence guides the training process [21]

Experimental Results
- In experiments comparing various strategies for mitigating catastrophic forgetting, TALR demonstrated superior performance, especially under higher learning rates, maintaining domain gains while minimizing losses in general performance [26][27]

Conclusion and Future Directions
- The research emphasizes the continued importance of SFT in enhancing LLM capabilities, suggesting that while smaller learning rates and TALR are effective, more robust strategies are still needed to address the forgetting problem [34][35]
- Future research should focus on balancing domain-specific performance with general capabilities, particularly in specialized fields like medicine, where retaining foundational knowledge is crucial [35]
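The TALR bullets above describe the reweighting idea only at a high level. Below is a minimal PyTorch sketch of one plausible realization: compute per-token cross-entropy, estimate the probability the model currently assigns to each gold token, and shrink the loss weight of low-probability ("hard") tokens. The weighting function `p / (p + tau)`, the `tau` value, and the function name are illustrative assumptions, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def talr_loss(logits, labels, tau=0.1, ignore_index=-100):
    """Token Adaptive Loss Reweighting, minimal sketch (hypothetical weighting)."""
    vocab = logits.size(-1)
    flat_logits = logits.view(-1, vocab)
    flat_labels = labels.view(-1)

    # Per-token cross-entropy, kept unreduced so it can be reweighted.
    ce = F.cross_entropy(flat_logits, flat_labels,
                         ignore_index=ignore_index, reduction="none")

    with torch.no_grad():
        # Probability the model currently assigns to each gold token;
        # recomputed every step so the weights track current confidence.
        log_probs = F.log_softmax(flat_logits, dim=-1)
        gold = flat_labels.clamp(min=0)  # safe index for ignored positions
        p = log_probs.gather(-1, gold.unsqueeze(-1)).squeeze(-1).exp()
        weight = p / (p + tau)  # hard (low-probability) tokens get small weights

    mask = (flat_labels != ignore_index).float()
    return (weight * ce * mask).sum() / mask.sum().clamp(min=1.0)
```

Because the weights are computed inside `torch.no_grad()`, they act as per-token scaling factors rather than adding a gradient path of their own, which matches the summary's point that the model's current confidence steers training.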
After a quiet month, openPangu's performance jumps 8%! Huawei's 1B open-source model is here
机器之心· 2025-09-05 04:31
Core Viewpoint
- Huawei's openPangu Embedded-1B model represents a significant advancement in edge AI, enabling powerful AI capabilities on resource-constrained devices and paving the way for intelligent upgrades across industries [1][5]

Group 1: Model Performance and Efficiency
- The openPangu Embedded-1B model, with 1 billion parameters, sets a new state-of-the-art (SOTA) record for performance and efficiency, demonstrating that smaller models can deliver substantial capabilities [2][3]
- The model's overall average score reached 63.90, surpassing similarly sized models and matching larger models such as Qwen3-1.7B, showcasing its parameter efficiency [3][4]
- In mathematical reasoning, the model scored 82.76% on the GSM8K benchmark and 81.83% on the MATH dataset, significantly outperforming its peers [3][4]

Group 2: Technical Innovations
- The model employs software-hardware co-design, optimizing its architecture to match the characteristics of Ascend hardware and ensuring efficient resource utilization [9][10]
- A two-stage curriculum learning approach is used to strengthen the model's reasoning capabilities, simulating a human-like learning process [15][16]
- Offline on-policy knowledge distillation allows for a more flexible and effective training process, improving the model's accuracy and generalization (a minimal sketch follows this summary) [18][19]

Group 3: Reinforcement Learning and Future Directions
- The model incorporates a multi-source reward reinforcement learning mechanism, enhancing its performance through feedback targeted to task complexity [22][25]
- Future work aims to integrate fast and slow thinking within a single model, allowing adaptive responses based on problem difficulty and improving both speed and accuracy [29][30]
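The "offline on-policy knowledge distillation" bullet in Group 2 is described only qualitatively, so here is a hedged PyTorch sketch of one common way such an objective can look: the student samples its own sequences (on-policy), the teacher's logits for those sequences are computed and cached in a separate offline pass, and the student is trained to match the teacher's token distributions. The token-level KL form, the temperature, and the function name are assumptions, not openPangu's published loss.

```python
import torch
import torch.nn.functional as F

def onpolicy_distill_loss(student_logits, teacher_logits, mask, temperature=1.0):
    """Sketch of a token-level distillation loss on student-sampled sequences."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_logp = F.log_softmax(student_logits / t, dim=-1)

    # KL(teacher || student) per token: sum_v p_T(v) * (log p_T(v) - log p_S(v)).
    kl = (teacher_probs * (teacher_probs.clamp_min(1e-9).log() - student_logp)).sum(dim=-1)

    # Average over valid (non-padding) positions only; scale by t^2 as is conventional.
    return (kl * mask).sum() / mask.sum().clamp(min=1.0) * (t * t)
```

Caching the teacher logits offline keeps the teacher off the training critical path, which is one reasonable reading of why the article calls the recipe both "offline" and "on-policy".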
Costs slashed by 88%! Tongyi Lab and Peking University release ZeroSearch, activating LLMs' retrieval capability without real search
机器之心· 2025-05-29 04:53
Core Insights
- The article introduces the ZeroSearch framework, which enables large language models (LLMs) to activate their search capabilities without relying on real search engines, reducing training costs by 88% while outperforming methods that depend on actual search engines [1][21]

Methodology
- ZeroSearch employs a reinforcement learning (RL) framework that uses a simulation LLM as the search engine, eliminating real-time API interactions and thereby lowering training costs [4][6]
- The framework incorporates a structured training template that guides the model through each interaction, improving the clarity and interpretability of the reasoning process [8]
- A loss-masking technique prevents the policy model from memorizing documents generated by the simulation LLM, ensuring that only tokens generated by the policy model contribute to the loss (a minimal sketch follows this summary) [4][8]

Training Strategy
- Training begins with a gradual increase in difficulty, allowing the model to learn basic output formats and task logic before the challenge escalates to strengthen reasoning capabilities [22][36]
- A curriculum learning strategy progressively lowers the quality of the generated documents to stimulate the model's reasoning ability [13][36]

Experimental Results
- ZeroSearch demonstrates superior performance across various datasets, achieving an average score of 40.93 on multi-hop question-answering tasks and surpassing all baseline methods [20][21]
- The framework generalizes robustly, with performance improving as model size increases, indicating strong scalability [23][27]
- Compared with real search engines, ZeroSearch shows significant potential to replace them in large-scale RL applications, confirming its effectiveness in enhancing search capabilities [21][24]

Conclusion
- The ZeroSearch framework effectively activates the search capabilities of LLMs without the need for real search engines, demonstrating strong adaptability and scalability across different RL algorithms [36]
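The loss-masking bullet in the Methodology section is the most code-like detail in the summary, so here is a small, hedged sketch of the idea: mark which rollout tokens were produced by the policy model versus the simulation LLM, and zero out the latter so simulated documents never contribute gradient. The `token_roles` labels, the helper name, and the usage snippet are illustrative, not the paper's actual API.

```python
import torch

def build_policy_loss_mask(token_roles):
    """Return 1.0 for tokens the policy generated, 0.0 for simulated-document tokens."""
    return torch.tensor([1.0 if role == "policy" else 0.0 for role in token_roles])

# Usage: multiply the mask into a per-token RL loss (e.g. a PPO policy loss).
roles = ["policy", "policy", "sim_doc", "sim_doc", "policy"]
mask = build_policy_loss_mask(roles)
per_token_loss = torch.randn(len(roles))  # placeholder per-token loss values
masked_loss = (per_token_loss * mask).sum() / mask.sum()
```

Masking at the loss level, rather than removing the simulated documents from the context, lets the model still condition on the retrieved text while being optimized only on its own reasoning and answer tokens.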