One trick to alleviate LLM skill imbalance: adjust the training set composition, and the "secret recipe" is here | SJTU & Shanghai AI Lab et al.
量子位 · 2025-06-10 07:35
Core Viewpoint
- IDEAL, a method proposed by a joint team from Shanghai Jiao Tong University and Shanghai AI Lab, significantly enhances the performance of large language models (LLMs) across multiple domains by adjusting the composition of the supervised fine-tuning (SFT) training dataset [3][4].

Group 1: Methodology
- IDEAL starts from high-quality training datasets prepared for each domain and models the problem as an optimization that minimizes validation loss (a rough sketch of this reweighting idea follows this summary) [5].
- The quantity of training data in the SFT phase is not the key factor; rather, an appropriate data distribution is crucial to avoid exacerbating the skill-imbalance ("偏科") phenomenon in models [6][15].
- The research quantifies how adjusting the data affects the optimized model's performance on the validation set, providing a theoretical foundation for the IDEAL approach [7].

Group 2: Computational Efficiency
- The paper employs K-FAC theory to approximate the inverse of the Hessian matrix, which simplifies the computation and allows the method to scale to LLM parameter counts [8].

Group 3: Experimental Results
- Tested on the Llama 3.1 8B model, IDEAL delivered a clear improvement in coding capability after just two iterations, regardless of the number of epochs [10].
- The initial distribution of training data can always be optimized further: IDEAL consistently improved average results across various benchmarks regardless of the starting distribution [11].

Group 4: Practical Applications
- IDEAL addresses the challenge of combining high-quality training data from multiple domains into a unified training set, eliminating the need for manual tuning of data ratios [14].
- The paper suggests setting the hyperparameter m to around 0.15, which optimizes the data distribution without adjusting it too aggressively [15].
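To make the methodology above concrete, below is a minimal first-order sketch of the domain-reweighting idea: score each domain by how well its training gradient aligns with the validation-loss gradient, then nudge the mixture weights within a bound m (around 0.15) and renormalize. This is not the paper's implementation; in particular, the K-FAC inverse-Hessian term is replaced here by the identity for brevity, and the toy model, domain names, and function names (`reweight_domains`, `flat_grad`) are illustrative assumptions.

```python
# Sketch of an IDEAL-style domain-reweighting step. NOT the paper's code:
# the Hessian inverse (K-FAC in the paper) is replaced by the identity, and
# the tiny linear "model" and domain names are placeholders for an LLM and
# its per-domain SFT data.
import torch
import torch.nn as nn


def flat_grad(loss, params):
    """Flatten the gradient of `loss` w.r.t. `params` into one vector."""
    grads = torch.autograd.grad(loss, params, retain_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])


def reweight_domains(model, loss_fn, domain_batches, val_batch, weights, m=0.15):
    """One reweighting step bounded by hyperparameter m (summary suggests ~0.15)."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient of the validation loss w.r.t. the model parameters.
    vx, vy = val_batch
    g_val = flat_grad(loss_fn(model(vx), vy), params)

    # Influence score per domain: alignment between that domain's training
    # gradient and the validation gradient (identity used in place of H^{-1}).
    scores = {}
    for name, (x, y) in domain_batches.items():
        g_dom = flat_grad(loss_fn(model(x), y), params)
        scores[name] = torch.dot(g_val, g_dom).item()

    # Up-weight domains whose data is estimated to lower validation loss,
    # down-weight the rest, keep each relative change within +/- m, renormalize.
    max_abs = max(abs(s) for s in scores.values()) + 1e-12
    new_w = {k: weights[k] * (1.0 + m * scores[k] / max_abs) for k in weights}
    total = sum(new_w.values())
    return {k: v / total for k, v in new_w.items()}


if __name__ == "__main__":
    torch.manual_seed(0)
    model = nn.Linear(16, 4)          # stand-in for the LLM being fine-tuned
    loss_fn = nn.CrossEntropyLoss()

    def make_batch(n):
        return torch.randn(n, 16), torch.randint(0, 4, (n,))

    domains = {"code": make_batch(32), "math": make_batch(32), "chat": make_batch(32)}
    weights = {k: 1.0 / len(domains) for k in domains}   # uniform initial mixture
    print(reweight_domains(model, loss_fn, domains, make_batch(64), weights))
```

In the paper itself, the influence term is computed with a K-FAC approximation of the inverse Hessian rather than dropped, which is what keeps the computation tractable at LLM parameter counts; the identity shortcut above only illustrates the data flow of the reweighting step.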
Contributed by the IDEAL team to 量子位 | WeChat account QbitAI

Substantially alleviating LLM skill imbalance only requires adjusting the composition of the SFT training set: Llama 3.1-8B, originally weak at coding, shows a clear improvement in code ability. The joint team from Shanghai Jiao Tong University and Shanghai AI Lab proposes an innovative method, IDEAL, which significantly improves an LLM's overall performance across multiple different domains. The study also comes with several other important findings. In detail:

After SFT, some LLM abilities even degrade

Large language models (LLMs), with their strong understanding and logical reasoning abilities, have shown remarkable capability across many domains. Besides scaling up model parameters, high-quality data is widely recognized as the most critical factor in improving LLM performance. When models undergo supervised fine-tuning (SFT), researchers found that LLMs in multi-task settings often exhibit a "skill imbalance" (偏科) phenomenon: some abilities stand out while others fail to improve, or even degrade. This imbalance means the model performs differently across domains, which in turn hurts user experience.

The researchers at Shanghai Jiao Tong University and Shanghai AI Lab therefore turned their attention to the SFT training set: can the skill imbalance be alleviated by adjusting the training set's composition? Intuitively, simply doubling the training data for the LLM's weak subjects should change the final result. However, because of the coupling among training data, the researchers model and quantify each domain's data's impact on the final ...