One trick to ease LLMs' lopsided skills: adjust the training-set composition, and the "secret recipe" is here | SJTU & Shanghai AI Lab et al.
量子位· 2025-06-10 07:35
Core Viewpoint
- The IDEAL method, proposed by a joint team from Shanghai Jiao Tong University and Shanghai AI Lab, significantly improves the performance of large language models (LLMs) across domains by adjusting the composition of the supervised fine-tuning (SFT) training set [3][4].

Group 1: Methodology
- IDEAL starts from high-quality training sets prepared for each domain and frames the mixing ratios as an optimization problem that minimizes validation loss [5]; a schematic formulation is sketched after this summary.
- The quantity of training data in the SFT phase is not the key factor; what matters is an appropriate distribution across domains, otherwise the model's lopsided-skills (偏科) problem is made worse [6][15].
- The work quantifies how adjusting the data mixture shifts the optimized model's performance on the validation set, which provides the theoretical foundation for IDEAL [7].

Group 2: Computational Efficiency
- The paper applies K-FAC theory to approximate the inverse of the Hessian matrix, which keeps the computation tractable and allows the method to scale to LLM parameter sizes [8]; see the second sketch below.

Group 3: Experimental Results
- Tested on the Llama 3.1 8B model, IDEAL produced a clear improvement in coding ability after only two iterations, regardless of the number of training epochs [10].
- The initial training-data distribution can always be optimized further: IDEAL raised the average score across the benchmarks no matter which initial distribution it started from [11].

Group 4: Practical Applications
- IDEAL answers the practical question of how to combine high-quality training data from multiple domains into a single training set, removing the need for manual ratio tuning [14].
- The paper suggests setting the hyperparameter m to roughly 0.15, which adjusts the data distribution enough to matter without being overly aggressive [15]; the last sketch below shows how such a cap can be applied.
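For readers who want to see the shape of the optimization problem the summary alludes to, one schematic bi-level formulation follows. The notation (mixture weights beta_k, per-domain training sets, validation set) is assumed here for illustration and is not necessarily the paper's exact statement.

```latex
% Schematic bi-level objective (assumed notation, not the paper's exact statement):
% \beta_k = sampling weight of domain k, D_k^{train} = domain-k training data, D^{val} = validation set.
\[
\begin{aligned}
\min_{\beta}\quad & \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(\beta)\bigr)
   = \sum_{(x,y)\in D^{\mathrm{val}}} \ell\bigl(x,y;\theta^{*}(\beta)\bigr) \\
\text{s.t.}\quad & \theta^{*}(\beta)
   = \arg\min_{\theta}\ \sum_{k} \beta_k
     \sum_{(x,y)\in D_k^{\mathrm{train}}} \ell\bigl(x,y;\theta\bigr),
   \qquad \beta_k \ge 0,\ \textstyle\sum_k \beta_k = 1 .
\end{aligned}
\]
```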
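The toy NumPy sketch below illustrates why a Kronecker-factored (K-FAC) approximation makes inverse-Hessian computations tractable: for a single linear layer the curvature block is approximated as A ⊗ G, so an inverse-Hessian-vector product reduces to two small solves instead of inverting a huge matrix. The layer sizes and variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected layer: weight matrix of shape (out_dim, in_dim).
in_dim, out_dim, n_samples = 8, 4, 256

# Stand-ins for quantities gathered during backprop:
# a = layer inputs, g = gradients w.r.t. the layer's pre-activations.
a = rng.standard_normal((n_samples, in_dim))
g = rng.standard_normal((n_samples, out_dim))

# Kronecker factors with damping so both are safely invertible.
damping = 1e-3
A = a.T @ a / n_samples + damping * np.eye(in_dim)   # (in_dim,  in_dim)
G = g.T @ g / n_samples + damping * np.eye(out_dim)  # (out_dim, out_dim)

# K-FAC approximates the layer's curvature block as H ~= A kron G, hence
# H^{-1} vec(V) = vec(G^{-1} V A^{-1}): two small solves instead of
# inverting an (out_dim*in_dim) x (out_dim*in_dim) matrix.
V = rng.standard_normal((out_dim, in_dim))           # e.g. a gradient to precondition
HinvV = np.linalg.solve(G, V) @ np.linalg.inv(A)

# Sanity check against the explicit Kronecker product (only feasible at toy size).
H = np.kron(A, G)                                    # column-stacking vec convention
ref = np.linalg.solve(H, V.flatten(order="F")).reshape((out_dim, in_dim), order="F")
assert np.allclose(HinvV, ref, atol=1e-6)
print("Kronecker-factored inverse-Hessian-vector product matches the explicit solve.")
```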
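Finally, a minimal Python sketch of how an influence-guided mixture update with a cap m (the article suggests around 0.15) could be iterated. The function and the gradient-signal callable are hypothetical stand-ins; the paper's actual influence estimator would plug in where `val_grad_signal` is called, and a retraining step would sit between iterations.

```python
import numpy as np

def reweight_domains(weights, val_grad_signal, m=0.15, iterations=2):
    """Hypothetical influence-guided mixture update (a sketch, not the paper's code).

    weights:         current sampling proportions per domain (sum to 1)
    val_grad_signal: callable returning an estimate of d(validation loss)/d(weight_k)
                     for each domain at the current mixture; this is where an
                     inverse-Hessian-based influence estimate (e.g. via K-FAC)
                     would enter, here it is just an opaque callable.
    m:               cap on how far any proportion may move per iteration
                     (the article reports ~0.15 as a good balance).
    iterations:      number of adjust-and-retrain rounds (two already helped
                     in the reported Llama 3.1 8B runs).
    """
    w = np.asarray(weights, dtype=float)
    for _ in range(iterations):
        grad = np.asarray(val_grad_signal(w), dtype=float)
        # Move mass away from domains whose extra data raises validation loss.
        step = -grad / (np.abs(grad).max() + 1e-12)
        step = np.clip(step, -m, m)            # conservative, bounded adjustment
        w = np.clip(w * (1.0 + step), 1e-6, None)
        w = w / w.sum()                        # renormalize to a valid mixture
    return w

# Toy usage with three domains (say code / math / chat) and a made-up signal
# claiming that more code data would lower validation loss.
if __name__ == "__main__":
    fake_signal = lambda w: np.array([-0.8, 0.3, 0.1])
    print(reweight_domains([1 / 3, 1 / 3, 1 / 3], fake_signal))
```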