DELT (Data Efficacy in LM Training)

Reordering the training data makes large models smarter, with no need to scale up the model or the data
量子位 (QbitAI) · 2025-09-06 04:21
Core Viewpoint
- The article emphasizes the importance of data organization order in language model training, introducing a new paradigm called DELT (Data Efficacy in LM Training) that enhances model performance without increasing data volume or model size [1][3][11].

Group 1: Data Efficiency vs. Data Efficacy
- Data efficiency focuses on improving model training efficiency through data selection, while data efficacy enhances model performance through data organization, which has been largely overlooked [5][6][15].
- The analogy of cooking is used to illustrate that data efficiency is like selecting fresh ingredients, whereas data efficacy is akin to a chef timing the addition of spices to maximize flavor [7].

Group 2: Importance of Data Organization
- The sequence of training samples is crucial, as modern language models often undergo limited training cycles, making the order of data presentation significantly impactful [9][10].
- The DELT paradigm aims to fully exploit the potential of training data by introducing data sorting strategies, leading to improved efficiency and efficacy [11][13].

Group 3: DELT Paradigm Components
- DELT integrates three core components: data scoring, data selection, and data ordering, where data scoring assigns scores based on attributes like difficulty and quality [19][20].
- A novel folding ordering method is proposed to enhance data efficacy by preventing model forgetting and ensuring balanced data distribution [23][27].

Group 4: Performance Results
- The DELT paradigm has shown significant performance improvements across various model sizes and data scales, outperforming conventional methods in multiple evaluation metrics [28].
- For instance, with a model size of 1 billion, DELT achieved an average score of 39.17% compared to 37.77% for conventional methods [28].

Group 5: Implications for AI Training
- DELT provides a new perspective for data-centric AI, suggesting that AI training should adopt personalized and structured learning approaches similar to human education practices [29][30].
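
The three components named under Group 3 (scoring, selection, ordering) can be sketched as a small pipeline. This is an illustrative sketch, not the authors' implementation: the function names `select_by_score`, `folding_order`, and `delt_order`, the `keep_ratio` and `num_folds` parameters, and the caller-supplied `score_fn` are all hypothetical. The folding step shown here is one plausible reading of "folding ordering": sort samples by score, then take every `num_folds`-th sample per pass, so each pass runs from low to high score and spans the whole score range.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_by_score(scored: Sequence[Tuple[float, T]],
                    keep_ratio: float) -> List[Tuple[float, T]]:
    """Data selection: keep the highest-scoring fraction of the samples."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, int(len(scored) * keep_ratio))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:keep]


def folding_order(scored: Sequence[Tuple[float, T]],
                  num_folds: int) -> List[T]:
    """Data ordering (assumed reading of 'folding'): sort by score, then
    emit num_folds interleaved passes, each ascending in score."""
    if num_folds < 1:
        raise ValueError("num_folds must be >= 1")
    ranked = sorted(scored, key=lambda pair: pair[0])
    ordered: List[T] = []
    for fold in range(num_folds):
        # Every num_folds-th sample, starting at offset `fold`.
        ordered.extend(sample for _, sample in ranked[fold::num_folds])
    return ordered


def delt_order(samples: Sequence[T],
               score_fn: Callable[[T], float],
               keep_ratio: float = 1.0,
               num_folds: int = 3) -> List[T]:
    """Score, select, then order. score_fn is a placeholder for whatever
    difficulty or quality scorer is actually used."""
    scored = [(score_fn(sample), sample) for sample in samples]
    kept = select_by_score(scored, keep_ratio)
    return folding_order(kept, num_folds)


if __name__ == "__main__":
    # Toy data: each integer stands in for a sample and is its own score.
    print(delt_order([5, 1, 4, 2, 6, 3], score_fn=float, num_folds=2))
```

With `num_folds=1` the ordering reduces to a plain ascending sort by score. Larger values revisit the low-score end of the data several times during a single pass, which is the intuition behind the summary's claim about reducing forgetting while keeping the data distribution balanced.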