DELT (Data Efficacy in LM Training)

量子位 (QbitAI) · 2025-09-06 04:21

Core Viewpoint - The article emphasizes the importance of data organization order in language model training, introducing a new paradigm called DELT (Data Efficacy in LM Training) that enhances model performance without increasing data volume or model size [1][3][11].

Group 1: Data Efficiency vs. Data Efficacy
- Data efficiency focuses on improving model training efficiency through data selection, while data efficacy enhances model performance through data organization, which has been largely overlooked [5][6][15].
- The analogy of cooking is used to illustrate that data efficiency is like selecting fresh ingredients, whereas data efficacy is akin to a chef timing the addition of spices to maximize flavor [7].

Group 2: Importance of Data Organization
- The sequence of training samples is crucial, as modern language models often undergo limited training cycles, making the order of data presentation significantly impactful [9][10].
- The DELT paradigm aims to fully exploit the potential of training data by introducing data sorting strategies, leading to improved efficiency and efficacy [11][13].

Group 3: DELT Paradigm Components
- DELT integrates three core components: data scoring, data selection, and data ordering, where data scoring assigns scores based on attributes like difficulty and quality [19][20].
- A novel folding ordering method is proposed to enhance data efficacy by preventing model forgetting and ensuring balanced data distribution [23][27].

Group 4: Performance Results
- The DELT paradigm has shown significant performance improvements across various model sizes and data scales, outperforming conventional methods in multiple evaluation metrics [28].
- For instance, with a model size of 1 billion, DELT achieved an average score of 39.17% compared to 37.77% for conventional methods [28].

Group 5: Implications for AI Training
- DELT provides a new perspective for data-centric AI, suggesting that AI training should adopt personalized and structured learning approaches similar to human education practices [29][30].
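
The three components named in Group 3 (scoring, selection, ordering) can be illustrated with a minimal Python sketch. The summary does not say how scores are computed or how folding is carried out, so everything below is an assumption made for illustration: `score` is a hypothetical caller-supplied function, `select_top` keeps the highest-scoring fraction, and `folding_order` interprets "folding" as splitting the score-sorted data into strided folds so that each fold spans the whole score range. It is a sketch of the idea, not the authors' implementation.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top(samples: Sequence[T], score: Callable[[T], float],
               keep_ratio: float) -> List[T]:
    """Data selection (assumed rule): keep the highest-scoring fraction."""
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, int(len(samples) * keep_ratio))
    # Highest scores first, then truncate to the requested fraction.
    return sorted(samples, key=score, reverse=True)[:keep]


def folding_order(samples: Sequence[T], score: Callable[[T], float],
                  num_folds: int) -> List[T]:
    """Data ordering (assumed interpretation of 'folding').

    Sort by ascending score, then take every num_folds-th sample starting
    at offset 0, 1, ..., num_folds - 1. Each fold is itself sorted and
    covers the full score range, so the model revisits low- and
    high-score data several times instead of seeing each range once.
    """
    if num_folds < 1:
        raise ValueError("num_folds must be >= 1")
    ranked = sorted(samples, key=score)
    folds = [ranked[offset::num_folds] for offset in range(num_folds)]
    # Concatenate the folds; no sample is repeated or dropped.
    return [sample for fold in folds for sample in fold]


def delt_order(samples: Sequence[T], score: Callable[[T], float],
               keep_ratio: float, num_folds: int) -> List[T]:
    """Score -> select -> order, chained in the sequence the summary lists."""
    selected = select_top(samples, score, keep_ratio)
    return folding_order(selected, score, num_folds)


if __name__ == "__main__":
    # Toy data: each sample is its own score.
    data = list(range(10))
    print(folding_order(data, lambda x: x, num_folds=3))
    # -> [0, 3, 6, 9, 1, 4, 7, 2, 5, 8]
```

With `num_folds=1` the ordering reduces to a plain sort by score, which makes it easy to compare a single sorted pass against a folded one on the same data.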

Microsoft(US:MSFT)

Data training efficacy

Data training efficiency

Learning - Quality Score (LQS)

Folding Ordering (FO)

Artificial intelligence
