Core Insights
- The article examines the limitations of large language models (LLMs) in in-context learning (ICL) and introduces MachineLearningLM, a new framework that substantially improves LLM performance across a range of classification tasks without any downstream fine-tuning [2][7][22].

Group 1: Limitations of Existing LLMs
- Despite extensive world knowledge and strong reasoning ability, LLMs struggle with ICL once the number of examples grows large: performance plateaus, and results remain sensitive to example order and label bias [2].
- Previous methods relied on limited amounts of real task data, which restricted how well models generalize to new tasks [7].

Group 2: Innovations of MachineLearningLM
- MachineLearningLM introduces a continued-pretraining framework that lets LLMs learn directly from thousands of in-context examples, achieving superior accuracy on binary and multi-class tasks across many domains [2][22].
- The framework trains on a synthetic corpus of over 3 million tasks generated from structural causal models (SCMs), with no overlap with downstream evaluation sets, allowing a fair assessment of model generalization (a minimal sketch of SCM-style task generation follows this summary) [7][11].

Group 3: Methodology Enhancements
- The research incorporates a two-tier filtering mechanism built on Random Forest models to improve training stability and interpretability, screening out synthetic tasks of inconsistent quality (see the second sketch below) [11][12].
- MachineLearningLM encodes in-context examples efficiently, using compact table formats rather than verbose natural-language descriptions, which improves data handling and inference efficiency (see the third sketch below) [15][20].

Group 4: Performance Metrics
- The model's accuracy continues to improve as the number of in-context examples increases, and its average accuracy surpasses benchmark models such as GPT-5-mini by approximately 13 to 16 percentage points across various classification tasks [22][24].
- On the MMLU benchmark, MachineLearningLM retains its original conversational and reasoning capabilities while achieving competitive zero-shot and few-shot accuracy [24][25].

Group 5: Application Potential
- Its many-shot in-context learning and numerical-modeling capabilities position MachineLearningLM for broader applications in finance, healthcare, and scientific computing [26][28].
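To make the SCM-based task synthesis in Group 2 concrete, below is a minimal, hedged sketch in Python. It is an illustration of the general idea only: the function name `sample_scm_task`, the linear-plus-tanh structural equations, the edge density, and the quantile-based labeling are all assumptions for this sketch, not the paper's actual generator or hyperparameters.

```python
# Illustrative sketch of sampling one synthetic classification task from a
# random structural causal model (SCM). All design choices here are assumed.
import numpy as np

rng = np.random.default_rng(0)

def sample_scm_task(n_vars=6, n_rows=200, n_classes=3):
    """Sample a synthetic classification task from a random linear SCM.

    Variables X_0 ... X_{n-1} are in topological order; each is a noisy
    nonlinear function of its parents, so the task has learnable structure.
    """
    # Strict upper-triangular adjacency -> a DAG in topological order.
    adj = np.triu(rng.random((n_vars, n_vars)) < 0.4, k=1)
    weights = rng.normal(0, 1, (n_vars, n_vars)) * adj

    X = np.zeros((n_rows, n_vars))
    for j in range(n_vars):
        signal = X @ weights[:, j]          # contributions from parents
        noise = rng.normal(0, 0.5, n_rows)
        X[:, j] = np.tanh(signal) + noise   # mild nonlinearity plus noise

    # Treat the most downstream variable as the label source and
    # discretize it into quantile-balanced classes.
    target = X[:, -1]
    cuts = np.quantile(target, np.linspace(0, 1, n_classes + 1)[1:-1])
    y = np.digitize(target, cuts)
    return X[:, :-1], y

X, y = sample_scm_task()
print(X.shape, np.bincount(y))  # e.g. (200, 5) and roughly balanced counts
```

Because each task is sampled from a fresh random causal graph, millions of distinct tasks can be generated without touching any real evaluation dataset, which is what enables the no-overlap guarantee described above.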
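The second sketch illustrates one plausible reading of the Random-Forest-based, two-tier task filter from Group 3: tier one discards tasks a Random Forest cannot learn above a majority-class baseline (likely noise), and tier two discards tasks it solves almost perfectly (trivially easy). The thresholds, the split sizes, and the exact criteria are assumptions for this sketch, not the paper's published procedure.

```python
# Hedged sketch of a two-tier Random Forest task filter; thresholds are assumed.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def keep_task(X, y, min_gain=0.05, max_acc=0.98, seed=0):
    """Return True if a Random Forest finds the task learnable but non-trivial.

    Tier 1: RF accuracy must beat the majority-class baseline by `min_gain`,
            filtering out pure-noise tasks.
    Tier 2: RF accuracy must stay below `max_acc`, filtering out tasks that
            are trivially separable and carry little training signal.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)
    rf = RandomForestClassifier(n_estimators=100, random_state=seed)
    acc = rf.fit(X_tr, y_tr).score(X_te, y_te)
    baseline = np.bincount(y_te).max() / len(y_te)
    return baseline + min_gain < acc < max_acc

# Stand-in synthetic task for demonstration (any tabular task would do).
X, y = make_classification(n_samples=200, n_classes=3,
                           n_informative=4, random_state=0)
print(keep_task(X, y))
```

Using a classical, interpretable model as the gatekeeper is what gives the filter its stability benefit: a task's difficulty is judged by a reference learner whose behavior is well understood, independently of the LLM being trained.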
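The third sketch contrasts the compact tabular encoding of in-context examples from Group 3 with a verbose natural-language alternative. The separator, column order, and helper name `encode_table` are assumptions; the point is only that a delimited row costs far fewer tokens per example than a sentence-style description.

```python
# Sketch of compact tabular prompt encoding vs. verbose natural language.
def encode_table(rows, header, sep="|"):
    """Serialize in-context examples as one compact table row per example."""
    lines = [sep.join(header)]
    for feats, label in rows:
        lines.append(sep.join(str(v) for v in feats) + sep + str(label))
    return "\n".join(lines)

examples = [((5.1, 3.5, 1.4), "A"), ((6.7, 3.0, 5.2), "B")]
print(encode_table(examples, ["f1", "f2", "f3", "label"]))
# f1|f2|f3|label
# 5.1|3.5|1.4|A
# 6.7|3.0|5.2|B
#
# Verbose alternative, many times the token count per example:
# "The first feature is 5.1, the second feature is 3.5, the third ..."
```

Shorter per-example encodings are what make it feasible to pack hundreds or thousands of demonstrations into a single context window, which Group 4's many-shot scaling results depend on.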
From few-shot to thousand-shot! MachineLearningLM fits a "machine learning engine" onto large models' in-context learning
机器之心 (Machine Heart) · 2025-09-16 04:01