In-Context Learning (ICL)
From Few-Shot to Thousand-Shot! MachineLearningLM Gives LLM In-Context Learning a "Machine Learning Engine"
机器之心· 2025-09-16 04:01
Core Insights
- The article discusses the limitations of large language models (LLMs) in in-context learning (ICL) and introduces MachineLearningLM, a framework that significantly improves LLM performance on a range of classification tasks without downstream fine-tuning [2][7][22].

Group 1: Limitations of Existing LLMs
- Despite their extensive world knowledge and reasoning capabilities, LLMs struggle with ICL when given many examples: performance often plateaus, and results are sensitive to example order and label bias [2].
- Previous methods relied on limited real task data, which restricted how well models generalized to new tasks [7].

Group 2: Innovations of MachineLearningLM
- MachineLearningLM introduces a continued pre-training framework that lets LLMs learn from thousands of examples directly through ICL, achieving superior accuracy on binary and multi-class tasks across various fields [2][22].
- The framework uses a synthetic dataset of over 3 million tasks generated with a structural causal model (SCM). The synthetic tasks do not overlap with the downstream evaluation sets, so the evaluation gives a fair assessment of generalization [7][11].

Group 3: Methodology Enhancements
- The research adds a two-tier filtering mechanism based on Random Forest models, which improves training stability and interpretability and addresses inconsistent task quality [11][12].
- MachineLearningLM encodes in-context examples efficiently, for instance as compact tables rather than verbose natural-language descriptions, which improves data handling and inference efficiency [15][20].

Group 4: Performance Metrics
- Performance improves continuously as the number of examples grows, and average accuracy exceeds that of baseline models such as GPT-5-mini by approximately 13 to 16 percentage points across various classification tasks [22][24].
- On the MMLU benchmark, MachineLearningLM retains its original conversational and reasoning capabilities while achieving competitive zero-shot and few-shot accuracy [24][25].

Group 5: Application Potential
- The advances in many-shot in-context learning and numerical modeling position MachineLearningLM for broader applications in finance, healthcare, and scientific computing [26][28].
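To make the compact table encoding from the methodology points concrete, here is a minimal sketch that serializes the same in-context examples in two ways: one sentence per example, and a header row followed by delimiter-separated rows. The summary does not specify the serialization MachineLearningLM actually uses, so the `|` delimiter, the `?` placeholder for the query label, and the function names `verbose_prompt` and `compact_prompt` are all illustrative assumptions.

```python
from typing import List, Sequence


def verbose_prompt(columns: Sequence[str], rows: Sequence[Sequence[int]],
                   labels: Sequence[int], query: Sequence[int]) -> str:
    """Describe every in-context example with a full natural-language sentence."""
    lines: List[str] = []
    for row, label in zip(rows, labels):
        desc = ", ".join(f"{c} is {v}" for c, v in zip(columns, row))
        lines.append(f"Example: {desc}. The label is {label}.")
    desc = ", ".join(f"{c} is {v}" for c, v in zip(columns, query))
    lines.append(f"Query: {desc}. What is the label?")
    return "\n".join(lines)


def compact_prompt(columns: Sequence[str], rows: Sequence[Sequence[int]],
                   labels: Sequence[int], query: Sequence[int]) -> str:
    """State the column names once, then emit one delimited line per example."""
    # Hypothetical format: '|' as delimiter, '?' marks the label to predict.
    lines: List[str] = ["|".join(list(columns) + ["label"])]
    for row, label in zip(rows, labels):
        lines.append("|".join([str(v) for v in row] + [str(label)]))
    lines.append("|".join([str(v) for v in query] + ["?"]))
    return "\n".join(lines)


# Toy data, invented purely for illustration.
columns = ["age", "income"]
rows = [[25, 30], [52, 88], [37, 61]]
labels = [0, 1, 1]
query = [41, 52]

verbose = verbose_prompt(columns, rows, labels, query)
compact = compact_prompt(columns, rows, labels, query)
print(compact)
print(len(verbose), "characters verbose vs", len(compact), "characters compact")
```

Because the column names appear only once in the compact form, its per-example cost is roughly the values themselves, which is what makes it practical to fit hundreds or thousands of examples in a single context window.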
A 10,000-Character Long Read! The First Survey of Self-Evolving Agents: On the Path to Artificial Superintelligence
自动驾驶之心· 2025-09-11 23:33
Core Insights
- The article discusses the transition from static large language models (LLMs) to self-evolving agents capable of continuous learning and adaptation in dynamic environments, a step on the path toward artificial superintelligence (ASI) [3][4][46].
- It emphasizes the need for a structured framework for understanding and designing self-evolving agents, built around three fundamental questions: what to evolve, when to evolve, and how to evolve [6][46].

Group 1: What to Evolve
- Self-evolving agents can improve components such as models, memory, tools, and architecture over time to enhance performance and adaptability [19][20].
- The evolution of these components is crucial to the agent's ability to handle complex tasks and environments effectively [19][20].

Group 2: When to Evolve
- The article divides self-evolution into two timing modes: intra-test-time self-evolution, which occurs during task execution, and inter-test-time self-evolution, which happens between tasks [22][23].
- Intra-test-time self-evolution lets agents adapt in real time to the specific challenge at hand, while inter-test-time self-evolution uses accumulated experience to improve future performance [22][23].

Group 3: How to Evolve
- Self-evolution is a continuous learning process in which agents learn from real-world interactions, seek feedback, and adjust their strategies dynamically [26][27].
- Methods include reward-based evolution, imitation learning, and population-based approaches, each with its own feedback types and data sources [29][30].

Group 4: Applications and Evaluation
- Self-evolving agents have significant potential in fields such as programming, education, and healthcare, where continuous adaptation is essential [6][34].
- Evaluating self-evolving agents presents unique challenges and requires metrics that capture adaptability, knowledge retention, and long-term generalization [34][36].

Group 5: Future Directions
- The article highlights the importance of addressing catastrophic forgetting and knowledge transfer, and of ensuring that self-evolving agents remain safe and controllable [40][43].
- Future research should focus on scalable architectures, dynamic evaluation methods, and personalized agents that adapt to individual user preferences [38][44].
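The distinction between intra-test-time and inter-test-time self-evolution can be illustrated with a deliberately tiny toy, which is a sketch rather than anything taken from the survey. The `ToyAgent` class, its `solve` and `consolidate` methods, and the number-guessing task are all hypothetical. Within one task the agent adjusts its guess from feedback (intra-test-time); between tasks it stores what worked so that later attempts start from a better position (inter-test-time).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ToyAgent:
    # Memory that persists across tasks; updated only between tasks.
    memory: Dict[str, int] = field(default_factory=dict)

    def solve(self, task: str, feedback: Callable[[int], int],
              max_attempts: int = 20) -> Tuple[int, int]:
        """Return (final guess, attempts used) for one task episode."""
        # Start from a remembered answer if this task was seen before.
        guess = self.memory.get(task, 0)
        for attempt in range(1, max_attempts + 1):
            signal = feedback(guess)  # +1: too low, -1: too high, 0: correct
            if signal == 0:
                return guess, attempt
            # Intra-test-time adaptation: adjust within the same episode.
            guess += signal
        return guess, max_attempts

    def consolidate(self, task: str, answer: int) -> None:
        """Inter-test-time update: keep the result for future episodes."""
        self.memory[task] = answer


def make_feedback(target: int) -> Callable[[int], int]:
    # Environment feedback: the sign of the error, not the target itself.
    return lambda g: (target > g) - (target < g)


agent = ToyAgent()
first_answer, first_attempts = agent.solve("guess-number", make_feedback(3))
agent.consolidate("guess-number", first_answer)
second_answer, second_attempts = agent.solve("guess-number", make_feedback(3))
print(first_attempts, second_attempts)
```

In the second episode the agent needs fewer attempts only because of the consolidation step between episodes; the in-episode adjustment rule itself is unchanged.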