Zero human involvement in gradient updates: MIT's new framework lets AI generate its own fine-tuning data and autonomously upgrade its weights
36Kr · 2025-10-14 07:16
Core Insights
- MIT has introduced a new reinforcement learning framework called SEAL (Self-Adapting LLMs), which enables a model to autonomously generate its own fine-tuning data and self-update instructions, so that model weights can be updated without human intervention [1][3]

Group 1: SEAL Framework Overview
- SEAL uses a nested learning mechanism that computes rewards from the updated model's performance on downstream tasks and optimizes the strategy for generating self-update instructions accordingly [3]
- The framework gives large models a self-driven update capability at the weight level, overcoming the limitation of relying solely on externally supervised data [3]

Group 2: Knowledge Incorporation Experiment
- In the knowledge incorporation experiment, the Qwen2.5-7B model was tested on the SQuAD dataset; the model generated training data from new passages without ever seeing the corresponding answers [5]
- The original Qwen model's accuracy was 32.7%; it improved to 33.5% with fine-tuning on the original passages, 46.3% with GPT-4.1 synthetic data, and 47.0% with the SEAL method, demonstrating superior knowledge integration [6][10]

Group 3: Large-Scale Data Testing
- When tested on longer passages, SEAL reached 58.2% accuracy, significantly outperforming the unoptimized version and indicating that the method generalizes to larger-scale data organization tasks [8]

Group 4: Few-Shot Learning Experiment
- In the few-shot learning experiment, the LLaMA-3.2-1B-Instruct model was evaluated on a subset of tasks from the ARC-AGI dataset, with SEAL generating a training configuration and executing LoRA fine-tuning (an illustrative configuration is sketched after this summary) [11][13]
- Tasks trained with SEAL reached a 72.5% success rate, far exceeding the 0% of fixed few-shot prompts and the 20% of a random sampling strategy, showcasing SEAL's strong task-adaptation ability [15][16]

Group 5: SEAL's Operational Mechanism
- SEAL operates as a dual-loop system that automatically generates training instructions, allowing the model to read new information, rewrite it in its own words, and perform gradient updates to teach itself [17][18]
- The outer loop generates a self-edit instruction from the new input, while the inner loop fine-tunes according to that instruction, constructing synthetic training data and updating the weights (see the loop sketch at the end) [18][20]
- SEAL optimizes the outer loop with ReSTEM, a non-traditional reinforcement learning method based on filtered sampling and behavior cloning, which retains only the self-edit strategies that prove effective [20]
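To make "generating a training configuration" in Group 4 concrete: in the few-shot setting, a self-edit is essentially a small recipe the model writes for its own update, while in the knowledge incorporation setting it is closer to a set of restatements or question-answer pairs derived from the new passage. The snippet below is a purely hypothetical example of what such a few-shot recipe might contain; the field names and values are illustrative assumptions, not the paper's actual schema.

    # Hypothetical self-edit emitted by the model for one ARC-style task.
    # Field names and values are illustrative assumptions, not SEAL's real format.
    self_edit_config = {
        # which input transformations to apply when synthesizing training examples
        "augmentations": ["rotate_90", "flip_horizontal", "permute_colors"],
        # whether to also train directly on the task's demonstration pairs
        "use_demonstrations": True,
        # LoRA fine-tuning hyperparameters chosen by the model itself
        "lora": {"rank": 16, "alpha": 32, "learning_rate": 1e-4, "epochs": 3},
    }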
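The dual-loop mechanism and the ReSTEM-style optimization described in Group 5 can be sketched as one short training round. This is a minimal sketch under assumptions, not the authors' released implementation: the SelfAdaptingModel interface, its method names (generate_self_edit, finetune_lora, evaluate, behavior_clone), and the samples_per_task parameter are all hypothetical.

    from dataclasses import dataclass
    from typing import List, Protocol, Tuple


    @dataclass
    class Task:
        context: str                   # new passage or few-shot demonstrations to absorb
        heldout_questions: List[str]   # used only to compute the reward, never for training


    class SelfAdaptingModel(Protocol):
        # Hypothetical interface a SEAL-style model would need to expose.
        def generate_self_edit(self, context: str) -> str:
            # Outer loop: rewrite new information as synthetic training data
            # plus fine-tuning directives (the "self-edit").
            ...

        def finetune_lora(self, self_edit: str) -> "SelfAdaptingModel":
            # Inner loop: apply a lightweight LoRA update built from the
            # self-edit and return the updated model.
            ...

        def evaluate(self, task: Task) -> float:
            # Score the (possibly updated) model on the downstream task.
            ...

        def behavior_clone(self, examples: List[Tuple[str, str]]) -> None:
            # Supervised fine-tuning on (context, self_edit) pairs that passed
            # the reward filter; this is the ReSTEM update of the generation policy.
            ...


    def seal_restem_round(model: SelfAdaptingModel,
                          tasks: List[Task],
                          samples_per_task: int = 4) -> None:
        # One round: sample self-edits, test them in the inner loop, keep the winners.
        kept: List[Tuple[str, str]] = []
        for task in tasks:
            baseline = model.evaluate(task)  # reward reference before any update
            for _ in range(samples_per_task):
                self_edit = model.generate_self_edit(task.context)   # outer loop
                updated = model.finetune_lora(self_edit)             # inner loop
                reward = updated.evaluate(task)
                if reward > baseline:
                    # Filtered sampling: only self-edits that actually helped survive.
                    kept.append((task.context, self_edit))
        if kept:
            # Behavior cloning on the surviving self-edits nudges the model toward
            # writing more effective self-update instructions in the next round.
            model.behavior_clone(kept)

Read against the summary above, finetune_lora is the inner loop's weight update, the comparison of the post-update reward against the pre-update baseline is the nested reward signal, and behavior_clone on the filtered samples is the ReSTEM step that improves how the model writes its own training data.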