Agent Self-Evolution
Co-authored by Zhou Jingren, Tongyi Lab open-sources an agent self-evolution system: models learn to "self-reflect", and even a 14B model can punch above its weight
量子位· 2025-11-19 05:02
Core Insights
- The article discusses the launch of AgentEvolver, a self-evolving intelligent agent system developed by Alibaba, which significantly enhances the performance of AI models on complex tasks [2][4]

Performance Improvement
- AgentEvolver raised the average task completion rate of a 14B model from 29.8% to 57.6%, nearly doubling its performance [4]
- On a smaller 7B model, the average completion rate rose from 15.8% to 45.2%, demonstrating the framework's versatility across model sizes [5]
- After optimization, the system can outperform larger models (e.g., 32B models) on specific tasks [5]

Learning Efficiency
- AgentEvolver converges rapidly, reaching 90% of baseline model performance with far fewer training steps: 55.6% fewer on AppWorld tasks and 66.7% fewer on BFCL tasks [7][8]
- This efficiency translates into reduced training time and computational cost [8]

Cross-Domain Generalization
- Models trained on synthetic data maintain high performance when applied to new, unseen domains, indicating strong cross-domain generalization [9][11]
- For instance, a model trained on AppWorld tasks performed well on BFCL tasks with minimal performance degradation [10]

Self-Evolution Mechanism
- AgentEvolver achieves self-evolution through an automated data-exploration-feedback pipeline driven by three core mechanisms: self-questioning, self-navigating, and self-attributing [13][20]
- Self-questioning lets the system generate challenging tasks autonomously, breaking its reliance on external data [21][23]
- Self-navigating improves exploration efficiency by leveraging past experience to guide current decision-making [24][28]
- Self-attributing provides fine-grained feedback on each action taken, improving sample efficiency during strategy optimization [30][33]
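The three mechanisms described above can be sketched as a minimal training loop. This is an illustrative Python skeleton only; every function name, the similarity heuristic, and the credit scheme are hypothetical stand-ins, not AgentEvolver's actual implementation.

```python
import random

def self_question(seed_tasks, difficulty):
    """Stand-in for self-questioning: derive a new, harder task from a seed."""
    base = random.choice(seed_tasks)
    return {"prompt": f"{base} (difficulty {difficulty})", "difficulty": difficulty}

def self_navigate(task, experience_bank):
    """Stand-in for self-navigating: reuse the most similar past trajectory
    as a hint. Similarity here is a naive shared-word count."""
    if not experience_bank:
        return None
    words = set(task["prompt"].split())
    return max(experience_bank, key=lambda e: len(words & set(e["prompt"].split())))

def self_attribute(trajectory, success):
    """Stand-in for self-attributing: spread the outcome into a per-step
    credit signal instead of one trajectory-level reward."""
    n = len(trajectory)
    return [(step, (1.0 if success else -1.0) / n) for step in trajectory]

def evolve(seed_tasks, rollout, steps=10):
    """One data-exploration-feedback loop: question -> navigate -> attribute."""
    experience_bank, credits = [], []
    for i in range(steps):
        task = self_question(seed_tasks, difficulty=1 + i // 3)
        hint = self_navigate(task, experience_bank)
        trajectory, success = rollout(task, hint)  # environment interaction
        credits.extend(self_attribute(trajectory, success))
        experience_bank.append({"prompt": task["prompt"], "trajectory": trajectory})
    return credits
```

A policy-update step would consume the per-step credits; it is omitted here because the summary gives no detail on the optimizer.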
From ReasoningBank to MetaAgent: is RL really a necessary ingredient for agent self-evolution?
机器之心· 2025-10-25 02:30
Core Viewpoint
- The article discusses the evolution of intelligent agents, emphasizing the role of memory systems in enabling self-evolution beyond traditional reinforcement learning (RL), and surveys other technical directions, including metacognition and self-diagnosis, for enhancing agent capabilities

Group 1: Memory Systems and Their Evolution
- Recent advances in artificial intelligence have shifted focus from large language models alone to self-evolving agents capable of executing complex tasks in dynamic environments [4]
- Memory systems aim to turn immediate reasoning into cumulative, transferable long-term experience, letting agents remember not just what to think but how to think [7][8]
- The evolution of memory systems falls into three stages: No Memory Agent, Trajectory Memory, and Workflow Memory, each limited in knowledge abstraction and adaptability [8][9]

Group 2: ReasoningBank Mechanism
- ReasoningBank raises the abstraction level of agent memory from operational records to generalized reasoning strategies, improving knowledge readability and transferability across tasks [10]
- It runs a self-aware feedback loop of memory retrieval, construction, and integration, forming a closed-loop learning process without external supervision [7][10]
- The Memory-aware Test-Time Scaling (MaTTS) mechanism allocates compute to improve the quality of comparative signals, yielding better reasoning strategies and faster adaptive evolution [11][12]

Group 3: Future Directions in Self-Evolution
- While memory-system improvements are the current mainstream route to self-evolution, researchers are also exploring other technical routes such as self-recognition and external tool assistance [14]
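The retrieval-construction-integration loop attributed to ReasoningBank can be illustrated with a small sketch. The class, its keyword-overlap retrieval, and the "Do/Avoid" distillation format are hypothetical simplifications, not the paper's mechanism.

```python
class ReasoningMemory:
    """Minimal sketch of strategy-level memory: retrieve relevant strategies
    before a task, then distill the attempt's outcome back into memory."""

    def __init__(self):
        self.strategies = []  # list of (keyword set, advice string) pairs

    def retrieve(self, task, k=2):
        """Return the k stored strategies sharing the most words with the task."""
        words = set(task.lower().split())
        ranked = sorted(self.strategies,
                        key=lambda s: len(words & s[0]), reverse=True)
        return [advice for _, advice in ranked[:k]]

    def integrate(self, task, lesson, success):
        """Distill one attempt into a reusable strategy. Failures are kept
        too, as 'what to avoid' advice, so both outcomes teach."""
        prefix = "Do" if success else "Avoid"
        self.strategies.append((set(task.lower().split()), f"{prefix}: {lesson}"))

mem = ReasoningMemory()
mem.integrate("book a flight to Tokyo", "check visa rules first", success=True)
mem.integrate("book a flight to Oslo", "guessing airport codes", success=False)
hints = mem.retrieve("book a flight to Paris")  # both strategies transfer
```

The point of the sketch is the loop shape: memory is written as generalized advice rather than raw trajectories, which is what lets it transfer to the unseen "Paris" task.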
With only 100 seed questions, synthetic data quality surpasses GPT-5: Alibaba and Shanghai Jiao Tong University propose the Socratic-Zero framework
机器之心· 2025-10-23 07:45
Core Insights
- The article discusses the Socratic-Zero framework developed by Alibaba and Shanghai Jiao Tong University, which enables autonomous reasoning training without reliance on external data, using only 100 seed questions to generate high-quality, adaptive learning material [5][14][35]

Group 1: Introduction and Background
- Current breakthroughs in large language models (LLMs) depend heavily on vast amounts of labeled data, which can yield inefficient training signals [5]
- Socratic-Zero is a self-evolving training framework built on three intelligent agents, Solver, Teacher, and Generator, which together create a dynamic learning environment [9][12]

Group 2: Methodology
- The framework is inspired by Socratic maieutics, emphasizing high-quality questioning to stimulate self-correction and continuous evolution in AI models [9][12]
- The three agents form a closed-loop self-evolution mechanism: the Solver's weaknesses drive the Teacher to generate targeted questions, and the Generator learns from the Teacher's strategies to create new problems [13][15]

Group 3: Key Innovations
- The Solver achieved an average accuracy of 56.1% across seven mathematical reasoning benchmarks, a 20.2-percentage-point improvement over previous models [25][32]
- Using only 100 seed questions, the Generator produces synthetic data of higher quality than that of top closed-source models such as GPT-5 and Gemini-2.5-Pro [27][28]

Group 4: Experimental Results
- The Solver improved by 15.4 percentage points over MetaMath and WizardMath, demonstrating the effectiveness of the Socratic-Zero approach [25]
- The Generator's question effectiveness reached 95.6%, closely matching GPT-5 and indicating high-quality generated content [28]

Group 5: Engineering and Practicality
- Socratic-Zero's training process is designed to be engineering-friendly, ensuring diversity and quality control through multiple validations of the seed questions [30][33]
- The framework is lightweight and runs with minimal hardware requirements, making it accessible to resource-constrained teams [33][34]

Group 6: Future Implications
- Socratic-Zero opens a new path for zero-data, self-evolving AI systems, showing the potential for intelligent agents to improve reasoning capabilities without human intervention [35][36]
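The Solver-Teacher-Generator closed loop can be sketched in a few lines. Everything below is a toy illustration of the loop's shape: the scalar "skill", the difficulty-based success test, and the imitation rule are invented for this sketch and are not Socratic-Zero's actual training code.

```python
def solver_attempt(question, skill):
    """Toy Solver: succeeds when its skill meets the question's difficulty."""
    return skill >= question["difficulty"]

def teacher_probe(failures):
    """Toy Teacher: targets the Solver's weaknesses by posing a question
    slightly harder than the hardest one it just failed."""
    hardest = max(f["difficulty"] for f in failures)
    return {"text": "harder variant", "difficulty": hardest + 1}

def generator_imitate(teacher_questions):
    """Toy Generator: imitates the Teacher's strategy by reproducing its
    difficulty distribution as new synthetic problems."""
    return [{"text": "synthetic", "difficulty": q["difficulty"]}
            for q in teacher_questions]

def socratic_loop(seed_questions, rounds=3):
    """Closed loop: Solver fails -> Teacher probes -> Generator expands the
    question pool -> Solver trains (here, skill just increments)."""
    skill, pool, teacher_qs = 1, list(seed_questions), []
    for _ in range(rounds):
        failures = [q for q in pool if not solver_attempt(q, skill)]
        if not failures:  # Solver has mastered the current pool
            break
        teacher_qs.append(teacher_probe(failures))
        pool.extend(generator_imitate(teacher_qs))
        skill += 1        # stand-in for an actual Solver training step
    return skill, len(pool)
```

Note how the curriculum is adversarial but bounded: the Teacher only raises difficulty where the Solver actually fails, which mirrors the article's point that weaknesses drive question generation.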