Workflow · Agent Self-Evolution
From ReasoningBank to MetaAgent: Is RL Really Necessary for Agent Self-Evolution?
机器之心· 2025-10-25 02:30
Core Viewpoint
- The article surveys the evolution of intelligent agents, arguing that memory systems are key to enabling self-evolution beyond traditional reinforcement learning (RL), and that other technical directions, including metacognition and self-diagnosis, are being explored to further extend agent capabilities.

Group 1: Memory Systems and Their Evolution
- Recent advances in artificial intelligence have shifted focus from large language models alone to self-evolving agents capable of executing complex tasks in dynamic environments [4]
- Memory systems aim to turn immediate reasoning into cumulative, transferable long-term experience, so that an agent remembers not only what to think but how to think [7][8]
- The evolution of memory systems falls into three stages, No Memory Agent, Trajectory Memory, and Workflow Memory, each limited in how far it can abstract knowledge and adapt to new tasks [8][9]

Group 2: ReasoningBank Mechanism
- ReasoningBank raises the abstraction level of agent memory from operational records to generalized reasoning strategies, improving the readability of stored knowledge and its transferability across tasks [10]
- It runs a self-aware feedback loop of memory retrieval, construction, and integration, forming a closed-loop learning process that requires no external supervision [7][10] (a minimal code sketch of this loop follows this summary)
- Memory-aware Test-Time Scaling (MaTTS) allocates additional test-time compute to sharpen the contrastive signal between attempts, yielding better reasoning strategies and faster adaptive evolution of the agent [11][12]

Group 3: Future Directions in Self-Evolution
- While improving memory systems is currently the mainstream route to self-evolution, researchers are also exploring other technical routes such as self-recognition and external tool assistance [14]
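The closed loop summarized in Group 2 (retrieve relevant memory, attempt the task with extra test-time samples, self-judge, distill a generalized strategy, and integrate it back into the bank) can be made concrete with a small sketch. This is not the paper's implementation: the item schema, the prompts, and the `toy_embed`, `llm`, and `judge` helpers are assumptions standing in for an embedding model, a generator, and an LLM-as-judge.

```python
import hashlib
from typing import Callable

import numpy as np


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashed bag-of-words embedding so the sketch runs without a real model."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec


class ReasoningBankSketch:
    """Minimal sketch of a ReasoningBank-style closed loop:
    retrieve -> attempt (best-of-N, MaTTS-like) -> self-judge -> distill -> integrate."""

    def __init__(self, llm: Callable[[str], str], judge: Callable[[str, str], float]):
        self.llm = llm      # produces a reasoning trajectory from a prompt
        self.judge = judge  # scores (task, trajectory) without external labels
        self.items: list[dict] = []  # each item: {"strategy": str, "vec": np.ndarray}

    def retrieve(self, task: str, k: int = 3) -> list[str]:
        # Memory retrieval: pull the k stored strategies most similar to the task.
        if not self.items:
            return []
        q = toy_embed(task)
        ranked = sorted(self.items, key=lambda it: float(q @ it["vec"]), reverse=True)
        return [it["strategy"] for it in ranked[:k]]

    def solve(self, task: str, n_samples: int = 4) -> str:
        # Memory-aware test-time scaling: spend extra samples on one task so the
        # contrast between attempts gives a stronger signal for distillation.
        hints = "\n".join(self.retrieve(task))
        prompt = f"Known strategies:\n{hints}\n\nTask: {task}\nSolve step by step."
        candidates = [self.llm(prompt) for _ in range(n_samples)]
        best = max(candidates, key=lambda c: self.judge(task, c))
        self._integrate(task, candidates, best)
        return best

    def _integrate(self, task: str, candidates: list[str], best: str) -> None:
        # Memory construction + integration: store a generalized strategy rather
        # than the raw trajectory, so it can transfer to other tasks.
        strategy = self.llm(
            "Contrast these attempts and state one reusable reasoning strategy.\n"
            f"Task: {task}\nBest attempt: {best}\nOther attempts: {candidates}"
        )
        self.items.append({"strategy": strategy, "vec": toy_embed(strategy)})
```

Nothing in this loop needs ground-truth labels: the judge callable and the contrast between the agent's own samples are the only feedback, which matches the supervision-free, closed-loop framing of the summary above.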
With Only 100 Seed Problems, Synthetic Data Quality Surpasses GPT-5: Alibaba and Shanghai Jiao Tong University Propose the Socratic-Zero Framework
机器之心· 2025-10-23 07:45
The (co-)first authors of this paper are Shaobo Wang (Shanghai Jiao Tong University, AI) and Zhengbo Jiao (Shanghai University of Finance and Economics). The (co-)corresponding authors are Hu Wei (Alibaba) and Linfeng Zhang (Shanghai Jiao Tong University, AI). The other authors are from Alibaba, Wuhan University, Zhejiang University, and other institutions.

Recently, this work on agent self-evolution from Alibaba, Shanghai Jiao Tong University, and other institutions drew attention from well-known accounts on Twitter/X, starting with two reposts by Rohan Paul.

Authors: Shaobo Wang*, Zhengbo Jiao*, Zifan Zhang, Yilang Peng, Xu Ze, Boyu Yang, Wei Wang, Hu Wei†, Linfeng Zhang† (Alibaba Group Holding Limited, et al.)

Readers also spoke highly of the work. So how does it actually work?

Introduction: From "Data Hunger" to "Self-Sufficiency"

Current breakthroughs of large language models in mathematical reasoning depend heavily on massive amounts of human-annotated data. Static augmentation methods such as MetaMath and WizardMath can synthesize training samples through prompt engineering, but the quality of the generated problems is unstable, and the synthesis cannot adapt dynamically as the model's capability evolves, which makes the training signal inefficient (a minimal sketch of this static baseline appears below).

To break through this bottleneck, Alibaba ...
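For contrast, the static baseline criticized above can be reduced to a few lines. This is an illustrative sketch, not MetaMath's or WizardMath's actual pipeline and not the Socratic-Zero method; the `llm` callable and the rewrite template are assumptions.

```python
from typing import Callable

# Assumed rewrite prompt for a fixed, prompt-engineering-style augmentation step.
REWRITE_TEMPLATE = (
    "Rewrite the following math problem into a new variant of similar or higher "
    "difficulty, then give a step-by-step solution.\n\nProblem: {seed}"
)


def static_augment(seeds: list[str], llm: Callable[[str], str],
                   variants_per_seed: int = 3) -> list[str]:
    """Static augmentation over a fixed seed set.

    Neither the template nor the seeds change as the student model improves,
    so the difficulty and coverage of the synthesized problems cannot track the
    model's evolving capability -- the inefficiency the article points out.
    """
    corpus = []
    for seed in seeds:
        for _ in range(variants_per_seed):
            corpus.append(llm(REWRITE_TEMPLATE.format(seed=seed)))
    return corpus
```

Per the title, the Socratic-Zero framework starts from only 100 seed problems and targets exactly this limitation of fixed-template synthesis.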