"Fine-tuning is dead" gains more support: Google extends the AI self-evolution paradigm, learning from both successes and failures
36Kr · 2025-10-13 02:37
Core Insights
- The recent discussion around "fine-tuning is dead" has gained significant attention in academia, particularly due to a paper from Stanford University, SambaNova, and UC Berkeley introducing a technique called Agentic Context Engineering, which allows language models to self-improve without fine-tuning [1]
- Google previously proposed a similar concept called ReasoningBank, an innovative memory framework that lets agent systems extract and organize memory items from their own experiences without requiring ground-truth labels [1][3]

Summary by Sections

ReasoningBank Overview
- ReasoningBank captures effective strategies from successes and extracts important lessons from failures, abstracting them into actionable principles [1]
- The process operates in a closed loop: agents retrieve relevant memories from ReasoningBank to guide their actions on new tasks, then distill new memories from the outcome, continuously evolving their strategic capabilities [1][3]

Memory Structure and Integration
- ReasoningBank consists of structured memory items distilled from past experiences, retaining transferable reasoning patterns and strategies [6]
- Each memory item includes a title, a brief description, and content detailing reasoning steps, decision rationale, or operational insights, making items readable for humans and usable by machines [6][7]

Testing and Performance
- Google has run extensive experiments on challenging benchmarks covering web browsing and software engineering tasks, demonstrating that ReasoningBank outperforms baseline methods in both effectiveness (up to a 34.2% improvement) and efficiency (a 16.0% reduction in interaction steps) [9][11]
- Integrating ReasoningBank with memory-aware test-time scaling (MaTTS) creates a strong synergy, enhancing the agent's ability to learn from both successful and failed trajectories [12][13]

Experimental Results
- The experiments indicate that both parallel and sequential scaling improve performance, with ReasoningBank achieving higher resolve rates than models without memory mechanisms [11][13]
- The results highlight ReasoningBank's effectiveness across a range of tasks, showcasing its potential as a key component of memory-based experience scaling for agents [12][13]
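The memory structure and closed loop described above can be sketched in a few lines. Everything here is an illustrative assumption rather than the paper's actual implementation: the class and method names are invented, and retrieval is simplified to keyword overlap where a real system would use embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """One distilled strategy or lesson (field names are illustrative)."""
    title: str        # one-line handle for the strategy
    description: str  # brief note on when it applies
    content: str      # reasoning steps, decision rationale, or pitfalls


class ReasoningBank:
    """Minimal in-memory store with naive keyword retrieval (a sketch,
    not the paper's system, which retrieves by embedding similarity)."""

    def __init__(self) -> None:
        self.items: list[MemoryItem] = []

    def add(self, item: MemoryItem) -> None:
        self.items.append(item)

    def retrieve(self, query: str, k: int = 3) -> list[MemoryItem]:
        # Score each item by word overlap between the query and the
        # item's title + description, highest overlap first.
        words = set(query.lower().split())
        def score(m: MemoryItem) -> int:
            text = (m.title + " " + m.description).lower().split()
            return len(words & set(text))
        return sorted(self.items, key=score, reverse=True)[:k]


# Closed loop: retrieve -> act -> judge outcome -> distill -> store.
bank = ReasoningBank()
bank.add(MemoryItem(
    title="Paginate before asserting absence",
    description="web search tasks with result lists",
    content="Check later result pages before concluding an item is missing.",
))
hits = bank.retrieve("web search for a product that may not be listed")
print(hits[0].title)  # → Paginate before asserting absence
```

In the full loop, the "distill" step would prompt an LLM to turn a finished trajectory (successful or failed) into a new `MemoryItem`, which is what lets the bank grow without ground-truth labels.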
"Fine-tuning is dead" gains more support: Google extends the AI self-evolution paradigm, learning from both successes and failures
机器之心 (Jiqizhixin) · 2025-10-12 08:02
Core Insights
- The article discusses "Agentic Context Engineering," a technique that allows language models to self-improve without fine-tuning, which has drawn attention from the academic community [1]
- Google's earlier work on ReasoningBank presents a similar idea: an innovative memory framework for agent systems that extracts and organizes memory items from the agent's own experiences [1][3]

Summary by Sections

ReasoningBank Overview
- ReasoningBank captures effective strategies from successes and important lessons from failures, turning them into actionable principles in a closed-loop process [1][3]
- The framework consists of structured memory items that include a title, description, and content, allowing agents to interact with their environment and build new memory items from past experiences [5][7]

Key Components of ReasoningBank
- Memory structure: memory items are distilled from past experiences, abstracting away low-level execution details while retaining transferable reasoning patterns [7]
- Integration with agents: agents equipped with ReasoningBank draw on a curated pool of transferable strategies to guide decision-making, improving adaptability to unseen queries [7]

Memory-Aware Test-Time Scaling (MaTTS)
- MaTTS integrates ReasoningBank with test-time scaling, generating diverse explorations that provide contrastive signals for better memory synthesis [8][9]
- Two complementary implementations of MaTTS are introduced, parallel scaling and sequential scaling, which enhance the effectiveness of memory distillation [9]

Experimental Results
- Extensive experiments on challenging benchmarks, including WebArena and SWE-Bench-Verified, show that ReasoningBank outperforms baseline methods, with effectiveness improvements of up to 34.2% and a 16.0% reduction in interaction steps [11]
- The results indicate that ReasoningBank significantly improves both resolve rate and efficiency compared to memory-free models [13][14]

Overall Impact
- The combination of ReasoningBank and MaTTS is highlighted as a key component of memory-based experience scaling, demonstrating superior performance across a range of tasks [14][15]
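The parallel variant of MaTTS described above can be roughly illustrated as follows. This is a hedged sketch under stated assumptions: `run_agent` is a stand-in stub rather than a real browsing or coding agent, and the contrastive distillation step, which the real system would delegate to an LLM comparing the trajectories, is replaced with a fixed template.

```python
import random


def run_agent(query: str, seed: int) -> tuple[bool, list[str]]:
    """Stub for one independent rollout; returns (success, trajectory).
    A real agent would interact with the environment here."""
    rng = random.Random(seed)  # seeded so each rollout is reproducible
    steps = [f"step-{i}" for i in range(rng.randint(2, 5))]
    return rng.random() > 0.4, steps


def matts_parallel(query: str, k: int = 4) -> dict:
    """Parallel scaling: k independent rollouts of the same query,
    then self-contrast between successes and failures to distill
    one memory item (here just a dict with title/description/content)."""
    rollouts = [run_agent(query, seed) for seed in range(k)]
    successes = [traj for ok, traj in rollouts if ok]
    failures = [traj for ok, traj in rollouts if not ok]
    # Distillation is stubbed; a real system would prompt an LLM to
    # contrast the trajectories and write the structured memory item.
    return {
        "title": f"Lesson for: {query}",
        "description": f"{len(successes)} successes vs "
                       f"{len(failures)} failures out of {k} rollouts",
        "content": "Prefer action patterns shared by successful rollouts; "
                   "avoid steps that appear only in failed ones.",
    }


item = matts_parallel("book the cheapest direct flight", k=4)
print(item["description"])
```

Sequential scaling would differ only in the loop: instead of `k` independent rollouts, the agent would refine one trajectory across `k` passes, contrasting consecutive attempts rather than siblings.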