From ReasoningBank to MetaAgent: Is RL Necessarily the Answer to Agent Self-Evolution?
机器之心· 2025-10-25 02:30
Core Viewpoint
- The article discusses the evolution of intelligent agents, emphasizing the importance of memory systems in enabling self-evolution beyond traditional reinforcement learning (RL) methods. It highlights the exploration of various technical directions, including metacognition and self-diagnosis, to enhance the capabilities of intelligent agents.

Group 1: Memory Systems and Their Evolution
- Recent advances in artificial intelligence have shifted focus from large language models alone to self-evolving intelligent agents capable of executing complex tasks in dynamic environments [4]
- Memory systems aim to transform immediate reasoning into cumulative, transferable long-term experience, allowing agents to remember not just what to think but how to think [7][8]
- The evolution of memory systems falls into three stages, No Memory Agent, Trajectory Memory, and Workflow Memory, each limited in knowledge abstraction and adaptability [8][9]

Group 2: ReasoningBank Mechanism
- The ReasoningBank mechanism raises the abstraction level of agent memory from operational records to generalized reasoning strategies, improving knowledge readability and transferability across tasks [10]
- It operates as a self-aware feedback loop of memory retrieval, construction, and integration, enabling closed-loop learning without external supervision [7][10]
- The Memory-aware Test-Time Scaling (MaTTS) mechanism allocates compute to improve the quality of comparative signals, yielding better reasoning strategies and faster adaptive evolution of agents [11][12]

Group 3: Future Directions in Self-Evolution
- While improved memory systems are currently the mainstream route to self-evolution in AI, researchers are also exploring other technical directions, such as self-recognition and external tool assistance [14]
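The retrieval-construction-integration loop described above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not Google's implementation: the class names, the keyword-overlap retrieval, and the `act`/`judge`/`distill` callbacks are all hypothetical stand-ins for LLM-driven components.

```python
# Illustrative sketch of a ReasoningBank-style closed loop.
# All names and the toy retrieval heuristic are assumptions, not the paper's API.
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    title: str        # short strategy name
    description: str  # one-line summary
    content: str      # the transferable reasoning principle

@dataclass
class ReasoningBank:
    items: list[MemoryItem] = field(default_factory=list)

    def retrieve(self, task: str, k: int = 3) -> list[MemoryItem]:
        # Toy relevance: keyword overlap between the task and item title/description.
        # A real system would use embedding similarity.
        def score(item: MemoryItem) -> int:
            words = set(task.lower().split())
            return len(words & set((item.title + " " + item.description).lower().split()))
        return sorted(self.items, key=score, reverse=True)[:k]

    def integrate(self, new_items: list[MemoryItem]) -> None:
        self.items.extend(new_items)

def run_task(bank: ReasoningBank, task: str, act, judge, distill) -> bool:
    """One closed-loop step: retrieve memories, act, self-judge, distill, integrate."""
    memories = bank.retrieve(task)
    trajectory = act(task, memories)              # agent attempts the task
    success = judge(trajectory)                   # self-assessment, no external labels
    bank.integrate(distill(trajectory, success))  # learn from success AND failure
    return success
```

The key property the sketch preserves is that the loop needs no ground-truth labels: `judge` is the agent's own verdict, and `distill` produces new memory items from both successful and failed trajectories.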
Tencent Research Institute AI Digest (腾讯研究院AI速递), 2025-10-14
腾讯研究院· 2025-10-13 17:53
Group 1: OpenAI and Chip Partnerships
- OpenAI has announced a strategic partnership with Broadcom to deploy 10 gigawatts of custom AI chips designed by OpenAI, with deployment starting in the second half of 2026 and completion by the end of 2029 [1]
- This marks OpenAI's third major deal with a chip giant in a month, following a $100 billion investment from NVIDIA and a $60 billion GPU deployment agreement with AMD [1]
- Sam Altman revealed that the two companies have been designing the new chip over the past 18 months, using OpenAI's own models in the design process; Broadcom's stock rose more than 10% after the announcement [1]

Group 2: Google Gemini 3.0 Update
- Google is set to release Gemini 3.0 on October 22, showcasing impressive front-end development capabilities that can generate web pages, games, and original music with a single click [2]
- Gemini 3.0 reportedly uses an MoE architecture with over a trillion parameters, activating 15-20 billion parameters per query, and handles context windows from 1 million to several million tokens, enough to process entire books and codebases [2]
- Internal tests indicate that Gemini 3.0 excelled in front-end tests, including generating 3D pixel art, with a year-on-year growth rate of 46.24% expected by September 2025 [2]

Group 3: LiblibAI 2.0 Upgrade
- LiblibAI 2.0 has integrated over 10 popular video models and numerous image models, allowing users to complete all AI creative tasks within the platform [3]
- The upgrade includes a one-click video effects feature and seamless switching between image generation and video creation, incorporating models such as Midjourney V7 and Qwen-image [3]
- New asset management and AI toolbox features provide a comprehensive AI experience for both new and existing users [3]

Group 4: Mamba-3 Development
- The third generation of Mamba, Mamba-3, has entered blind review for ICLR 2026, featuring innovations such as trapezoidal-rule discretization, complex state spaces, and a multi-input multi-output (MIMO) design [4][5]
- Mamba-3 introduces complex-valued hidden states to handle periodic patterns and parity checks, and significantly raises arithmetic intensity to better utilize GPU capacity [5]
- It shows strong performance in long-context information retrieval tests with reduced inference latency, making it suitable for long text processing, real-time interaction, and edge computing applications [5]

Group 5: SAM 3 Concept Segmentation
- A SAM 3 paper, suspected to be from Meta, has been submitted to ICLR 2026, introducing promptable concept segmentation (PCS) that lets users segment all matching instances using simple noun phrases or image examples [6]
- SAM 3 demonstrates at least a twofold performance improvement on the SA-Co benchmark and reaches an average precision of 47.0 on the LVIS dataset, surpassing the previous record of 38.5 [6]
- It uses a dual encoder-decoder transformer architecture, trained on a high-quality dataset containing 4 million unique phrases and 52 million masks, and processes an image with over 100 objects in about 30 milliseconds on a single H200 GPU [6]

Group 6: Google's ReasoningBank Framework
- Google has introduced the ReasoningBank memory framework, which extracts memory items from agents' successes and failures to form a closed-loop self-evolution system that learns without ground-truth labels [7]
- The framework incorporates memory-aware test-time scaling (MaTTS), generating diverse explorations through parallel and sequential setups to synthesize more general memories [7]
- ReasoningBank shows a 34.2% improvement in effectiveness and a 16.0% reduction in interaction steps on benchmarks including WebArena, Mind2Web, and SWE-Bench-Verified [7]

Group 7: AI Performance in Astronomy
- Recent studies indicate that GPT-5 and Gemini 2.5 Pro achieved gold-medal results in the International Olympiad on Astronomy and Astrophysics (IOAA), with GPT-5 averaging 84.2% on the theoretical exams [8]
- Both models outperformed the best students on the theoretical exams, although their accuracy on geometric/spatial problems (49-78%) was notably lower than on physics/mathematics problems (67-91%) [8]
- This highlights AI's strong reasoning capabilities not only in mathematics but also in astronomy and astrophysics, approaching top human-level performance across multiple scientific domains [8]

Group 8: Unitree G1 Robot Developments
- The Unitree G1 robot has demonstrated advanced movements such as aerial flips and kung fu techniques, showcasing its agility and capabilities [10]
- Unitree plans to launch a 1.8-meter-tall humanoid robot in the second half of this year and has filed nearly 10 patents related to humanoid robots [10]
- The domestic robotics industry grew 50%-100% on average in the first half of this year, with algorithm upgrades enabling robots to perform, in principle, a wide range of dance and martial arts movements [10]

Group 9: Apple AI Glasses
- Bloomberg reports that Apple's smart glasses may run a full version of visionOS when paired with a Mac and switch to a lightweight mobile interface when connected to an iPhone, with a planned release between 2026 and 2027 [11]
- Apple has shifted focus from a lighter "Vision Air" headset to smart glasses, competing directly with Meta's Ray-Ban Display [11]
- The first-generation product will not include a display but will feature speakers, cameras, voice control, and possible health functions, with a multi-tiered product line planned for the future [11]

Group 10: Sam Altman's Insights on AI and Work
- Sam Altman said in a recent interview that AI will change the nature of work but will not eliminate meaningful jobs, suggesting future work may become easier while human intrinsic motivation remains [12]
- For GPT-6, the focus will be on smarter models with longer context and better memory, with Codex already capable of completing full-day tasks [12]
- OpenAI currently has 800 million weekly active users; Altman believes voice will not be the final form of AI interaction, and the team is working on a new voice interaction device that will not be revealed in the short term [12]
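The trapezoidal-rule discretization credited to Mamba-3 above can be illustrated on a scalar state-space model. This is the generic numerical scheme (also called the bilinear transform), not the paper's exact formulation: for x'(t) = a·x(t) + b·u(t), the trapezoidal update averages the derivative at both ends of each step, which is more accurate than a one-sided Euler step.

```python
# Generic trapezoidal (bilinear) discretization of the scalar SSM x' = a*x + b*u.
# Shown for illustration only; Mamba-3's actual formulation may differ.

def trapezoidal_step(x, u_k, u_k1, a, b, dt):
    """One trapezoidal update: average the derivative at the interval's two
    endpoints, then solve the resulting implicit equation for x_{k+1}."""
    # (1 - dt*a/2) * x_{k+1} = (1 + dt*a/2) * x_k + (dt/2) * b * (u_k + u_{k+1})
    return ((1 + dt * a / 2) * x + (dt / 2) * b * (u_k + u_k1)) / (1 - dt * a / 2)

def euler_step(x, u_k, a, b, dt):
    """Baseline one-sided (explicit Euler) update for comparison."""
    return x + dt * (a * x + b * u_k)
```

For a decaying state (a < 0, no input) the exact solution is x(t) = exp(a·t)·x(0); the trapezoidal step tracks it noticeably more closely than Euler at the same step size, which is the accuracy argument for this scheme.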
More Ammunition for "Fine-Tuning Is Dead": Google Extends the AI Self-Evolution Paradigm, Learning from Both Successes and Failures
36Kr· 2025-10-13 02:37
Core Insights
- The recent discussion around "fine-tuning is dead" has gained significant attention in academia, particularly due to a paper from Stanford University, SambaNova, and UC Berkeley introducing a technique called Agentic Context Engineering, which allows language models to self-improve without fine-tuning [1]
- Google previously proposed a similar concept called ReasoningBank, an innovative memory framework for agent systems that lets them extract and organize memory items from their own experiences without requiring ground-truth labels [1][3]

Summary by Sections

ReasoningBank Overview
- ReasoningBank captures effective strategies from successes and extracts important lessons from failures, abstracting them into actionable principles [1]
- The process operates in a closed loop: agents retrieve relevant memories from ReasoningBank to guide their actions on new tasks, continuously evolving and enhancing their strategic capabilities [1][3]

Memory Structure and Integration
- ReasoningBank consists of structured memory items distilled from past experiences, retaining transferable reasoning patterns and strategies [6]
- Each memory item includes a title, a brief description, and content detailing reasoning steps, decision rationale, or operational insights, making items readable by humans and usable by machines [6][7]

Testing and Performance
- Google has run extensive experiments on challenging benchmarks, including web browsing and software engineering tasks, showing that ReasoningBank outperforms baseline methods in both effectiveness (up to 34.2% improvement) and efficiency (16.0% fewer interaction steps) [9][11]
- Integrating ReasoningBank with memory-aware test-time scaling (MaTTS) creates strong synergy, enhancing the agent's ability to learn from both successful and failed trajectories [12][13]

Experimental Results
- The experiments indicate that both parallel and sequential scaling improve performance, with ReasoningBank achieving higher resolve rates than models without memory mechanisms [11][13]
- The results highlight ReasoningBank's effectiveness across a range of tasks, showcasing its potential as a key component of memory-based experience scaling for agents [12][13]
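The title/description/content schema described above is concrete enough to show by example. The two items below are hypothetical, hand-written illustrations of what a success-derived strategy and a failure-derived lesson might look like; in the actual framework such items are distilled by the model itself, and these exact texts do not appear in the paper.

```python
# Hypothetical examples of the title/description/content memory-item schema.
# Real ReasoningBank items are distilled automatically from agent trajectories.
import json

# A strategy abstracted from a successful web-browsing trajectory.
success_item = {
    "title": "Prefer site search over manual navigation",
    "description": "Use the built-in search box when locating a specific page.",
    "content": "When a task asks for a specific item, query the site's search bar "
               "first; browsing category menus wastes steps and can miss the target.",
}

# A lesson abstracted from a failed trajectory.
failure_item = {
    "title": "Verify form submission succeeded",
    "description": "Do not assume a click on 'Submit' completed the action.",
    "content": "After submitting a form, check for a confirmation message or URL "
               "change before reporting success; silent failures led to wrong answers.",
}

print(json.dumps([success_item, failure_item], indent=2))
```

Note that both items keep the same schema: the failure case is stored as a forward-looking principle, not as a raw error log, which is what makes it retrievable and transferable to new tasks.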
More Ammunition for "Fine-Tuning Is Dead": Google Extends the AI Self-Evolution Paradigm, Learning from Both Successes and Failures
机器之心· 2025-10-12 08:02
Core Insights
- The article discusses the concept of "Agentic Context Engineering," which allows language models to self-improve without fine-tuning, drawing attention from the academic community [1]
- Google's earlier work, ReasoningBank, presents a similar idea: an innovative memory framework for agent systems that extracts and organizes memory items from the agent's own experiences [1][3]

Summary by Sections

ReasoningBank Overview
- ReasoningBank captures effective strategies from successes and important lessons from failures, turning them into actionable principles in a closed-loop process [1][3]
- The framework consists of structured memory items, each with a title, description, and content, allowing agents to interact with their environment and build new memory items from past experiences [5][7]

Key Components of ReasoningBank
- Memory structure: memory items are distilled from past experiences, abstracting away low-level execution details while retaining transferable reasoning patterns [7]
- Integration with agents: agents equipped with ReasoningBank draw on a curated pool of transferable strategies to guide decision-making, improving adaptability to unseen queries [7]

Memory-Aware Test-Time Scaling (MaTTS)
- MaTTS combines ReasoningBank with test-time scaling, generating diverse explorations that provide comparative signals for better memory synthesis [8][9]
- Two complementary implementations of MaTTS are introduced, parallel scaling and sequential scaling, which enhance the effectiveness of memory synthesis [9]

Experimental Results
- Extensive experiments on challenging benchmarks, including WebArena and SWE-Bench-Verified tasks, show that ReasoningBank outperforms baseline methods, with effectiveness gains of up to 34.2% and a 16.0% reduction in interaction steps [11]
- ReasoningBank significantly improves both resolve rate and efficiency compared with models without memory [13][14]

Overall Impact
- The combination of ReasoningBank and MaTTS is highlighted as a key component of memory-based experience scaling, demonstrating superior performance across a variety of tasks [14][15]
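The parallel variant of MaTTS described above can be sketched as follows. This is a minimal illustration under my own assumptions: the function name, the pluggable `rollout`/`judge` callbacks, and the simple "first success wins" selection are hypothetical stand-ins for the paper's LLM-driven rollout and self-judgment.

```python
# Illustrative sketch of MaTTS-style parallel scaling: run k independent
# rollouts, then use the success/failure contrast as the comparative signal
# for memory synthesis. Names and selection rule are assumptions.
import random

def parallel_matts(task, rollout, judge, k=4, seed=0):
    """Run k independent rollouts of the same task.

    Returns the selected trajectory plus the success/failure split; the split
    is what downstream memory distillation would compare and contrast.
    """
    rng = random.Random(seed)  # seeded so each rollout explores differently
    trajectories = [rollout(task, rng) for _ in range(k)]
    successes = [t for t in trajectories if judge(t)]       # self-judged wins
    failures = [t for t in trajectories if not judge(t)]    # self-judged losses
    # Simple selection: prefer any successful trajectory, else fall back.
    best = successes[0] if successes else trajectories[0]
    return best, successes, failures
```

The sequential variant would instead refine a single trajectory over multiple passes; in both cases the point is that extra test-time compute buys a richer contrast between good and bad trajectories, which in turn yields more general memory items.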