Multimodal Memory
Breaking Down Disciplinary Barriers! A Landmark 400-Reference Survey Unifying the Study of "Human Brain × Agent" Memory Systems
具身智能之心· 2026-01-11 03:02
Core Viewpoint - The article discusses a significant review paper titled "AI Meets Brain," which bridges cognitive neuroscience and artificial intelligence, focusing on how human memory mechanisms can inform the development of human-like memory systems in agents [2][6]
Summary by Sections
Memory Definition - Memory is redefined as not just data storage but as a cognitive link that connects past experiences with future decisions, involving a two-stage process in the human brain [6]
Perspectives on Memory - From a cognitive neuroscience perspective, memory serves as a bridge between past and future [6]
- For large language models (LLMs), memory exists in three forms: parametric memory, working memory, and explicit external memory [7]
- Agent memory transcends simple storage, functioning as a dynamic cognitive architecture that integrates experiences and environmental feedback [8]
Importance of Memory - Memory plays a crucial role in enhancing agent capabilities by overcoming context window limitations, building long-term personalized profiles, and driving experience-based reasoning [12][13]
Memory Classification - The review categorizes memory based on cognitive neuroscience definitions, distinguishing between short-term and long-term memory, with long-term memory further divided into episodic and semantic memory [15][21]
Memory Storage Mechanisms - Memory storage in the human brain involves dynamic cooperation across brain regions, while agent memory systems are explicitly engineered to optimize data structure selection for computational efficiency [31][32]
Memory Management - Memory management in agents is a continuous process involving extraction, updating, retrieval, and application, contrasting with the static nature of traditional memory systems [33][34]
Future Directions - Future agent memory systems should aim for omni-modal capabilities, integrating various data types beyond text, and facilitating skill transfer across different agents [49][50]
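The extract-update-retrieve-apply cycle the review describes maps naturally onto a small interface. The sketch below is a minimal illustration under stated assumptions: the `MemoryEntry`/`AgentMemory` names, the episodic/semantic `kind` tag, and the keyword-overlap scoring are all invented here, not taken from the paper; a real system would likely use an LLM for extraction and a vector index for retrieval.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    content: str
    kind: str  # "episodic" (a specific event) or "semantic" (a distilled fact)
    keywords: set = field(default_factory=set)


class AgentMemory:
    """Toy continuous management loop: extract -> update -> retrieve -> apply."""

    def __init__(self):
        self.long_term = []

    def extract(self, observation, kind="episodic"):
        # Extraction: turn a raw experience into a structured entry.
        return MemoryEntry(observation, kind, set(observation.lower().split()))

    def update(self, entry):
        # Updating: merge near-duplicates instead of appending blindly.
        for existing in self.long_term:
            overlap = len(entry.keywords & existing.keywords)
            if overlap / max(len(entry.keywords), 1) > 0.8:
                existing.keywords |= entry.keywords
                return
        self.long_term.append(entry)

    def retrieve(self, query, k=3):
        # Retrieval: rank stored entries by keyword overlap with the query.
        terms = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda e: len(e.keywords & terms), reverse=True)
        return ranked[:k]

    def apply(self, query):
        # Application: fold retrieved memories into the working context.
        recalled = "\n".join(e.content for e in self.retrieve(query))
        return f"Relevant memories:\n{recalled}\n\nTask: {query}"


memory = AgentMemory()
memory.update(memory.extract("User prefers concise answers with code examples"))
memory.update(memory.extract("User is building a retrieval pipeline in Python"))
print(memory.apply("Draft a concise reply with code examples"))
```

The point of the loop structure is that no stage is terminal: every application step generates new observations that feed back into extraction, which is what the review contrasts with static, write-once memory.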
The Hottest and Most Comprehensive Agent Memory Survey, Jointly Produced by NUS, Renmin University of China, Fudan University, Peking University, and Others
机器之心· 2025-12-22 09:55
Core Insights - The article discusses the evolution of memory systems in AI agents, emphasizing the transition from optional modules to essential infrastructure for various applications such as conversational assistants and code engineering [2] - A comprehensive survey titled "Memory in the Age of AI Agents: A Survey" has been published by leading academic institutions to provide a unified perspective on the rapidly expanding yet fragmented concept of "Agent Memory" [2] Forms of Memory - The survey categorizes agent memory into three main forms: token-level, parametric, and latent memory, focusing on how information is represented, stored, and accessed [16][24] - Token-level memory is defined as persistent, discrete units that are externally accessible and modifiable, making it the most explicit form of memory [18] - Parametric memory involves storing information within model parameters, which can lead to challenges in retrieval and updating due to its flat structure [22] - Latent memory exists in the model's internal states and can be continuously updated during inference, providing a compact representation of memory [24][26] Functions of Memory - The article identifies three core functions of agent memory: factual memory, experiential memory, and working memory [29] - Factual memory aims to provide a stable reference for knowledge acquired from user interactions and environmental facts, ensuring consistency across sessions [31] - Experiential memory focuses on accumulating knowledge from past interactions to enhance problem-solving capabilities, allowing agents to learn from experiences [32] - Working memory manages information within single task instances, addressing the challenge of processing large and complex inputs [35] Dynamics of Memory - The dynamics of memory encompass three stages: formation, evolution, and retrieval, which form a feedback loop [38] - The formation stage encodes raw context into more compact knowledge representations, addressing computational and memory constraints [40] - The evolution stage integrates new memories with existing ones, ensuring coherence and efficiency through mechanisms like pruning and conflict resolution [43] - The retrieval stage determines how memory can assist in decision-making, emphasizing the importance of effective querying strategies [41] Future Directions - The article suggests that future memory systems should be viewed as a core capability of agents rather than mere retrieval plugins, integrating memory management into decision-making processes [49][56] - There is a growing emphasis on automating memory management, allowing agents to self-manage their memory operations, which could lead to more robust and adaptable systems [54][62] - The integration of reinforcement learning into memory control is highlighted as a potential pathway for developing more sophisticated memory systems that can learn and optimize over time [58][60]
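To make the formation-evolution-retrieval feedback loop concrete, here is a minimal sketch of a token-level store: persistent, discrete, and externally readable and modifiable, with a toy conflict-resolution and pruning rule in the evolution step. The `TokenMemory`/`MemoryLoop` names, the truncation stand-in for summarization, and the hit-count feedback heuristic are assumptions for illustration, not the survey's design.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TokenMemory:
    subject: str          # what the entry is about (conflict-resolution key)
    text: str             # compact, discrete, externally editable unit
    created: float = field(default_factory=time.time)
    hits: int = 0         # retrieval count feeds back into pruning decisions


class MemoryLoop:
    def __init__(self, capacity=100):
        self.store = {}           # subject -> TokenMemory
        self.capacity = capacity

    def form(self, subject, raw_context):
        # Formation: compress raw context into a compact representation.
        compact = raw_context.strip()[:200]  # stand-in for an LLM summarizer
        self.evolve(TokenMemory(subject, compact))

    def evolve(self, new):
        # Evolution: newer information about the same subject wins (conflict
        # resolution); the least-used, oldest entry is pruned at capacity.
        self.store[new.subject] = new
        if len(self.store) > self.capacity:
            victim = min(self.store.values(), key=lambda m: (m.hits, m.created))
            del self.store[victim.subject]

    def retrieve(self, query, k=3):
        # Retrieval: overlap scoring; bumping hit counts closes the loop,
        # since frequently retrieved memories survive future pruning.
        terms = set(query.lower().split())
        scored = sorted(self.store.values(),
                        key=lambda m: len(terms & set(m.text.lower().split())),
                        reverse=True)[:k]
        for m in scored:
            m.hits += 1
        return [m.text for m in scored]


loop = MemoryLoop()
loop.form("user_language", "The user said they mostly write Rust these days.")
loop.form("user_language", "Correction: the user now mostly writes Python.")
print(loop.retrieve("which language does the user write"))
```

The demo shows the evolution stage doing real work: the corrected statement silently replaces the stale one rather than coexisting with it, which is the coherence property the survey attributes to pruning and conflict resolution.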
Giving Agents a "Hippocampus": Shanghai AI Lab Open-Sources MemVerse, Defining a New Paradigm for Multimodal Memory
量子位· 2025-12-16 11:52
Core Insights - The article emphasizes the need for a multi-modal memory system for AI agents, moving beyond traditional text-based memory to a more complex, experience-based memory framework [1][2][4] Group 1: Multi-Modal Memory Framework - MemVerse is introduced as the first general multi-modal memory framework for AI agents, integrating images, audio, and video with text into a unified semantic space [1][4] - The framework features a "dual-path" architecture and "memory distillation" technology, enabling AI agents to possess lifelong memory capabilities that are responsive and adaptable [1][4][10] Group 2: Performance and Efficiency - MemVerse has demonstrated significant performance improvements in benchmark tests, such as a nearly 9 percentage point increase in the ScienceQA score for GPT-4o-mini, from 76.82 to 85.48 [8] - In video retrieval tasks, MemVerse outperformed traditional methods like CLIP (29.7% recall rate) and specialized models such as ExCae (67.7%) and VAST (63.9%) [8] - The system can reduce token consumption by up to 90% while maintaining high accuracy, significantly lowering operational costs and delays for long-term memory [8][9] Group 3: Memory Architecture - MemVerse's architecture mimics human cognitive processes, consisting of a central coordinator, short-term memory (STM), and long-term memory (LTM) [6][11] - The central coordinator actively manages memory interactions, enhancing the agent's ability to make intelligent decisions rather than relying on passive data retrieval [11] - The LTM is structured into core memory (user profiles), situational memory (event timelines), and semantic memory (abstract concepts), facilitating deep associative reasoning and addressing "hallucination" issues [11] Group 4: Open Source and Community Engagement - The project has been open-sourced by the Shanghai Artificial Intelligence Laboratory, inviting developers to experiment with the framework [12]
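The coordinator/STM/LTM split described above can be sketched as follows. This is a hypothetical reconstruction from the article's description only: the `Coordinator` and `LongTermMemory` classes, the prefix-based "distillation" rule, and the routing logic are invented stand-ins and do not reflect the actual open-sourced MemVerse code, which additionally spans images, audio, and video.

```python
from collections import deque


class LongTermMemory:
    """LTM split into core (profile), situational (timeline), semantic (concepts)."""

    def __init__(self):
        self.core = {}         # stable user-profile facts
        self.situational = []  # (timestamp, event) timeline
        self.semantic = {}     # abstract concepts and definitions


class Coordinator:
    """Central coordinator: actively routes writes and reads between STM and LTM,
    rather than passively waiting for a retrieval call."""

    def __init__(self, stm_size=8):
        self.stm = deque(maxlen=stm_size)  # short-term memory: recent turns only
        self.ltm = LongTermMemory()

    def observe(self, turn, t):
        self.stm.append(turn)
        # Crude stand-in for "memory distillation": promote profile-like
        # statements into core LTM, log everything else as situational events.
        if turn.lower().startswith(("i am", "i like")):
            self.ltm.core[turn] = turn
        else:
            self.ltm.situational.append((t, turn))

    def recall(self, query):
        # Active routing: consult STM first, then fall back to LTM stores.
        terms = set(query.lower().split())
        hits = [s for s in self.stm if terms & set(s.lower().split())]
        hits += [e for _, e in self.ltm.situational
                 if terms & set(e.lower().split())]
        hits += [v for v in self.ltm.core.values()
                 if terms & set(v.lower().split())]
        return list(dict.fromkeys(hits))  # de-duplicate, keep order


agent = Coordinator()
agent.observe("I like hiking on weekends", t=1)
agent.observe("Booked a flight to Chengdu", t=2)
print(agent.recall("what does the user like"))
```

Even in this toy form, the design choice is visible: the coordinator decides at write time where a memory belongs, which is what lets the bounded STM stay small while the LTM accumulates structure.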
Lessons from 28 Jinqiu Dinner Tables: Product, Users, and Technology, the Three Propositions Facing AI Entrepreneurs
锦秋集· 2025-09-03 01:32
Core Insights - The article discusses the ongoing series of closed-door social events called "Jinqiu Dinner Table," aimed at AI entrepreneurs, where participants share genuine experiences and insights without the usual corporate formalities [1][3]. Group 1: Event Overview - The "Jinqiu Dinner Table" has hosted 28 events since its inception in late February, bringing together top entrepreneurs and tech innovators to discuss real challenges and decision-making processes in a relaxed setting [1]. - The events are held weekly in major cities like Beijing, Shenzhen, Shanghai, and Hangzhou, focusing on authentic exchanges rather than formal presentations [1]. Group 2: AI Entrepreneur Insights - Recent discussions at the dinner table have highlighted the anxieties and breakthroughs faced by AI entrepreneurs, emphasizing the need for collaboration and shared learning [1]. - Notable participants include leaders from various AI sectors, contributing diverse perspectives on the industry's challenges and opportunities [1]. Group 3: Technological Developments - The article outlines advancements in multi-modal AI applications, discussing the integration of hardware and software to enhance user experience and data collection [18][20]. - Key topics include the importance of first-person data capture through wearable devices, which can significantly improve AI's understanding of user interactions [20][21]. Group 4: Memory and Data Management - Multi-modal memory systems are being developed to create cohesive narratives from disparate data types, enhancing the efficiency of information retrieval and user interaction [22][24]. - Techniques for data compression and retrieval are being refined to allow for more effective use of multi-modal data, which is crucial for AI applications [24][25]. Group 5: Future Directions - The article suggests that the future of AI will involve more integrated and user-friendly systems, with a focus on emotional engagement and social interaction [33]. - There is potential for new platforms to emerge from innovative content consumption methods, emphasizing the need for proof of concept before scaling [34][36].
That Day, Large AI Models Remembered the Shackles of "Amnesia" That Once Bound Them
机器之心· 2025-08-31 05:33
Core Insights - The article discusses the advancements in memory capabilities of large language models (LLMs), highlighting how companies like Google, OpenAI, and Anthropic are integrating memory features into their AI systems to enhance user interaction and continuity in conversations [1][3][10]. Memory Capabilities of LLMs - Google's Gemini has introduced memory capabilities that allow it to retain information across multiple conversations, making interactions more natural and coherent [1]. - OpenAI's ChatGPT has implemented a memory feature since February 2024, enabling users to instruct the model to remember specific details, which improves its performance over time [3][42]. - Anthropic's Claude has also added memory functionality, allowing it to recall previous discussions when prompted by the user [3][6]. Types of Memory in LLMs - Memory can be categorized into sensory memory, short-term memory, and long-term memory, with a focus on long-term memory for LLMs [16][17]. - Contextual memory is a form of short-term memory where relevant information is included in the model's context window [18]. - External memory involves storing information in an external database, allowing for retrieval during interactions, which is a common method for building long-term memory [22][23]. - Parameterized memory attempts to encode information directly into the model's parameters, providing a deeper form of memory [24][29]. Innovations in Memory Systems - New startups are emerging, focusing on memory systems for AI, such as Letta AI's MemGPT and RockAI's Yan 2.0 Preview, which aim to enhance memory capabilities [11][12]. - The concept of hybrid memory systems is gaining traction, combining different types of memory to improve AI's adaptability and performance [37][38]. Notable Memory Implementations - OpenAI's ChatGPT allows users to manage their memory entries, while Anthropic's Claude retrieves past conversations only when requested [42][44]. - Gemini supports user input for memory management, enhancing its ability to remember user preferences [45]. - The M3-Agent developed by ByteDance, Zhejiang University, and Shanghai Jiao Tong University integrates long-term memory capabilities across multiple modalities, including video and audio [10][70]. Future Trends in AI Memory - The future of AI memory is expected to evolve towards multi-modal and integrated memory systems, allowing for a more comprehensive understanding of user interactions [97][106]. - There is a growing emphasis on creating memory systems that can autonomously manage and optimize their memory, akin to human cognitive processes [101][106]. - The ultimate goal is to develop AI systems that can exhibit unique personalities and emotional connections through their memory capabilities, potentially leading to the emergence of artificial general intelligence (AGI) [109][110].
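Of the memory types surveyed, external memory is the easiest to sketch: store facts outside the model, then retrieve the most relevant ones into the context window with each query. In the minimal sketch below, a bag-of-words `Counter` stands in for a real embedding model, and the `ExternalMemory` class and its methods are illustrative assumptions, not the API of ChatGPT, Claude, Gemini, or any memory startup named above.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ExternalMemory:
    """Long-term store kept outside the model, retrieved into the prompt."""

    def __init__(self):
        self.entries = []  # (text, vector) pairs

    def remember(self, text):
        self.entries.append((text, embed(text)))

    def build_prompt(self, user_msg, k=2):
        # Retrieval-augmented prompting: top-k memories enter the context
        # window, so the model itself never has to be retrained.
        q = embed(user_msg)
        top = sorted(self.entries, key=lambda e: cosine(q, e[1]),
                     reverse=True)[:k]
        context = "\n".join(f"- {text}" for text, _ in top)
        return f"Known about the user:\n{context}\n\nUser: {user_msg}"


mem = ExternalMemory()
mem.remember("User's name is Alex and they prefer metric units")
mem.remember("User asked for vegetarian recipes last week")
print(mem.build_prompt("any dinner ideas for me?"))
```

This pattern is what distinguishes external memory from the parameterized memory the article also describes: here the model's weights never change, and memory updates are just inserts and deletes in the store, whereas parameterized memory must encode new facts into the weights themselves.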