ICML 2025 | The M+ Framework Arrives: Adding Latent-Space Memory to LLMs, No Longer Limited by the Context Window
机器之心 · 2025-07-15 03:20
Core Viewpoint
- The article discusses M+, a scalable long-term memory extension framework built on MemoryLLM. It extends the effective memory span of language models from under 20k tokens to over 160k tokens while keeping GPU memory usage essentially unchanged [2][18].

Summary by Sections

Background and Motivation
- The paper distinguishes context windows from memory and notes the limitations of existing memory models. Even though models such as GPT-4.1 support context windows of up to 1 million tokens, a long context is not the same as memory, and pushing the window further drives up GPU memory use and latency, making local deployment difficult [4][5].
- The prevailing industry approach, "token-level memory," stores historical content in databases or vector stores, which leads to redundancy, difficulty resolving conflicting entries, and weak support for multimodal content [5].

M+ Framework
- M+ adds a long-term memory component to MemoryLLM, moving toward a more human-like way of storing information: latent-space memory that is both compressed and end-to-end trainable [6][7].
- The framework equips the 8B Llama3 model with a latent memory pool of roughly 1.67B memory tokens, improving its ability to retain information over long sequences [8][13].

Memory Management
- During each update, the last K memory tokens are fused with the new information through the transformer, while randomly selected old tokens are discarded from the pool and replaced by the new ones [11] (a minimal sketch of this update step appears after this summary).
- This design retains information effectively within about 50k tokens; the authors plan to expand memory capacity beyond the initial 1.67B pool [13].

Retrieval Mechanism
- A co-trained retriever is introduced to strengthen extraction from long-term memory, since early attempts that relied on attention alone proved limited [16] (see the retriever sketch at the end of this summary).
- With this structure the model reaches an effective memory span of 160k tokens without a significant increase in GPU load, because most of the memory resides in CPU memory [18].

Performance and Results
- On the SQuAD dataset, M+ shows markedly stronger information retention than previous models, still recalling information at 160k tokens [20].
- A comparison of GPU memory costs shows that M+ is more efficient than competing models, indicating its potential for practical deployment [19].

Conclusion
- M+ is a significant step in exploring latent-space long-term memory and lays a technical foundation for future language models with persistent memory. The authors plan to continue investigating more efficient storage mechanisms and more intelligent retrieval strategies [22].
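The memory-update step described under Memory Management can be made concrete with a small PyTorch sketch. This is a minimal illustration written from the article's description, not the authors' released code: the class name MemoryUpdater, the pool and chunk sizes, and the single transformer fuser block are all assumptions, and offloading evicted tokens to a CPU-side store reflects M+'s stated idea of adding long-term memory to MemoryLLM.

```python
# Minimal sketch of a MemoryLLM/M+-style memory update step (illustrative only).
# MemoryUpdater, the pool size, and the single fuser block are assumptions,
# not the authors' implementation.
import torch
import torch.nn as nn


class MemoryUpdater(nn.Module):
    def __init__(self, d_model=256, n_pool=512, k_new=64):
        super().__init__()
        self.k_new = k_new
        # Fixed-size latent memory pool (one layer shown; the real model keeps
        # memory tokens inside the latent space of each transformer layer).
        self.register_buffer("pool", torch.randn(n_pool, d_model))
        # Small transformer that fuses the last K memory tokens with new context.
        block = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fuser = nn.TransformerEncoder(block, num_layers=1)

    @torch.no_grad()  # training co-optimizes this step end to end; no-grad here for brevity
    def update(self, new_ctx, long_term_store):
        """new_ctx: (T, d_model) hidden states of newly ingested text."""
        k = self.k_new
        # 1) Fuse the last K memory tokens with the new information.
        fused = self.fuser(torch.cat([self.pool[-k:], new_ctx], dim=0).unsqueeze(0))
        fresh = fused[0, :k]  # K compressed memory tokens summarizing the new info
        # 2) Randomly choose K old tokens to evict from the fixed-size pool.
        keep = torch.ones(self.pool.size(0), dtype=torch.bool)
        keep[torch.randperm(self.pool.size(0))[:k]] = False
        # 3) In M+, evicted tokens are offloaded to CPU-side long-term memory
        #    rather than thrown away (assumption based on the article).
        long_term_store.append(self.pool[~keep].cpu())
        # 4) Keep the survivors and append the fresh tokens.
        self.pool = torch.cat([self.pool[keep], fresh], dim=0)


# Usage: ingest a chunk of 128 new hidden states and grow the long-term store.
updater = MemoryUpdater()
long_term = []
updater.update(torch.randn(128, 256), long_term)
```

The key property this sketch tries to capture is that the GPU-resident pool stays a fixed size no matter how much text is ingested; only the CPU-side long-term store grows.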
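The co-trained retriever mentioned under Retrieval Mechanism can likewise be sketched as two lightweight projection heads trained alongside the model: one embeds the current query states, the other embeds memory tokens into keys at offload time, so scoring happens on CPU and only the top-k retrieved tokens are moved back to the GPU. The names MemoryRetriever, offload, retrieve, and all dimensions are hypothetical; the actual M+ retriever architecture may differ.

```python
# Minimal sketch of a co-trained retriever over CPU-resident long-term memory.
# MemoryRetriever, offload()/retrieve(), and all dimensions are illustrative
# assumptions, not the paper's API.
import torch
import torch.nn as nn


class MemoryRetriever(nn.Module):
    def __init__(self, d_model=256, d_key=128):
        super().__init__()
        # Two small heads trained jointly with the LLM: queries come from the
        # current hidden states, keys from memory tokens at offload time.
        self.query_proj = nn.Linear(d_model, d_key)
        self.key_proj = nn.Linear(d_model, d_key)

    def offload(self, evicted_tokens):
        """Precompute keys once when tokens are moved to long-term (CPU) memory."""
        with torch.no_grad():
            keys = self.key_proj(evicted_tokens)
        return evicted_tokens.cpu(), keys.cpu()

    def retrieve(self, query_states, cpu_tokens, cpu_keys, top_k=256):
        """Bring only the most relevant memory tokens back onto the query's device."""
        q = self.query_proj(query_states).mean(dim=0)   # pooled query vector (d_key,)
        scores = cpu_keys @ q.detach().cpu()            # score against all keys on CPU
        idx = scores.topk(min(top_k, cpu_tokens.size(0))).indices
        # Only the selected tokens are transferred; the bulk stays in CPU memory.
        return cpu_tokens[idx].to(query_states.device)


# Usage: store 10k evicted memory tokens on CPU, then recall 256 for a query.
retriever = MemoryRetriever()
cpu_tokens, cpu_keys = retriever.offload(torch.randn(10_000, 256))
recalled = retriever.retrieve(torch.randn(32, 256), cpu_tokens, cpu_keys)
print(recalled.shape)  # torch.Size([256, 256])
```

Because scoring runs over precomputed keys on the CPU side, the long-term store can keep growing without adding GPU load, which is consistent with the article's claim that most of the memory resides in CPU memory while the effective span reaches 160k tokens.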