MemOS
Industry first! Memory Tensor teams up with SenseTime's large-scale computing platform to deploy a domestic PD-separation cluster, with inference cost-performance reaching 150% of the A100
Sina Finance · 2025-12-05 12:56
Core Insights
- The collaboration between Memory Tensor and SenseTime has delivered the first commercial inference cluster based on "memory-computation-scheduling" integration on domestic GPGPUs, achieving a 20% increase in single-card concurrency and a 75% increase in throughput, with cost-performance reaching 150% of the NVIDIA A100 [1][8][6]

Group 1: Technological Advancements
- Memory Tensor's core product, MemOS, is the only memory-centric infrastructure covering system design from low-level inference to memory models and application engineering; it categorizes cognitive structures into three types of memory and forms a scheduling link across time scales [5][9]
- PD (prefill-decode) separation has transitioned from an optimization technique to a new inference paradigm, allowing performance in production environments to be comprehensively described and measured [5][12]

Group 2: Performance Metrics
- Overall cluster throughput improved by over 75%, from 107.85 tokens/s to 189.23 tokens/s, effectively decoupling computation and storage [6][12]
- Single-card concurrency increased by approximately 20%, from 25.00 to 29.42 concurrent requests per card, significantly reducing the risk of queuing and overflow during peak periods [6][12]
- Time to first token (TTFT) remained stable below 2 seconds, with a 70%+ increase in KV Cache hit rate in popular scenarios, improving the cost-effectiveness of inference for high-frequency, multi-turn interactions [6][12][13]

Group 3: Future Directions
- Future collaborations will focus on building a memory-driven pipeline inference foundation on larger domestic GPGPU clusters, creating observable, reversible, and evolvable infrastructure capabilities [7][14]
- The shift from parameter computation to memory computation, and from static inference to dynamic pipelines, positions domestic GPGPU as a potential leader in defining the next generation of inference paradigms [7][14]
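The PD (prefill-decode) separation discussed above splits inference into a prompt-processing stage and a token-generation stage running on separate node pools. A minimal toy sketch of the idea follows; all class and field names are illustrative, and the real KV-cache transfer between pools is reduced to moving a request object:

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int             # request id
    max_new_tokens: int  # decode budget
    generated: int = 0   # tokens produced so far

class PDSeparatedCluster:
    """Toy model of PD separation: a prefill pool admits one waiting
    prompt per step (in reality, by building its KV cache and shipping
    it to a decode node), and a decode pool then generates one token
    per in-flight request per step."""

    def __init__(self):
        self.prefill_queue = deque()  # prompts waiting for prefill
        self.decode_pool = []         # requests generating tokens

    def submit(self, req):
        self.prefill_queue.append(req)

    def step(self):
        # Prefill stage: admit one waiting prompt into the decode pool.
        if self.prefill_queue:
            self.decode_pool.append(self.prefill_queue.popleft())
        # Decode stage: every in-flight request emits one token.
        finished = []
        for r in self.decode_pool:
            r.generated += 1
            if r.generated >= r.max_new_tokens:
                finished.append(r)
        self.decode_pool = [r for r in self.decode_pool if r not in finished]
        return finished
```

Because the two stages never compete for the same hardware, decode throughput is no longer stalled by long-prompt prefills, which is the effect the cluster numbers above attribute to the design.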
AI giants at home and abroad are betting big and startups are going all-in: who can become the next "DeepSeek" through "memory"?
36Kr · 2025-09-07 09:07
Core Insights
- The concept of "memory" in AI is emerging as a crucial factor for the next wave of advancements, allowing models to learn continuously and adapt without forgetting previous knowledge [2][6][22]
- Major players in the AI industry are increasingly focusing on integrating memory capabilities into their models, with various approaches being explored [4][24][30]

Industry Developments
- Companies like Anthropic, Google, and OpenAI have recently announced memory features in their AI systems, enabling more natural and coherent interactions by recalling past conversations [4][6][31]
- The introduction of memory capabilities is seen as a response to the limitations of current models, which rely heavily on short-term memory and lack the ability to retain long-term knowledge [3][19][22]

Technical Approaches
- Different technical routes for implementing memory in AI models are being explored, including parameterized memory, context memory, and external databases [24][26][29]
- Parameterized memory aims to allow models to distinguish which information should be retained as memory, enhancing their reasoning capabilities [24][25]
- Context memory involves using prompts to provide necessary information before inference, while external databases store information outside the model for retrieval during decision-making [26][27]

Competitive Landscape
- The AI market is witnessing a competitive race among various players to establish memory capabilities, with established firms and startups alike vying for dominance [30][33]
- Companies are adopting different business models based on their memory capabilities: larger firms focus on user retention through personalized experiences, while startups aim for a decentralized memory platform [32][33]

Future Outlook
- The timeline for achieving widespread and effective memory capabilities in AI models is estimated to be one to two years for practical applications, and three to five years for governance and privacy issues [34][35]
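Of the three technical routes listed above, the external-database approach is the easiest to sketch: memories live outside the model and are retrieved at decision time. A dependency-free toy version follows, with word-overlap scoring standing in for the embedding similarity a real system would use; all names are illustrative:

```python
class ExternalMemory:
    """Minimal sketch of the external-database memory route: facts are
    stored outside the model and retrieved when a decision is needed.
    Real systems rank by vector-embedding similarity; word overlap is
    a dependency-free stand-in."""

    def __init__(self):
        self.entries = []

    def write(self, text):
        self.entries.append(text)

    def recall(self, query, k=2):
        # Rank stored entries by how many words they share with the query.
        q = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]

mem = ExternalMemory()
mem.write("user prefers concise answers")
mem.write("user is learning Rust")
mem.write("meeting notes from Tuesday")
```

A call such as `mem.recall("which language is the user learning")` would surface the Rust entry first, illustrating how retrieval, not the model's weights, carries the long-term knowledge in this route.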
AI giants at home and abroad are betting big and startups are going all-in: who can become the next "DeepSeek" through "memory"?
机器之心· 2025-09-07 05:12
Core Viewpoint
- The article discusses the emerging importance of "memory" in AI models, suggesting that the ability to possess human-like memory will be a key factor in the next wave of AI advancements [2][6][35].

Group 1: Importance of Memory in AI
- The concept of "memory" is evolving from short-term to long-term or lifelong memory, allowing AI to learn continuously and adapt to new tasks without forgetting previous knowledge [3][7].
- Recent developments in AI memory capabilities have been highlighted by major players like Anthropic, Google, ByteDance, and OpenAI, all of which have introduced memory features in their AI systems [4][6][35].
- The demand for memory capabilities is driven by both technical and application needs, as AI models are increasingly expected to function as long-term partners rather than just tools [20][21][23].

Group 2: Current Trends and Developments
- Various AI companies are exploring different approaches to implement memory, including parameterized memory, context memory, and external databases [26][28][30].
- The industry is witnessing a surge in interest and investment in memory-related research, with many companies racing to develop and integrate these capabilities into their products [6][35].
- Competition among AI firms is intensifying, with the potential for breakthroughs in memory capabilities to redefine the market landscape, similar to past pivotal moments in AI development [35][36].

Group 3: Future Outlook
- The timeline for achieving widespread and effective memory capabilities in AI is estimated to be one to two years for basic functionalities, while addressing governance and privacy issues may take three to five years [36][37].
- The future of AI memory capabilities remains uncertain, with various players in the industry vying for dominance, indicating that any company could emerge as a leader in this space [38].
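The context-memory route mentioned above injects stored facts into the prompt before inference, subject to the context-window budget. A minimal sketch follows; whitespace word count stands in for real tokenization, and the function name and prompt layout are illustrative assumptions:

```python
def build_prompt(memories, user_turn, budget=200):
    """Sketch of the context-memory route: stored facts are prepended
    to the prompt before inference and dropped once the (approximate)
    token budget runs out."""
    header = "Known about the user:"
    used = len(header.split()) + len(user_turn.split())
    kept = []
    for m in memories:
        cost = len(m.split())
        if used + cost > budget:
            break  # context window full; remaining memories are dropped
        kept.append("- " + m)
        used += cost
    return "\n".join([header, *kept, "", "User: " + user_turn])
```

The budget check makes the route's core limitation visible: everything the model "remembers" this way must fit inside the context window on every call, which is exactly the pressure that pushes vendors toward the other two routes.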
That day, large AI models remembered the shackles of "amnesia" that bound them
机器之心· 2025-08-31 05:33
Core Insights
- The article discusses the advancements in memory capabilities of large language models (LLMs), highlighting how companies like Google, OpenAI, and Anthropic are integrating memory features into their AI systems to enhance user interaction and continuity in conversations [1][3][10].

Memory Capabilities of LLMs
- Google's Gemini has introduced memory capabilities that allow it to retain information across multiple conversations, making interactions more natural and coherent [1].
- OpenAI's ChatGPT has offered a memory feature since February 2024, enabling users to instruct the model to remember specific details, which improves its performance over time [3][42].
- Anthropic's Claude has also added memory functionality, allowing it to recall previous discussions when prompted by the user [3][6].

Types of Memory in LLMs
- Memory can be categorized into sensory memory, short-term memory, and long-term memory, with long-term memory being the focus for LLMs [16][17].
- Contextual memory is a form of short-term memory in which relevant information is included in the model's context window [18].
- External memory involves storing information in an external database for retrieval during interactions, which is a common method for building long-term memory [22][23].
- Parameterized memory attempts to encode information directly into the model's parameters, providing a deeper form of memory [24][29].

Innovations in Memory Systems
- New startups are emerging with a focus on memory systems for AI, such as Letta AI's MemGPT and RockAI's Yan 2.0 Preview, which aim to enhance memory capabilities [11][12].
- The concept of hybrid memory systems is gaining traction, combining different types of memory to improve AI's adaptability and performance [37][38].

Notable Memory Implementations
- OpenAI's ChatGPT allows users to manage their memory entries, while Anthropic's Claude retrieves past conversations only when requested [42][44].
- Gemini supports user input for memory management, enhancing its ability to remember user preferences [45].
- The M3-Agent, developed by ByteDance, Zhejiang University, and Shanghai Jiao Tong University, integrates long-term memory capabilities across multiple modalities, including video and audio [10][70].

Future Trends in AI Memory
- The future of AI memory is expected to evolve toward multi-modal and integrated memory systems, allowing for a more comprehensive understanding of user interactions [97][106].
- There is a growing emphasis on creating memory systems that can autonomously manage and optimize their own memory, akin to human cognitive processes [101][106].
- The ultimate goal is to develop AI systems that can exhibit unique personalities and emotional connections through their memory capabilities, potentially leading to the emergence of artificial general intelligence (AGI) [109][110].
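The hybrid memory systems mentioned above combine a bounded short-term window with a persistent long-term store. One way such a combination could be sketched, as a toy model rather than any vendor's implementation:

```python
from collections import deque

class HybridMemory:
    """Toy sketch of a hybrid memory system: a bounded short-term
    window holds the most recent turns, and a long-term store archives
    turns as they scroll out of the window."""

    def __init__(self, window=3):
        self.short_term = deque(maxlen=window)
        self.long_term = []

    def observe(self, turn):
        if len(self.short_term) == self.short_term.maxlen:
            # Oldest turn is about to be evicted; archive it first.
            self.long_term.append(self.short_term[0])
        self.short_term.append(turn)

    def context(self):
        # What would be handed to the model's context window.
        return list(self.short_term)
```

The split mirrors the taxonomy above: the deque plays the role of contextual (short-term) memory, while the archive is the seed of an external long-term store that a retrieval step could later search.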
Reshaping memory architecture: LLMs are installing an "operating system"
机器之心· 2025-07-16 04:21
Core Viewpoint
- The article discusses the limitations of large language models (LLMs) regarding their context window and memory management, emphasizing the need for improved memory systems to enhance their long-term interaction capabilities [5][6][9].

Context Window Evolution
- Modern LLMs typically have a limited context window: early models like GPT-3 handled around 2,048 tokens, while newer models like Meta's Llama 4 Scout claim to manage up to 10 million tokens [2][4].

Memory Management in LLMs
- LLMs face an inherent "memory defect" due to their limited context window, which hampers their ability to maintain consistency in long-term interactions [5][6].
- Recent research has focused on memory management systems like MemOS, which treat memory as a critical resource alongside computational power, allowing for continuous updates and self-evolution of LLMs [9][49].

Long Context Processing Capabilities
- Long context processing capabilities are crucial for LLMs, encompassing:
  - Length generalization, which allows models to extrapolate to sequences longer than those seen during training [12].
  - Efficient attention mechanisms that reduce computational and memory costs [13].
  - Information retention, the model's capacity to utilize distant information effectively [14].
  - Prompt design that maximizes the advantages of long context [15].

Types of Memory in LLMs
- Memory can be categorized into:
  - Event memory, which records past interactions and actions [18].
  - Semantic memory, encompassing accessible external knowledge and an understanding of the model's own capabilities [19].
  - Procedural memory, related to the operational structure of the system [20].

Methods to Enhance Memory and Context
- Several methods to improve LLM memory and context capabilities include:
  - Retrieval-augmented generation (RAG), which enhances knowledge retrieval for LLMs [27][28].
  - Hierarchical summarization, which recursively summarizes content to manage inputs exceeding the model's context length [31].
  - Sliding window inference, which processes long texts in overlapping segments [32].

Memory System Design
- Memory systems in LLMs are akin to databases, integrating lifecycle management and persistent representation capabilities [47][48].
- Recent advancements include memory operating systems like MemOS, which utilize a layered memory architecture to manage short-term, medium-term, and long-term memory [54][52].

Innovative Memory Approaches
- New memory systems such as MIRIX and Larimar draw inspiration from human memory structures, enhancing LLMs' ability to update and generalize knowledge rapidly [58][60].
- These systems aim to improve memory efficiency and model inference performance by employing flexible memory mechanisms [44].
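Among the methods listed above, sliding-window inference is simple enough to sketch directly: the long input is cut into overlapping segments that each fit the context limit. Window and overlap sizes below are illustrative:

```python
def sliding_windows(tokens, window=8, overlap=2):
    """Sketch of sliding-window inference: a long sequence is processed
    in overlapping segments so each chunk fits the context limit while
    sharing some context with its neighbor."""
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # this chunk already reaches the end of the sequence
    return chunks
```

The overlap is the design choice that matters: it trades redundant computation for continuity, since each chunk's first `overlap` tokens repeat the previous chunk's tail and carry context across the boundary.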
Reshaping the boundaries of AI memory: MemOS goes open source! Temporal reasoning improves by 159% over OpenAI
机器之心· 2025-07-07 04:48
Core Insights
- The article discusses the launch of MemOS, a memory operating system designed for large models, which significantly enhances memory management capabilities, achieving an average accuracy improvement of over 38.97% and reducing token overhead by 60.95% compared to existing frameworks [2][3][4].

Group 1: MemOS Overview
- MemOS is developed by Memory Tensor (Shanghai) Technology Co., in collaboration with top universities and organizations, aiming to provide a structured approach to memory management in AI models [3][4].
- The system treats memory as a critical resource, integrating plaintext, activation, and parameter memory into a unified framework, allowing for continuous evolution and self-updating capabilities [4][5].

Group 2: Technical Architecture
- MemOS features a layered architecture similar to traditional operating systems, consisting of an API layer, a memory scheduling and management layer, and memory storage infrastructure [10][11].
- The memory scheduling paradigm supports context-based next-scene prediction, which anticipates memory needs during model generation, enhancing response speed and inference efficiency [12][13].

Group 3: Application Scenarios
- MemOS enables personalized AI agents that can accumulate and manage user preferences, enhancing the user experience through continuous interaction [20].
- In research and knowledge management, it allows for structured long-term storage and dynamic retrieval of project materials, improving efficiency and continuity [20].
- The system is designed for high-reliability scenarios, such as finance and law, providing memory traceability and audit capabilities to ensure compliance and transparency [20].

Group 4: Future Development Plans
- The MemOS team plans to establish the OpenMem community to foster collaboration in memory management research and applications [44].
- Future iterations will focus on memory representation, distributed scheduling, and cross-model memory transfer, aiming to create a high-availability, low-cost, and secure memory operating system [46][47].
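The "next-scene prediction" scheduling idea described above can be caricatured as a two-tier cache: before the next generation step, the scheduler guesses which memories will be needed and prefetches them into a fast tier. A toy sketch follows; naive keyword matching stands in for a real learned predictor, and all names are hypothetical rather than MemOS APIs:

```python
class MemoryScheduler:
    """Caricature of next-scene-prediction memory scheduling: stored
    memories whose keys overlap the current context are copied into a
    fast cache tier ahead of the next generation step."""

    def __init__(self, store):
        self.store = store  # slow tier: all persisted memories
        self.cache = {}     # fast tier: prefetched entries

    def prefetch(self, context):
        # "Predict" upcoming needs by keyword overlap with the context.
        words = set(context.lower().split())
        for key, value in self.store.items():
            if words & set(key.lower().split()):
                self.cache[key] = value

    def lookup(self, key):
        # Serve from the fast tier when possible, else fall back.
        return self.cache.get(key, self.store.get(key))
```

The payoff claimed in the article, faster responses and higher KV Cache hit rates, corresponds to `lookup` hitting the fast tier: the expensive retrieval happens before the model needs the memory, not during generation.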