Core Insights
- The evolution of large models is fundamentally a history of combating "forgetting" [1]
- The AI infrastructure battlefield post-2026 will increasingly focus on "model memory" [1]
- A structured analysis framework for memory layers in AI is proposed, encompassing short-term, mid-term, and long-term memory [8]

Short-term Memory
- Short-term memory constitutes the "current view" during a single inference pass, characterized by high-frequency reads/writes and sensitivity to latency [4]
- Key challenges include the KV cache's dual claim on memory bandwidth and capacity, calling for software optimizations such as PagedAttention alongside hardware advances in HBM and on-chip SRAM [4][19]
- These physical resource constraints create a "memory wall" that limits inference speed and efficiency [19]

Mid-term Memory
- Mid-term memory preserves contextual continuity across sessions, evolving AI from a stateless service into a dynamic system capable of "store-retrieve-update-forget" management [4]
- Software advances such as GraphRAG and MemoryOS enable proactive memory governance, while the hardware side requires large-capacity DRAM and enterprise-grade SSDs to handle high-concurrency random read/write bottlenecks [4][28]
- This layer is crucial for defining the upper limits of agent capability and for building private-data barriers [4]

Long-term Memory
- Long-term memory supports the transition from one-off pre-training to "continuous evolution," allowing real-time updates and knowledge accumulation [5]
- Three pathways to long-term memory are identified: implicit parameters, explicit semantics, and parameterized lookup tables [5][46]
- The blurring line between training and inference demands hardware that supports both workloads, particularly in memory bandwidth and compute [50][51]

Hardware Requirements
- Short-term memory demands high-bandwidth memory (HBM) and on-chip SRAM to manage rapid reads/writes of "hot data" [27]
- Mid-term memory requires large-capacity DRAM and enterprise-grade SSDs to balance storage cost against fast access [43]
- Long-term memory calls for enterprise-grade SSDs and high-performance CPUs to manage extensive data volumes under high concurrency [54]

Software Solutions
- The RAG paradigm is evolving from basic retrieval toward structured approaches such as GraphRAG, strengthening logical reasoning capabilities [32][35]
- A memory-OS architecture lets agents actively manage the memory lifecycle, ensuring efficient use of memory resources [38]
- Test-time training mechanisms and parameter-efficient fine-tuning (PEFT) improve the retention of valuable information in long-term memory [47][48]
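The KV-cache pressure behind the "memory wall" can be made concrete with back-of-the-envelope arithmetic. The sketch below uses illustrative, roughly 7B-class model dimensions (32 layers, 32 KV heads, head dimension 128, fp16); these numbers are assumptions for the example, not figures from the report.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """KV-cache footprint: two tensors (K and V) per layer,
    each of shape [batch, num_kv_heads, seq_len, head_dim]."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Illustrative 7B-class config (assumed): 32 layers, 32 KV heads,
# head_dim 128, fp16 (2 bytes per element).
per_token = kv_cache_bytes(32, 32, 128, seq_len=1, batch=1)
full_ctx = kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8)
print(per_token)            # 524288 bytes, i.e. 0.5 MiB per token
print(full_ctx / 2**30)     # 16.0 GiB for 8 concurrent 4k-token requests
```

At half a mebibyte per token, a modest batch of long contexts already consumes the entire HBM budget of a single accelerator, which is exactly the capacity/bandwidth squeeze that paging schemes like PagedAttention target.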
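The "store-retrieve-update-forget" lifecycle of mid-term memory can be sketched as a minimal store with those four operations. This is a toy illustration of the lifecycle only, not the MemoryOS or GraphRAG implementation: real systems score relevance with vector embeddings or graph traversal rather than the word-overlap heuristic used here, and all names below are hypothetical.

```python
class MemoryStore:
    """Toy session-memory store illustrating store-retrieve-update-forget.
    Relevance here is naive word overlap; production systems use embeddings."""

    def __init__(self):
        self.items = {}    # memory id -> remembered text
        self.next_id = 0

    def store(self, text):
        self.items[self.next_id] = text
        self.next_id += 1
        return self.next_id - 1

    def retrieve(self, query, k=2):
        # Rank memories by how many query words they share.
        q = set(query.lower().split())
        scored = sorted(self.items.items(),
                        key=lambda kv: -len(q & set(kv[1].lower().split())))
        return [text for _, text in scored[:k]]

    def update(self, mem_id, text):
        self.items[mem_id] = text

    def forget(self, mem_id):
        self.items.pop(mem_id, None)

mem = MemoryStore()
a = mem.store("user prefers concise answers")
mem.store("user is based in Berlin")
print(mem.retrieve("where is the user based", k=1))  # ['user is based in Berlin']
mem.forget(a)  # explicit eviction closes the lifecycle
```

The point of the sketch is the interface, not the scoring: once an agent owns these four verbs, memory becomes a governed resource rather than an ever-growing transcript.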
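The appeal of PEFT for long-term memory is quantifiable: a low-rank adapter (LoRA-style) trains only a sliver of the weights it modulates. The sketch below counts adapter parameters for one projection matrix; the 4096x4096 layer size and rank 8 are assumed for illustration, not taken from the report.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters of one low-rank adapter:
    A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)

# Illustrative: one 4096x4096 attention projection with a rank-8 adapter.
full = 4096 * 4096                  # 16,777,216 frozen base weights
lora = lora_params(4096, 4096, 8)   # 65,536 trainable adapter weights
print(lora, lora / full)            # 65536, ~0.39% of the full matrix
```

Training well under 1% of the parameters per layer is what makes frequent, incremental knowledge updates affordable enough to blur the line between training and inference.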
CICC | A Ten-Year Outlook on AI (27): Crossing the Boundary of "Forgetting", the Three-Layer Architecture of Model Memory and Industry Opportunities
CICC Insights (中金点睛) · 2026-02-12 23:36