CICC | AI Decade Outlook (27): Crossing the Boundary of "Forgetting", the Three-Layer Architecture of Model Memory and Industry Opportunities
中金点睛 (CICC Research) · 2026-02-12 23:36
Core Insights
- The evolution of large models is fundamentally a history of combating "forgetting" [1]
- The AI infrastructure battlefield post-2026 will increasingly center on "model memory" [1]
- The report proposes a structured analytical framework for memory layers in AI, spanning short-term, mid-term, and long-term memory [8]

Short-term Memory
- Short-term memory is the "current view" within a single inference pass; it is characterized by high-frequency reads and writes and by sensitivity to latency [4]
- The key challenge is the KV Cache's dual consumption of memory bandwidth and capacity, which calls for software optimizations such as PagedAttention alongside hardware advances in HBM and on-chip SRAM (see the paging sketch after this summary) [4][19]
- These physical resource constraints produce a "memory wall" that limits inference speed and efficiency [19]

Mid-term Memory
- Mid-term memory preserves contextual continuity across sessions, evolving AI from a stateless service into a dynamic system that manages a "storage-retrieval-update-forget" loop (sketched below) [4]
- Software advances such as GraphRAG and MemoryOS enable proactive memory governance, while the hardware side needs large-capacity DRAM and enterprise-grade SSDs to relieve high-concurrency random read/write bottlenecks [4][28]
- This layer is crucial for defining the upper limit of agent capabilities and for building private-data moats [4]

Long-term Memory
- Long-term memory supports the shift from one-off pre-training to "continuous evolution," enabling real-time updates and knowledge accumulation [5]
- Three pathways to long-term memory are identified: implicit parameters, explicit semantics, and parameterized lookup tables [5][46]
- As the line between training and inference blurs, hardware must support both workloads, particularly in memory bandwidth and compute [50][51]

Hardware Requirements
- Short-term memory demands high-bandwidth memory (HBM) and on-chip SRAM to sustain rapid reads and writes of "hot" data [27]
- Mid-term memory requires large-capacity DRAM and enterprise-grade SSDs to optimize storage cost while keeping access fast [43]
- Long-term memory calls for enterprise-grade SSDs and high-performance CPUs to manage large data volumes under high concurrency [54]

Software Solutions
- The RAG paradigm is evolving from basic retrieval toward structured approaches such as GraphRAG, strengthening logical reasoning [32][35]
- A Memory OS architecture lets agents actively manage the memory lifecycle, ensuring efficient use of memory resources [38]
- Test-time training mechanisms and parameter-efficient fine-tuning (PEFT) improve the ability to retain valuable information in long-term memory (a minimal LoRA-style sketch follows) [47][48]
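The PagedAttention point above is easiest to see in code. Below is a minimal sketch, assuming fixed-size KV blocks and a per-sequence block table; the class and method names are hypothetical, illustrating the idea behind vLLM-style paging rather than any actual library's API.

```python
# Minimal sketch of PagedAttention-style KV-cache paging (illustrative only).
# Assumption: fixed-size blocks drawn from a shared physical pool, with a
# per-sequence block table mapping logical token positions to blocks.

BLOCK_SIZE = 16  # tokens per KV block (hypothetical value)

class KVBlockAllocator:
    """Hands out fixed-size KV-cache blocks from a shared physical pool."""
    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free_blocks:
            raise MemoryError("KV cache exhausted: no free blocks")
        return self.free_blocks.pop()

    def release(self, block_id: int) -> None:
        self.free_blocks.append(block_id)

class Sequence:
    """Maps a sequence's logical token positions to physical KV blocks."""
    def __init__(self, allocator: KVBlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block -> physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the current one fills up,
        # so fragmentation is bounded by one partial block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free(self) -> None:
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()

# Usage: sequences share one pool, and memory is reclaimed block by block,
# which is what lets KV-cache capacity be packed tightly at high concurrency.
pool = KVBlockAllocator(num_blocks=1024)
seq = Sequence(pool)
for _ in range(40):          # 40 tokens -> ceil(40/16) = 3 blocks
    seq.append_token()
print(len(seq.block_table))  # 3
seq.free()
```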
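The "storage-retrieval-update-forget" loop that mid-term memory requires can likewise be made concrete. The sketch below uses hypothetical stand-ins (keyword-overlap relevance plus exponential importance decay) to show the lifecycle; it is not a description of MemoryOS's actual design.

```python
# Sketch of an agent memory lifecycle: store, retrieve, update, forget.
# Scoring and eviction policies here are illustrative assumptions.
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    text: str
    importance: float                    # how much the agent values this memory
    last_access: float = field(default_factory=time.time)

class AgentMemory:
    def __init__(self, capacity: int = 100, half_life_s: float = 3600.0):
        self.items: list[MemoryItem] = []
        self.capacity = capacity
        self.half_life_s = half_life_s

    def store(self, text: str, importance: float = 1.0) -> None:
        self.items.append(MemoryItem(text, importance))
        if len(self.items) > self.capacity:
            self.forget()

    def _score(self, item: MemoryItem, now: float) -> float:
        # Importance decays exponentially with time since last access.
        age = now - item.last_access
        return item.importance * 0.5 ** (age / self.half_life_s)

    def retrieve(self, query: str, k: int = 3) -> list[MemoryItem]:
        # Toy relevance: keyword overlap, weighted by decayed importance.
        now = time.time()
        def relevance(item: MemoryItem) -> float:
            overlap = len(set(query.split()) & set(item.text.split()))
            return overlap * self._score(item, now)
        hits = sorted(self.items, key=relevance, reverse=True)[:k]
        for item in hits:                # "update": refresh accessed memories
            item.last_access = now
        return hits

    def forget(self) -> None:
        # Evict the lowest-scoring memories to stay within capacity.
        now = time.time()
        self.items.sort(key=lambda it: self._score(it, now), reverse=True)
        del self.items[self.capacity:]

mem = AgentMemory(capacity=2)
mem.store("user prefers concise answers", importance=2.0)
mem.store("project deadline is Friday")
print([m.text for m in mem.retrieve("when is the project deadline")])
```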
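For the PEFT route to long-term memory, a LoRA-style adapter is the canonical example: the frozen pretrained weight stays untouched, and new knowledge is written into two small low-rank matrices. A minimal NumPy sketch with illustrative shapes, following the scaling convention of the original LoRA paper (the training loop itself is omitted):

```python
# LoRA-style parameter-efficient fine-tuning: train only the low-rank
# factors A and B (~2*d*r parameters) instead of the full d*d weight W.
import numpy as np

d, r = 1024, 8                          # model width, LoRA rank (r << d)
rng = np.random.default_rng(0)

W = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01  # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection (starts at 0)
alpha = 16.0                            # LoRA scaling hyperparameter

def forward(x: np.ndarray) -> np.ndarray:
    # Effective weight is W + (alpha / r) * B @ A; since B starts at zero,
    # the adapted model initially matches the frozen base model exactly.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
assert np.allclose(forward(x), x @ W.T)  # no drift before fine-tuning
print("trainable params:", A.size + B.size, "vs frozen:", W.size)
```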
NVIDIA GPU vs. Google TPU: Which Parts of the Supply Chain Face the Fiercest Competition? (Media)
Huafu Securities · 2026-01-16 13:25
Investment Rating
- The industry is rated "Outperform the Market," indicating that the overall industry return is expected to exceed the market benchmark index by more than 5% over the next 6 months [15]

Core Insights
- Competition between NVIDIA and Google in the AI chip market relies heavily on TSMC's CoWoS advanced packaging, currently the critical bottleneck in the AI chip supply chain [3]
- TSMC's capital expenditure for 2026 is projected at $52-56 billion, a year-on-year increase of 27%-37% driven by strong AI demand [3]
- NVIDIA is working with Amkor to expand U.S. production capacity from 2026 to 2029, as TSMC reallocates some advanced-packaging orders to OSAT manufacturers [3]
- Samsung and Intel are actively building out advanced-process capacity, with Samsung aiming to raise its global 2nm monthly capacity to 21,000 wafers by the end of 2026 [4]
- HBM is a key battleground between NVIDIA's GPUs and Google's TPUs, shaping both performance limits and the volumes of chips that can actually be delivered (see the back-of-envelope calculation after this summary) [4]
- NAND and SSD demand is significantly amplified in AI data centers; NVIDIA's Rubin platform improves data sharing and reuse, potentially increasing SSD demand [5]
- Demand for inference cards is rising as large-model vendors seek alternatives to NVIDIA's chips to reduce dependency and cost [6]

Summary by Sections

Advanced Process and Packaging
- TSMC leads in advanced packaging, and CoWoS capacity constraints cap NVIDIA's and Google's AI chip output [3]
- Amkor and ASE are being enlisted to relieve TSMC's capacity pressure, with Amkor investing $5 billion in advanced-packaging facilities in Arizona [3][4]

Storage Side
- HBM is central to the NVIDIA-Google contest, while on-chip SRAM is emerging as a new direction for inference storage [4]
- The collaboration between NVIDIA and Groq focuses on inference technology built around on-chip SRAM [4]

Client Side
- Major AI model vendors are diversifying their compute, with Anthropic planning to deploy up to 1 million TPUs by 2026 and OpenAI partnering with Cerebras on a large-scale AI inference platform [6]

Investment Recommendations
- The report recommends focusing on semiconductor supply-chain segments including foundries, advanced packaging, storage, and AI model applications amid the competitive landscape between NVIDIA and Google [7]
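The claim that HBM sets performance limits follows from a simple bandwidth argument: during autoregressive decoding, every generated token must stream the model weights out of HBM, so per-chip throughput is roughly bandwidth divided by bytes moved per token. A back-of-envelope sketch with assumed figures (illustrative, not taken from the report):

```python
# Why HBM bandwidth caps decode throughput: tokens/s ~ bandwidth / bytes per
# token. All numbers below are hypothetical assumptions, not chip specs.
params = 70e9              # assumed 70B-parameter dense model
bytes_per_param = 2        # FP16/BF16 weights
hbm_bw = 3.35e12           # assumed ~3.35 TB/s of HBM bandwidth per chip

bytes_per_token = params * bytes_per_param    # weights read once per token
max_tokens_per_s = hbm_bw / bytes_per_token
print(f"bandwidth-bound ceiling: ~{max_tokens_per_s:.0f} tokens/s per chip")
# ~24 tokens/s at batch size 1: more and faster HBM raises this ceiling
# directly, which is why HBM supply shapes both performance and volumes.
```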