CICC | AI Decade Outlook (27): Crossing the Boundary of "Forgetting" — The Three-Tier Architecture of Model Memory and Industry Opportunities
CICC Insights · 2026-02-12 23:36
CICC Research: The evolution of large models is, at its core, a history of fighting "forgetting." While we marvel at models' reasoning abilities, we often overlook a key weakness: in architectures without memory retention, every pass a model makes over historical information is essentially an expensive "recomputation." This brute-force approach of spending compute to fight forgetting is running into the physical limits of the memory wall and the context window. We believe that from 2026 onward, "model memory" will become a new front in the AI infrastructure battle.

What is model memory? How should we understand the software and hardware requirements of a three-tier memory system spanning short-, medium-, and long-term memory? How does this tiered memory system map onto model training, inference, and Agent scenarios? This report answers these questions.

Abstract

Short-term memory constitutes the "current field of view" of a single inference pass. As high-frequency, read/write-intensive, latency-sensitive "hot data," its core tension is the KV Cache's dual squeeze on GPU memory capacity and bandwidth. On the software side, optimizations include PagedAttention-style memory virtualization and prefill/decode (PD) disaggregated scheduling, alongside frontier architectures such as Infini-attention to support million-token context windows. This logic directly anchors HBM and on-chip SRAM as the key hardware elements for breaking through the "memory wall" and the "latency wall." Medium-term memory preserves episodic continuity across sessions and is the foundation of Agents ...
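To make the "dual squeeze" concrete, a back-of-the-envelope sketch of KV Cache sizing illustrates why million-token contexts collide with the memory wall. The model configuration below is hypothetical (assumed for illustration; it is not from the report), but the formula itself is the standard one: two tensors (K and V) per layer, per KV head, per token.

```python
# Rough KV Cache sizing sketch (illustrative parameters, not from the report).
# Per token, each transformer layer stores one key and one value vector:
#   bytes = 2 (K and V) * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Total KV Cache footprint in bytes (bytes_per_elem=2 assumes fp16/bf16)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 70B-class config with grouped-query attention (assumed values):
gb = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                    seq_len=1_000_000, batch=1) / 1e9
print(f"KV Cache at 1M tokens: {gb:.1f} GB")  # ~327.7 GB, far beyond one GPU's HBM
```

Even with grouped-query attention shrinking the KV head count, a single million-token request dwarfs the capacity of any current accelerator, which is what motivates PagedAttention-style virtualization and tiered memory offload.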
NVIDIA GPU vs. Google TPU: Where Is Supply-Chain Competition Fiercest? (Media)
Huafu Securities· 2026-01-16 13:25
Investment Rating
- The industry rating is "Outperform the Market," indicating that the overall industry return is expected to exceed the market benchmark index by more than 5% in the next 6 months [15].

Core Insights
- The competition between NVIDIA and Google in the AI chip market is heavily reliant on TSMC's CoWoS advanced packaging, which is currently a critical bottleneck in the AI chip supply chain [3].
- TSMC's capital expenditure for 2026 is projected to be between $52 billion and $56 billion, reflecting year-on-year growth of 27% to 37% on strong AI demand [3].
- NVIDIA is collaborating with Amkor to expand its production capacity in the U.S. from 2026 to 2029, as TSMC reallocates some advanced packaging orders to OSAT manufacturers [3].
- Samsung and Intel are actively enhancing their advanced process capabilities, with Samsung aiming to raise its global 2nm monthly capacity to 21,000 wafers by the end of 2026 [4].
- HBM is identified as a key battleground between NVIDIA's GPUs and Google's TPUs, influencing both performance limits and the actual deliverable quantities of chips [4].
- NAND and SSD demand is significantly amplified in AI data centers, with NVIDIA's Rubin platform enhancing data sharing and reuse, potentially increasing SSD demand [5].
- Demand for inference cards is rising as large-model vendors seek alternatives to NVIDIA's chips to reduce dependency and costs [6].

Summary by Sections

Advanced Process and Packaging
- TSMC leads in advanced packaging, with CoWoS capacity constraints impacting NVIDIA's and Google's AI chip output [3].
- Amkor and ASE are being used to relieve TSMC's capacity pressure, with Amkor investing $5 billion in advanced packaging facilities in Arizona [3][4].

Storage Side
- HBM is crucial to the competition between NVIDIA and Google, while on-chip SRAM is emerging as a new direction for inference storage [4].
- The collaboration between NVIDIA and Groq focuses on inference technology utilizing on-chip SRAM [4].

Client Side
- Major AI model vendors are diversifying their computational resources, with Anthropic planning to deploy up to 1 million TPUs by 2026 and OpenAI partnering with Cerebras on a large-scale AI inference platform [6].

Investment Recommendations
- The report suggests focusing on sectors within the semiconductor supply chain, including foundries, advanced packaging, storage, and AI model applications, amid the competitive landscape between NVIDIA and Google [7].