Inference Context Memory Storage Platform (ICMS)
Nvidia: Claiming All the World's Storage
新财富· 2026-03-09 08:16
Core Viewpoint
- Nvidia is positioning itself to potentially become one of the world's largest storage companies by redefining storage systems for its partners, rather than merely producing storage chips [2][3].

Group 1: AI and Storage Evolution
- The AI race has shifted from sheer computational power to a new bottleneck: memory capacity and bandwidth, particularly for handling the large volumes of intermediate state (such as KV caches) produced during AI inference [5].
- Nvidia's new Rubin architecture introduces a "context memory storage platform" built on the BlueField-4 DPU, which aims to reshape the storage industry by creating an entirely new storage tier [7][10].
- The Vera Rubin NVL72 rack features four BlueField-4 DPUs managing a dedicated 150TB context memory pool, which serves as a "warm data" layer between the GPUs' HBM and traditional cold storage [7][10].

Group 2: Storage Architecture Changes
- The new architecture raises the effective memory available to each GPU to as much as 20TB, nearly a 200% increase over the previous Blackwell architecture [10].
- Nvidia's three-tier storage system comprises HBM4 for hot data, DRAM for warm data, and the inference context memory storage platform (ICMS) for efficiently holding large KV caches [16][17] (a conceptual sketch of such a tiered KV cache follows at the end of this summary).
- The ICMS platform cuts the cost of token generation for MoE models to one-tenth of the previous level and improves inference performance roughly fivefold [20].

Group 3: Market Impact and Future Trends
- The transformation of NAND flash from a cold-storage medium into a critical component of real-time inference will raise both its value and its performance requirements [16].
- Demand driven by the new architecture could significantly expand the overall NAND market, with Nvidia's deployments potentially adding over 115EB of NAND demand [21].
- This shift in storage dynamics is expected to drive a structural upgrade across the entire storage industry, making NAND storage a core hardware component of AI inference [26][27].
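Below is a minimal, purely illustrative Python sketch of the tiering idea described in Groups 1 and 2: hot KV blocks live in an HBM-like tier, recently used blocks in a DRAM-like tier, and bulk context spills into a large NAND-backed pool. The class name TieredKVCache, the tier names, and the capacities are hypothetical and do not correspond to Nvidia's actual ICMS or BlueField-4 software interfaces.

```python
from collections import OrderedDict

# Illustrative per-tier capacities, measured in KV-cache blocks rather than
# bytes; real HBM/DRAM/NAND capacities differ by orders of magnitude.
TIER_CAPACITY = {"hbm": 4, "dram": 16, "nand_pool": 1_000}


class TieredKVCache:
    """Toy three-tier KV cache: hot blocks in 'hbm', warm blocks in 'dram',
    and bulk/long-context blocks spilled into a shared 'nand_pool'."""

    TIER_ORDER = ("hbm", "dram", "nand_pool")

    def __init__(self, capacities=TIER_CAPACITY):
        self.capacities = dict(capacities)
        # An OrderedDict per tier gives simple LRU eviction semantics.
        self.tiers = {name: OrderedDict() for name in self.TIER_ORDER}

    def put(self, key, value):
        """Insert a KV block into the hot tier, demoting older blocks downward."""
        for tier in self.tiers.values():
            tier.pop(key, None)  # drop any stale copy in a lower tier
        self._insert("hbm", key, value)

    def get(self, key):
        """Fetch a KV block, promoting it back to the hot tier on a hit."""
        for tier in self.TIER_ORDER:
            if key in self.tiers[tier]:
                value = self.tiers[tier].pop(key)
                self._insert("hbm", key, value)  # promote on reuse
                return value
        return None  # miss: the context would have to be recomputed (prefill)

    def _insert(self, tier, key, value):
        cache = self.tiers[tier]
        cache[key] = value
        cache.move_to_end(key)
        # When a tier overflows, demote its least-recently-used block downward.
        while len(cache) > self.capacities[tier]:
            old_key, old_value = cache.popitem(last=False)
            lower = self._next_tier(tier)
            if lower is not None:
                self._insert(lower, old_key, old_value)
            # With no lower tier left, the block is simply evicted.

    def _next_tier(self, tier):
        i = self.TIER_ORDER.index(tier)
        return self.TIER_ORDER[i + 1] if i + 1 < len(self.TIER_ORDER) else None


if __name__ == "__main__":
    cache = TieredKVCache()
    # Store KV blocks for many conversation turns; older blocks spill downward.
    for turn in range(30):
        cache.put(f"session-0/turn-{turn}", f"kv-block-{turn}")
    # Reusing an early turn pulls its KV block back from the capacity tier
    # instead of recomputing it on the GPU.
    print(cache.get("session-0/turn-0"))  # -> kv-block-0
```

The design point the article hinges on is that a hit in the NAND-backed capacity tier, while slower than HBM or DRAM, is far cheaper than recomputing the prefill for a long context; that trade-off is the mechanism behind the claimed cost and throughput gains for MoE inference.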