Squeezing the most out of the GPU: ZTE's Mariana breaks through the memory wall
量子位· 2025-08-26 05:46
Core Insights
- The article discusses the challenges of expanding Key-Value Cache (KV Cache) storage in large language models (LLMs), highlighting the conflict between inference efficiency and memory cost [1]
- It emphasizes the need for innovative solutions to expand KV Cache storage without compromising performance [1]

Industry Exploration
- Nvidia's Dynamo project implements a multi-level caching algorithm for storage systems, but faces complexity in data migration and latency issues [2]
- Microsoft's LMCache system is compatible with inference frameworks but has limitations in distributed storage support and capacity [3]
- Alibaba proposed a remote storage solution extending KV Cache to the Tair database, which scales easily but struggles to meet the low-latency requirements of LLM inference [3]

Emerging Technologies
- CXL (Compute Express Link) is presented as a promising high-speed interconnect technology that could alleviate memory bottlenecks in AI and high-performance computing [5]
- Research on using CXL to accelerate LLM inference is still limited, indicating a significant opportunity for exploration [5]

Mariana Exploration
- ZTE Corporation and East China Normal University introduced Mariana, a distributed shared KV storage technology designed for high-performance distributed KV indexing [6]
- Mariana's architecture is tailored for GPU and KV Cache storage, achieving 1.7x higher throughput and 23% lower tail latency than existing solutions [6]

Key Innovations of Mariana
- The Multi-Slot lock-based Concurrency Scheme (MSCS) enables fine-grained concurrency control at the entry level, significantly reducing contention and improving throughput [8]
- The Tailored Leaf Node (TLN) design optimizes data layout for faster access, improving read speed by allowing multiple keys to be loaded into SIMD registers simultaneously [10]
- An adaptive caching strategy based on the Count-Min Sketch algorithm identifies and caches hot data efficiently, improving read performance [11]

Application Validation
- Mariana's architecture supports large-capacity storage by distributing data across remote memory pools, theoretically allowing near-unlimited storage space [13]
- Experimental results indicate that Mariana significantly improves read/write throughput and latency in KV Cache scenarios [14]

Future Prospects
- Mariana's design is compatible with future CXL hardware, allowing seamless migration and use of CXL's advantages [18]
- Advances in Mariana and CXL technology could enable large models to run efficiently on standard hardware, democratizing AI capabilities across applications [18]
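The entry-level locking idea behind MSCS can be sketched as follows. This is a minimal illustration, not Mariana's actual implementation: the slot count and node layout are assumptions, and the real scheme operates over remote memory rather than local Python objects.

```python
import threading

SLOTS_PER_NODE = 8  # illustrative; real slot counts depend on the node layout

class MultiSlotNode:
    """Node whose entries are guarded by per-slot locks, so writers to
    different slots proceed in parallel instead of serializing on a
    single node-wide lock (the source of contention MSCS targets)."""
    def __init__(self):
        self.keys = [None] * SLOTS_PER_NODE
        self.values = [None] * SLOTS_PER_NODE
        self.locks = [threading.Lock() for _ in range(SLOTS_PER_NODE)]

    def put(self, slot, key, value):
        with self.locks[slot]:          # contend only on this one entry
            self.keys[slot] = key
            self.values[slot] = value

    def get(self, slot):
        with self.locks[slot]:
            return self.keys[slot], self.values[slot]

# Concurrent writers to distinct slots never block one another.
node = MultiSlotNode()
threads = [threading.Thread(target=node.put, args=(i, f"k{i}", i * 10))
           for i in range(SLOTS_PER_NODE)]
for t in threads: t.start()
for t in threads: t.join()
```

With a single node-level lock these eight writers would run one at a time; with per-slot locks they only serialize when two writers target the same entry.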
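The TLN data-layout idea (keys stored contiguously and separately from values, so a lookup scans one dense array) can be illustrated without SIMD intrinsics. Python cannot express the actual SIMD register loads, so this sketch only shows the structure-of-arrays layout that makes them possible; class and field names are illustrative.

```python
from array import array

class TailoredLeaf:
    """Structure-of-arrays leaf: fixed-width keys live in one contiguous
    array, separate from the values. A lookup scans only the key array;
    on real hardware this dense layout is what allows several keys to be
    loaded into a SIMD register and compared in a single instruction."""
    def __init__(self, keys, values):
        self.keys = array("q", keys)   # contiguous signed 64-bit keys
        self.values = list(values)     # values matched by position

    def lookup(self, key):
        try:
            return self.values[self.keys.index(key)]  # dense linear scan
        except ValueError:
            return None

leaf = TailoredLeaf([11, 42, 7, 99], ["a", "b", "c", "d"])
```

Compare this with an array-of-structs layout, where each key sits next to its value: there, a scan strides over the value bytes too, wasting cache lines and defeating vectorized comparison.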
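The adaptive caching strategy's hot-data detection can be sketched with a minimal Count-Min Sketch. The width, depth, threshold, and key names below are illustrative assumptions, not values from the article.

```python
import hashlib

class CountMinSketch:
    """Approximate frequency counter: d hash rows of w counters each.
    Memory is fixed regardless of how many distinct keys are seen."""
    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, key, row):
        # One independent hash per row, derived via a per-row salt.
        h = hashlib.blake2b(key.encode(), salt=row.to_bytes(8, "little"))
        return int.from_bytes(h.digest()[:8], "little") % self.width

    def add(self, key):
        for row in range(self.depth):
            self.table[row][self._hash(key, row)] += 1

    def estimate(self, key):
        # Collisions only inflate counters, so the estimate never
        # undercounts; taking the min over rows tightens it.
        return min(self.table[row][self._hash(key, row)]
                   for row in range(self.depth))

# Cache a KV block locally once its estimated access count crosses a threshold.
sketch = CountMinSketch()
HOT_THRESHOLD = 3
for access in ["blk7", "blk2", "blk7", "blk7", "blk9"]:
    sketch.add(access)
```

Here `blk7` is accessed three times, so its estimate reaches the (illustrative) threshold and it becomes a candidate for local caching, while the once-touched blocks stay remote.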
SK Hynix accelerates its CXL push
半导体芯闻· 2025-04-23 10:02
Source: compiled from chosun.

SK Hynix announced on the 23rd that it has completed customer validation of CMM (CXL Memory Module)-DDR5 96GB, its CXL 2.0-based DRAM solution.

https://biz.chosun.com/it-science/ict/2025/04/23/4VTOCIILZNFAFNOKZWQ6477UUM/

CXL is a next-generation solution that supports large-capacity, ultra-high-speed computing by efficiently connecting the central processing unit (CPU), graphics processing unit (GPU), and memory within a computing system. Built on the PCIe interface with memory pooling support, it enables fast data transfer and efficient memory utilization.

SK Hynix stated: "If this product is applied to a server system, capacity increases by 50% compared with existing DDR5 modules, and the module's own bandwidth expands by 30%, enabling it to process 36GB of data per second ...
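As a back-of-envelope check on the quoted figures (an inference from the article's numbers, not a stated spec): if 36 GB/s corresponds to a 30% bandwidth expansion, the implied baseline module bandwidth is roughly 27.7 GB/s.

```python
cmm_bandwidth_gbs = 36.0           # GB/s quoted for the CMM-DDR5 module
expansion = 0.30                   # "bandwidth expands by 30%"

# Implied pre-expansion baseline: 36 / 1.3
baseline_gbs = cmm_bandwidth_gbs / (1 + expansion)
print(f"implied baseline bandwidth: {baseline_gbs:.1f} GB/s")
```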