Memory Computing
An industry first! Memory Tensor teams up with SenseTime's SenseCore to deploy a domestic PD-separation cluster, with inference cost-performance reaching 150% of the A100
Xin Lang Cai Jing · 2025-12-05 12:56
Core Insights
- The collaboration between Memory Tensor and SenseTime has delivered the first commercial inference cluster built on "memory-computation-scheduling" integration on domestic GPGPU, achieving an approximately 20% increase in single-card concurrency and a 75% increase in throughput, with cost-performance reaching 150% of the NVIDIA A100 [1][8][6]

Group 1: Technological Advancements
- Memory Tensor's core product, MemOS, is the only memory-centric infrastructure covering system design from low-level inference through memory models to application engineering; it categorizes cognitive structures into three types of memory and links their scheduling across time scales [5][9] (an illustrative sketch of tiered scheduling appears after the article)
- PD (prefill-decode) separation has moved from an optimization technique to a new inference paradigm, allowing performance in production environments to be comprehensively described and measured [5][12] (see the disaggregation sketch below)

Group 2: Performance Metrics
- Overall cluster throughput improved by more than 75%, from 107.85 tokens/s to 189.23 tokens/s, effectively decoupling computation from storage [6][12] (the percentages are recomputed in the snippet below)
- Single-card concurrency increased by approximately 20%, from 25.00 to 29.42 concurrent requests per card, significantly reducing the risk of queuing and overflow during peak periods [6][12]
- Time to first token (TTFT) held steady below 2 seconds, with a 70%+ increase in KV Cache hit rate in popular scenarios, enhancing the cost-effectiveness of inference for high-frequency, multi-turn interactions [6][12][13] (an illustrative hit-rate model follows the article)

Group 3: Future Directions
- Future collaboration will focus on building a memory-driven pipeline inference foundation on larger-scale domestic GPGPU clusters, creating observable, reversible, and evolvable infrastructure capabilities [7][14]
- The shift from parameter computation to memory computation, and from static inference to dynamic pipelines, positions domestic GPGPU as a potential leader in defining the next generation of inference paradigms [7][14]
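The MemOS description above is abstract, so here is a minimal sketch of what scheduling three memory tiers across time scales could look like. The tier names, time scales, and promotion rule are illustrative assumptions only; the article does not specify MemOS's actual taxonomy or algorithm.

```python
# Illustrative sketch of tiered memory scheduling across time scales.
# Tier names and the promotion rule are assumptions, not MemOS's design.
import time
from dataclasses import dataclass, field
from enum import Enum

class Tier(Enum):
    ACTIVATION = "activation"  # per-request scale, e.g. KV-cache-like state
    SESSION = "session"        # per-conversation scale, minutes to hours
    PERSISTENT = "persistent"  # long-lived scale, days or longer

@dataclass
class MemoryItem:
    content: str
    tier: Tier
    hits: int = 0
    created_at: float = field(default_factory=time.time)

def promote(item: MemoryItem) -> MemoryItem:
    """Toy scheduling rule: items reused often on a short time scale are
    promoted to a longer-lived tier so later requests can reuse them cheaply."""
    if item.tier is Tier.ACTIVATION and item.hits >= 3:
        item.tier = Tier.SESSION
    elif item.tier is Tier.SESSION and item.hits >= 10:
        item.tier = Tier.PERSISTENT
    return item
```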
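Since the article leans heavily on PD separation, the sketch below illustrates the general idea: prefill (prompt ingestion, compute-bound, drives TTFT) and decode (token-by-token generation, memory-bandwidth-bound, drives throughput) run as distinct stages so each can be provisioned and scheduled separately. Every name here is a hypothetical illustration, not MemOS or SenseTime code.

```python
# Hypothetical illustration of prefill/decode (PD) disaggregation. The point
# is that the two phases have different resource profiles and can run on
# separate worker pools, with the KV cache handed off between them.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)  # stand-in for attention KV blocks
    output: list = field(default_factory=list)

def prefill(req: Request) -> Request:
    """Compute-bound phase: process the whole prompt once, filling the KV cache."""
    req.kv_cache = [f"kv({tok})" for tok in req.prompt.split()]
    return req

def decode_step(req: Request) -> str:
    """Bandwidth-bound phase: generate one token using the cached KVs."""
    token = f"tok{len(req.output)}"
    req.kv_cache.append(f"kv({token})")  # decoding also extends the cache
    req.output.append(token)
    return token

def run_pd_separated(requests: list[Request]) -> list[Request]:
    # Pool 1: prefill workers handle prompt ingestion (determines TTFT).
    ready = [prefill(r) for r in requests]
    # Pool 2: decode workers batch many requests per step (determines throughput).
    while any(len(r.output) < r.max_new_tokens for r in ready):
        for r in ready:
            if len(r.output) < r.max_new_tokens:
                decode_step(r)
    return ready

if __name__ == "__main__":
    done = run_pd_separated([Request("hello world", 3), Request("a longer prompt here", 2)])
    for r in done:
        print(r.prompt, "->", r.output)
```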
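To sanity-check the headline numbers, the snippet below recomputes the percentage gains from the four figures quoted in Group 2; only those four measurements come from the article.

```python
# Recomputing the reported gains from the figures quoted above.
throughput_before, throughput_after = 107.85, 189.23    # tokens/s
concurrency_before, concurrency_after = 25.00, 29.42    # requests per card

throughput_gain = (throughput_after / throughput_before - 1) * 100
concurrency_gain = (concurrency_after / concurrency_before - 1) * 100

print(f"throughput:  +{throughput_gain:.1f}%")   # ~ +75.5%, the "over 75%" claim
print(f"concurrency: +{concurrency_gain:.1f}%")  # ~ +17.7%, reported as "approximately 20%"
```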
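The TTFT bullet ties cost-effectiveness to KV Cache reuse. The toy model below shows why: on a cache hit, the shared conversation prefix does not need to be re-prefilled, so the expected prefill work per turn drops as the hit rate rises. The function and the example numbers are illustrative assumptions, not measurements from the cluster.

```python
# Illustrative model of why KV Cache reuse pays off in multi-turn chat:
# on a cache hit, only the new turn's tokens are prefilled.
def expected_prefill_tokens(prefix_len: int, new_len: int, hit_rate: float) -> float:
    """Expected tokens that must be prefilled for one turn, where hit_rate is
    the probability the prefix's KV blocks are still resident in the cache."""
    return hit_rate * new_len + (1 - hit_rate) * (prefix_len + new_len)

# Hypothetical workload: a 2000-token conversation history, 100 new tokens per turn.
for rate in (0.3, 0.5, 0.85):
    tokens = expected_prefill_tokens(2000, 100, rate)
    print(f"hit rate {rate:.0%}: ~{tokens:.0f} tokens prefilled per turn")
```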