First Large-Scale Memory Platform MemoryLake Launches: A Key Step in Moving AI Infrastructure from "Data" to "Memory"
Securities Daily Online (Zheng Quan Ri Bao Wang) · 2026-02-06 11:10
Core Insights
- The article discusses the launch of MemoryLake, a multi-modal memory platform by Zhiyuan Technology, marking a significant shift in AI infrastructure from data-centric to memory-centric approaches [1][5]

Group 1: Product Overview
- MemoryLake integrates deep understanding of multi-modal content, memory storage, and memory computation into a single platform, consisting of the MemoryLake-D1 model, a memory engine, and multi-modal storage and computation capabilities [1]
- The platform addresses fundamental challenges enterprises face in AI adoption: difficulty understanding and integrating multi-modal information, data fragmentation, inaccurate model decisions, high costs of large-model usage, and slow responses to large-scale enterprise data [1]

Group 2: Technological Advancements
- MemoryLake represents a paradigm shift from traditional computing to cognitive computing, centered on memory rather than data, which is essential for intelligent-agent networks [2]
- The platform's model can execute complex instructions, cutting the time required for data analysis and report generation from days to minutes and demonstrating its capability with complex enterprise data [2]

Group 3: Market Impact
- MemoryLake's memory-computing capabilities are driving new intelligent-application paradigms across industries including gaming, finance, manufacturing, education, law, and e-commerce [4]
- The platform has already served over 1.5 million professional users and 15,000 enterprise clients globally, showcasing its extensive market reach and effectiveness [4]

Group 4: Future Vision
- The founder of Zhiyuan Technology emphasizes that the future of AI is memory-driven, advocating for systems that accumulate knowledge and strengthen reasoning and reflection, rather than merely larger models [5]
- The launch of MemoryLake is viewed as a milestone in the evolution of AI technology, signaling the transition to a new era of cognitive computing [5]
An Industry First: Memory Tensor and SenseTime's SenseCore Deploy a Domestic PD-Separation Cluster, with Inference Cost-Performance Reaching 150% of the A100
Sina Finance (Xin Lang Cai Jing) · 2025-12-05 12:56
Core Insights
- The collaboration between Memory Tensor and SenseTime has delivered the first commercial inference cluster built on "memory-computation-scheduling" integration on domestic GPGPU hardware, achieving a 20% increase in single-card concurrency and a 75% increase in throughput, with a cost-performance ratio reaching 150% of the NVIDIA A100 [1][8][6]

Group 1: Technological Advancements
- Memory Tensor's core product MemOS is the only memory-centric infrastructure covering system design from low-level inference up through memory models and application engineering; it categorizes cognitive structures into three types of memory and forms a scheduling link across time scales [5][9]
- PD (prefill-decode) separation has evolved from an optimization technique into a new inference paradigm, allowing performance in production environments to be comprehensively described and measured [5][12]

Group 2: Performance Metrics
- Overall cluster throughput improved by over 75%, from 107.85 tokens/s to 189.23 tokens/s, effectively decoupling computation and storage [6][12]
- Single-card concurrency increased by approximately 20%, from 25.00 to 29.42 concurrent requests per card, significantly reducing the risk of queuing and overflow during peak periods [6][12]
- Time to first token (TTFT) remained stable below 2 seconds, with a 70%+ increase in KV Cache hit rate in popular scenarios, improving the cost-effectiveness of inference for high-frequency, multi-turn interactions [6][12][13]

Group 3: Future Directions
- Future collaboration will focus on building a memory-driven pipeline-inference foundation on larger domestic GPGPU clusters, creating observable, reversible, and evolvable infrastructure capabilities [7][14]
- The shift from parameter computation to memory computation, and from static inference to dynamic pipelines, positions domestic GPGPU as a potential leader in defining the next generation of inference paradigms [7][14]
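The throughput and concurrency gains reported above can be sanity-checked against the raw figures. The snippet below is a quick illustrative calculation using only numbers from the article; the variable names are mine, not identifiers from MemOS or SenseTime's stack:

```python
# Figures as reported: cluster throughput and per-card concurrency
# before and after the PD-separation deployment.
baseline_tps, tuned_tps = 107.85, 189.23    # tokens per second
baseline_conc, tuned_conc = 25.00, 29.42    # concurrent requests per card

# Percentage improvements implied by the raw numbers.
tps_gain = (tuned_tps / baseline_tps - 1) * 100
conc_gain = (tuned_conc / baseline_conc - 1) * 100

print(f"throughput gain:  {tps_gain:.1f}%")   # just over 75%
print(f"concurrency gain: {conc_gain:.1f}%")  # just under 18%
```

The throughput numbers match the claimed "over 75%" almost exactly; the concurrency figure works out to roughly 17.7%, consistent with the article's hedged "approximately 20%".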