Investment Rating
- The report does not explicitly state an investment rating for the industry or for the specific companies covered

Core Insights
- The Engram model, proposed by DeepSeek and Peking University, introduces a "conditional memory" mechanism that separates static knowledge recall from complex computation, significantly improving computational efficiency and task performance [1][2]
- Engram-27B shows systematic improvements over MoE-27B across multiple benchmarks, excelling in particular on long-context tasks [1][3]
- The architecture allows large parameter tables to be offloaded to host memory while keeping the impact on inference throughput controllable, validating the feasibility of "separation of storage and computation" [1][6]

Summary by Sections

Event
- In January 2026, DeepSeek and Peking University released a paper on the Engram model, which achieves significant performance improvements on various benchmarks while maintaining computational efficiency [1][17]

Commentary
- Engram innovatively decouples the recall of fixed knowledge from complex model computation, allowing the model to focus on deeper reasoning tasks and enhancing overall efficiency [2][18]

Performance Optimization
- The study reveals an optimization path for resource allocation: transferring some model capacity to a conditional memory module produces a "U-shaped" performance curve, with a clear optimal range [3][19]
- Replacing approximately 20% of traditional parameter capacity with conditional memory yields significant improvements on knowledge-intensive tasks [3][19]

Long Context Processing
- Engram offloads local, repetitive detail to memory lookup, freeing the backbone network to focus on global information integration, which is crucial for long-text processing [4][20]
- In experiments, Engram-27B consumed only about 82% of the baseline pre-training compute while achieving higher accuracy on long-text retrieval tasks [4][20]

System-Level Design
- Engram's deterministic addressing mechanism allows data to be prefetched from host memory, alleviating pressure on high-bandwidth memory (HBM) and keeping inference overhead within 3% even with large memory tables [6][22]
- This innovation shifts the binding constraint from GPU memory capacity to CPU memory capacity and interconnect technology, potentially redefining the critical bottlenecks of AI systems [6][23]

Impact on Chinese Large Models
- By moving memory-type parameters to scalable system memory, Engram enhances model capability while reducing reliance on high-end HBM, offering a clearer path for efficiency-driven technological advancement in China's large-model industry [7][24]
- The open-sourcing of the paper and code lowers barriers to industry validation and development, facilitating faster deployment and commercialization of large models in cost-sensitive environments [7][26]
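To make the "conditional memory" idea concrete, the following is a minimal illustrative sketch, not the paper's actual mechanism: it assumes recall is implemented as a deterministic hash of each token n-gram into a large embedding table, so retrieving stored knowledge is a table lookup rather than a matrix computation. All names, sizes, and the hash scheme here are hypothetical.

```python
import numpy as np

# Hypothetical sizes; the real Engram configuration is not specified in the report.
TABLE_ROWS = 4096   # rows in the conditional-memory table
DIM = 8             # width of each memory vector

rng = np.random.default_rng(0)
# Conceptually this large table can live in host (CPU) memory.
memory_table = rng.standard_normal((TABLE_ROWS, DIM))

def ngram_address(tokens, n=2):
    """Deterministically hash each n-gram of token ids to a table row index."""
    addrs = []
    for i in range(len(tokens) - n + 1):
        h = 0
        for t in tokens[i:i + n]:
            h = (h * 1000003 + t) % TABLE_ROWS
        addrs.append(h)
    return addrs

def conditional_memory_lookup(tokens):
    """Recall static knowledge by table lookup; no matrix multiplies involved."""
    addrs = ngram_address(tokens)
    return memory_table[addrs]  # one memory vector per n-gram

tokens = [3, 17, 42, 7]
vecs = conditional_memory_lookup(tokens)
print(vecs.shape)  # (3, 8): three bigrams from four tokens
```

Because the lookup replaces computation with addressing, the backbone network can spend its FLOPs on reasoning over the retrieved vectors rather than on reconstructing memorized facts.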
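The system-level point about deterministic addressing can also be sketched. The assumption (mine, for illustration; the report does not detail the implementation) is that because memory addresses depend only on the input tokens, not on intermediate activations, the needed rows can be gathered from host memory before the layer runs, overlapping the copy with computation so HBM only ever holds a small slice of the table.

```python
import numpy as np

TABLE_ROWS, DIM = 4096, 8

# Large table resident in host (CPU) memory; rows are tagged so gathers are checkable.
host_table = np.zeros((TABLE_ROWS, DIM))
host_table[:, 0] = np.arange(TABLE_ROWS)

def addresses_for(tokens):
    # Deterministic: computable from the tokens alone, ahead of the forward pass.
    return [(t * 1000003) % TABLE_ROWS for t in tokens]

def prefetch(addrs):
    # In a real system this would be an asynchronous host-to-device copy;
    # here we simply gather the rows into a small "device-side" staging buffer.
    return host_table[addrs].copy()

def forward(staged_rows):
    # The layer consumes only the staged slice; HBM never holds the full table.
    return staged_rows.sum(axis=0)

tokens = [5, 9, 12]
addrs = addresses_for(tokens)   # step 1: compute addresses up front
staged = prefetch(addrs)        # step 2: copy, overlappable with compute
out = forward(staged)           # step 3: run the layer on the small slice
```

This is why the constraint shifts from HBM capacity to host-memory capacity and interconnect bandwidth: the table can grow with cheap system RAM as long as the prefetch pipeline keeps up.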
DeepSeek Engram: hand "recall" to table lookup, keep compute for reasoning
Haitong Securities International·2026-01-27 08:50