New DeepSeek Model Revealed? "MODEL1" Surfaces in the Open-Source Community
Core Insights
- DeepSeek has updated its FlashMLA code on GitHub, revealing the previously undisclosed "MODEL1" identifier, which may point to a new model distinct from the existing V3.2 [3][4]
- The company launched an "Open Source Week" in February 2025, releasing five codebases in sequence, with FlashMLA as the first project [4]
- FlashMLA optimizes memory access and computation on Hopper GPUs, significantly improving the efficiency of variable-length sequence processing, particularly for large language model inference tasks [4]

Company Developments
- DeepSeek's upcoming AI model, DeepSeek V4, is expected to be released around the Lunar New Year in February 2026, although the timeline may shift [4]
- The V4 model iterates on the V3 model released in December 2024, with programming capabilities reported to surpass current leading models such as Anthropic's Claude and OpenAI's GPT series [5]
- Since January 2026, DeepSeek has published two technical papers, one introducing a new training method called "optimized residual connections" (mHC) and another a biologically inspired AI memory module, Engram [5]

Industry Context
- The Engram module aims to improve knowledge retrieval and general reasoning, addressing inefficiencies in the Transformer architecture [5]
- Backing from Liang Wenfeng's private equity firm, which achieved a 56.55% average return in 2025, has bolstered DeepSeek's research and development efforts [5]
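FlashMLA's variable-length handling follows a packing convention common to varlen attention kernels: sequences of different lengths are concatenated into one flat token stream and delimited by cumulative offsets, so no compute is spent on padding. A minimal sketch of that indexing convention (not DeepSeek's actual FlashMLA code, which is CUDA):

```python
# Illustrative sketch of the cu_seqlens packing convention used by
# variable-length attention kernels; not FlashMLA's implementation.
from typing import List, Tuple

def pack_varlen(seq_lens: List[int]) -> Tuple[List[int], int]:
    """Return cumulative sequence offsets and the total token count.

    Tokens cu_seqlens[i] .. cu_seqlens[i+1] delimit sequence i in the
    packed stream, letting a kernel iterate sequences without padding.
    """
    cu_seqlens = [0]
    for n in seq_lens:
        cu_seqlens.append(cu_seqlens[-1] + n)
    return cu_seqlens, cu_seqlens[-1]

cu, total = pack_varlen([5, 3, 7])
# cu == [0, 5, 8, 15], total == 15
```

A batch of sequences with lengths 5, 3, and 7 thus occupies 15 token slots instead of the 21 (3 × 7) a padded layout would need.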
DeepSeek Publishes New Paper Bearing Liang Wenfeng's Name
新华网财经 (Xinhuanet Finance) · 2026-01-13 03:52
Core Insights
- On the evening of January 12, DeepSeek released a new paper, "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," co-authored with Peking University, with Liang Wenfeng among the authors [1]
- The paper introduces conditional memory, which significantly improves model performance on knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter and compute budgets [1]
- DeepSeek has also open-sourced the related memory module, Engram [1]
DeepSeek Publishes New Paper Bearing Liang Wenfeng's Name
财联社 (Cailian Press) · 2026-01-13 01:15
Core Insights
- On the evening of January 12, DeepSeek released a new paper, "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," co-authored with Peking University, with Liang Wenfeng among the authors [1]
- The paper introduces conditional memory, which significantly improves model performance on knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter and compute budgets [1]
- DeepSeek has open-sourced the related memory module, Engram [1]
Liang Wenfeng-Signed "Memory" Module Open-Sourced, Adding Detail to the DeepSeek V4 Picture
机器之心 (Synced) · 2026-01-13 00:12
Core Insights
- DeepSeek has introduced a new research paper, "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," in collaboration with Peking University, focusing on enhancing large language models (LLMs) through a novel approach to memory and computation [1][2]

Group 1: Research Background and Problem Statement
- Current large language models achieve sparsity primarily through Mixture of Experts (MoE), known as "conditional computation," but lack an inherent knowledge retrieval mechanism, leading to inefficient simulation of retrieval behavior [2][8]
- DeepSeek proposes "conditional memory" as a complement to MoE, introducing a new module called Engram to address this limitation [3][8]

Group 2: Engram Module and Its Implementation
- The Engram module has been made available on GitHub, allowing for community engagement and further development [4]
- Engram modernizes classic n-gram embeddings to achieve knowledge retrieval in O(1) time complexity, enhancing the efficiency of memory access [8][10]
- The module separates static knowledge storage from dynamic computation, refining the overall architecture of the Transformer network [12][14]

Group 3: Performance and Efficiency
- DeepSeek has scaled Engram to 27 billion parameters, demonstrating significant performance improvements over a pure-MoE baseline under equivalent parameter and FLOPs conditions [10][37]
- Engram shows notable gains on knowledge retrieval tasks, such as +3.4 on MMLU and +4.0 on CMMLU, along with enhanced general reasoning capabilities [10][37]
- The architecture allows efficient memory access without additional performance overhead, supporting prefetching from host memory at runtime [11][18]
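The O(1) lookup described above is the key contrast with attention-based retrieval: an n-gram of recent tokens is hashed directly to a slot in a large embedding table, so fetching stored knowledge costs a single table read per token. A hypothetical sketch of that mechanism (table size, hash function, and n-gram order are illustrative choices, not Engram's actual design):

```python
# Hypothetical sketch of hashed n-gram embedding lookup in O(1) per
# token; parameters and hash are illustrative, not Engram's design.
import numpy as np

TABLE_SIZE, DIM, N = 1 << 20, 64, 2  # slots, embedding dim, n-gram order
rng = np.random.default_rng(0)
table = rng.standard_normal((TABLE_SIZE, DIM)).astype(np.float32)

def ngram_slot(ids):
    """Map a token n-gram to a table slot with a rolling polynomial hash."""
    h = 0
    for t in ids:
        h = (h * 1000003 + t) % TABLE_SIZE
    return h

def lookup(token_ids):
    """Fetch one static memory vector per position: a single table read."""
    out = np.zeros((len(token_ids), DIM), dtype=np.float32)
    for i in range(len(token_ids)):
        ctx = token_ids[max(0, i - N + 1): i + 1]  # trailing n-gram
        out[i] = table[ngram_slot(ctx)]            # O(1) memory access
    return out

emb = lookup([17, 99, 99, 4])
```

Because each slot address depends only on the trailing n-gram, identical contexts always hit the same slot, and because addresses are computable from token IDs alone, the needed slots can be prefetched from host memory before the forward pass reaches them.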
Group 4: Sparsity Distribution and Optimal Allocation
- DeepSeek formalized a U-shaped expansion rule characterizing the optimal trade-off between neural computation (MoE) and static memory (Engram) [9][22]
- The research indicates that allocating approximately 20%-25% of the sparse parameter budget to Engram yields optimal performance, confirming the structural complementarity between the two modules [27][29]

Group 5: Experimental Results
- Four models were trained under identical conditions: Dense-4B, MoE-27B, Engram-27B, and Engram-40B [34][35]
- The sparse architectures consistently outperformed the dense model across benchmarks, with Engram-27B achieving significant improvements over MoE-27B on multiple tasks [37]
- Engram-40B further reduced pre-training loss and improved performance on most benchmarks, indicating that memory capacity has not yet saturated [38]

Group 6: Long-Context Training
- Engram's architecture showed structural advantages on long-context tasks, with significant performance gains in global context retention [40][41]
- Controlled experiments found Engram outperforming MoE on complex retrieval tasks, pointing to an inherent architectural advantage [45]
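The U-shaped trade-off in Group 4 can be pictured as a sweep: fix the total sparse parameter budget, vary the fraction given to memory versus MoE, and pick the loss-minimizing split. A toy illustration with an invented quadratic loss curve (the curve's shape and optimum are placeholders for the ~20%-25% reported in the paper, not measured data):

```python
# Toy sweep over the memory share of a fixed sparse budget; the
# quadratic loss model is invented purely to illustrate the U-shape.
def toy_loss(mem_frac: float, optimum: float = 0.225) -> float:
    """Pretend pre-training loss as a function of the memory fraction."""
    return (mem_frac - optimum) ** 2 + 1.0

fracs = [i / 100 for i in range(0, 101)]   # 0%, 1%, ..., 100% to memory
best = min(fracs, key=toy_loss)            # bottom of the U

moe_share = 1.0 - best                     # remainder stays with MoE
```

With a real loss curve in place of `toy_loss`, the same sweep recovers the optimal allocation; the paper's finding is that this bottom sits near a 20%-25% memory share, with loss rising on either side.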