Conditional Memory
DeepSeek Engram: Hand "Recall" Over to Table Lookup, Keep Compute for Reasoning
Haitong Securities International · 2026-01-27 08:50
Investment Rating
- The report does not explicitly state an investment rating for the industry or for the specific companies covered in the research

Core Insights
- The Engram model proposed by DeepSeek and Peking University introduces a "Conditional Memory" mechanism that separates static knowledge recall from complex computation, significantly improving computational efficiency and task performance [1][2]
- Engram-27B demonstrates systematic improvements over MoE-27B across multiple benchmarks, excelling in particular at long-context tasks [1][3]
- The architecture allows large parameter tables to be offloaded to host memory while keeping the impact on inference throughput controllable, validating the feasibility of "separation of storage and computation" [1][6]

Summary by Sections

Event
- In January 2026, DeepSeek and Peking University released a paper on the Engram model, achieving significant performance improvements across various benchmarks while maintaining computational efficiency [1][17]

Commentary
- Engram innovatively decouples the recall of fixed knowledge from complex model computation, allowing models to focus on deeper reasoning tasks and enhancing overall efficiency [2][18]

Performance Optimization
- The study reveals an optimization path for resource allocation: transferring part of the model's capacity to a conditional memory module produces a "U-shaped" performance curve with a clear optimal range [3][19]
- Replacing approximately 20% of traditional parameter capacity with conditional memory yields significant improvements on knowledge-intensive tasks [3][19]

Long Context Processing
- Engram offloads local, repetitive details to memory lookup, allowing the backbone network to focus on global information integration, which is crucial for long-text processing [4][20]
- In experiments, Engram-27B consumed only about 82% of the baseline's pre-training computation while achieving higher accuracy on long-text retrieval tasks [4][20]

System-Level Design
- Engram's deterministic addressing mechanism allows data to be pre-fetched from host memory, relieving pressure on high-bandwidth memory (HBM) and keeping inference overhead within 3% even with large memory tables [6][22]
- The innovation shifts the binding constraint from GPU memory to CPU memory capacity and interconnect technology, potentially redefining the critical constraints of AI systems [6][23]

Impact on Chinese Large Models
- Engram's ability to move memory-type parameters to scalable system memory enhances model capability while reducing reliance on high-end HBM, providing a clearer path for efficiency-driven technological advancement in China's large-model industry [7][24]
- Open-sourcing the paper and code lowers barriers to industry validation and development, facilitating faster deployment and commercialization of large models in cost-sensitive environments [7][26]
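The "deterministic addressing" point can be made concrete with a toy lookup: because each memory address is a fixed function of the input N-gram, retrieval is a single O(1) indexing operation with no learned routing. The sketch below is a minimal illustration under our own assumptions; the table size, embedding width, and `hash`-based addressing are illustrative stand-ins, not DeepSeek's actual design.

```python
import numpy as np

# Table size and embedding width are illustrative assumptions; the
# hash-based addressing stands in for whatever deterministic scheme
# the paper actually uses.
VOCAB_SLOTS = 1 << 16   # number of memory slots
DIM = 256               # embedding width

rng = np.random.default_rng(0)
memory_table = rng.standard_normal((VOCAB_SLOTS, DIM), dtype=np.float32)

def ngram_slot(token_ids: tuple) -> int:
    """Deterministic address: hash the N-gram into a table slot."""
    return hash(token_ids) % VOCAB_SLOTS

def lookup(tokens: list, n: int = 2) -> np.ndarray:
    """Fetch one memory embedding per position from its trailing N-gram."""
    out = np.zeros((len(tokens), DIM), dtype=np.float32)
    for i in range(len(tokens)):
        gram = tuple(tokens[max(0, i - n + 1): i + 1])
        out[i] = memory_table[ngram_slot(gram)]   # O(1) indexed read
    return out

print(lookup([101, 7, 7, 42, 101]).shape)  # (5, 256): one vector per position
```

Because the addresses depend only on the input tokens, they can be computed before the forward pass begins, which is what makes the host-memory prefetching described above possible.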
DeepSeek in Morgan Stanley's Eyes: Substituting Storage for Compute, Doing More with Less
36Kr · 2026-01-22 09:09
Core Insights
- DeepSeek is revolutionizing AI scalability with a hybrid architecture that replaces scarce high-bandwidth memory (HBM) with more cost-effective DRAM through an innovative module called "Engram" [1][3][5]

Group 1: Engram Module and Conditional Memory
- The Engram module introduces "Conditional Memory", separating static knowledge storage from dynamic reasoning and significantly reducing reliance on expensive HBM [3][5]
- This architecture allows basic information to be retrieved efficiently without overloading HBM, freeing up capacity for more complex reasoning tasks [3][5]

Group 2: Economic Impact on Infrastructure
- The Engram architecture reshapes hardware cost structures by minimizing HBM dependency, potentially shifting infrastructure spending from GPUs to more affordable DRAM [5][6]
- A 100-billion-parameter Engram model requires approximately 200GB of system DRAM, implying a 13% increase in commodity DRAM use per system [5][6]

Group 3: Innovation Driven by Constraints
- Despite limited access to advanced computing power and hardware, Chinese AI models have rapidly closed the performance gap with global leaders, demonstrating "constraint-induced innovation" [6][7]
- DeepSeek's advances suggest that future AI capability gains may rely more on algorithmic and system-level innovation than on merely adding hardware resources [6][7]

Group 4: Future Outlook
- The upcoming DeepSeek V4 model is expected to deliver significant advances in coding and reasoning, potentially running on consumer-grade hardware such as the RTX 5090 [7]
- This development could lower the marginal cost of high-end AI inference, enabling broader deployment of AI applications without expensive data-center-grade GPU clusters [7]
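The ~200GB figure above is consistent with simple back-of-envelope arithmetic, assuming the offloaded parameters are held in 16-bit precision; the precision is our assumption, not stated in the report.

```python
# Precision (2 bytes per parameter, i.e. fp16/bf16) is our assumption;
# the report does not state it.
params = 100e9        # 100B parameters held in system DRAM
bytes_per_param = 2
print(params * bytes_per_param / 1e9, "GB")  # -> 200.0 GB
```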
DeepSeek in Morgan Stanley's Eyes: Substituting Storage for Compute, Doing More with Less!
硬AI · 2026-01-22 07:34
Core Viewpoint
- DeepSeek is redefining the AI scaling paradigm around a "doing more with less" philosophy: the next generation of AI success relies on efficient hybrid architectures rather than merely stacking more GPUs [2][3][4]

Group 1: Engram Module and Conditional Memory
- DeepSeek's innovative Engram module separates storage from computation, significantly reducing the need for expensive high-bandwidth memory (HBM) by moving static knowledge into cost-effective DRAM and reserving HBM for complex reasoning tasks [3][9]
- The introduction of "Conditional Memory" allows static knowledge stored in DRAM to be retrieved efficiently, enhancing the performance of large language models (LLMs) without overloading HBM [9][12]

Group 2: Economic Impact on Infrastructure
- The Engram architecture reshapes the hardware cost structure by minimizing reliance on HBM, suggesting a shift in infrastructure costs from GPUs to more affordable memory solutions [12][13]
- The analysis indicates that a 100-billion-parameter Engram model would require approximately 200GB of system DRAM, a 13% increase in commodity DRAM use per system [12][13]

Group 3: Innovation Driven by Constraints
- Despite limitations in advanced computing power and hardware access, Chinese AI models have rapidly closed the performance gap with global leaders, shifting the emphasis toward algorithmic efficiency and practical system design [17][18]
- The report terms this "constraint-induced innovation", suggesting that future AI advances may stem from innovative thinking under resource constraints rather than from merely increasing hardware capacity [17][18]

Group 4: Future Outlook
- DeepSeek's next-generation model V4 is predicted to bring significant advances in coding and reasoning capability, with the potential to run on consumer-grade hardware, thereby lowering the marginal cost of high-end AI inference [20][21]
- The report expresses optimism about the localization of memory and semiconductor equipment in China, as the decoupling of memory from computation is expected to lead to smarter and more efficient LLMs [21]
Rumored New DeepSeek Model Leaks: Is Liang Wenfeng About to Drop Another "Bombshell"?
Xin Lang Cai Jing · 2026-01-21 07:55
Core Insights
- DeepSeek has generated significant buzz in the AI community after a new model named Model1 was unexpectedly exposed during a code update, suggesting a potential new technological path distinct from the existing V3 series [1][6][8]
- Speculation is rife that DeepSeek is preparing to launch its next-generation AI model, V4, around mid-February, following a year of iterative improvements to the V3 model [3][8]

Model Development Timeline
- On March 25, 2025, DeepSeek released V3-0324, enhancing code-generation usability and surpassing GPT-4.5 in mathematical and coding capability [4]
- On May 29, 2025, the R1 model received a minor upgrade, improving performance in mathematics, programming, and general logic, with hallucination rates reduced by 45-50% [4]
- On August 21, 2025, DeepSeek V3.1 was launched, offering faster response times and stronger agent capabilities, along with support for Anthropic's API [4]
- On September 22, 2025, the V3.1-Terminus version was released, addressing issues with mixed-language inputs and enhancing the performance of the Code and Search Agents [4]
- On September 29, 2025, the V3.2-Exp version introduced a new attention mechanism, with an updated API pricing structure [4]
- On December 1, 2025, the official V3.2 version was released, achieving inference capability comparable to GPT-5 and integrating thinking modes for tool use [4][9]

Research Contributions
- Two papers bearing Liang Wenfeng's name were published between late December 2025 and early January 2026, addressing training stability and knowledge-retrieval efficiency in large-model architectures [5][10]
- The first paper proposed a manifold-constrained hyper-connections framework that enhances training stability by constraining residual connections to a specific manifold [10][11]
- The second paper introduced a conditional memory module that improves inference and knowledge-task performance by decoupling knowledge storage from neural computation [10][11]

Market Expectations
- The AI community is eagerly waiting to see whether DeepSeek will unveil the new Model1 or V4 around the upcoming Spring Festival, with expectations of a significant impact on the global AI landscape [6][8]
DeepSeek: Conditional Memory via Scalable Lookup, a New Dimension of Sparsity for Large Language Models (2026 Report)
欧米伽未来研究所2025 · 2026-01-15 00:29
Core Insights
- The article discusses a new architecture called "Engram", proposed by a research team from Peking University and DeepSeek-AI, which aims to extend the capabilities of large language models (LLMs) by introducing "conditional memory" as a dimension complementary to existing mixture-of-experts (MoE) sparsity [2][3]

Group 1: Model Architecture and Performance
- The report's core argument is that language modeling comprises two distinct sub-tasks, combinatorial reasoning and knowledge retrieval, with the latter often being static and local [3]
- The Engram architecture modernizes the classic N-gram concept into a "conditional memory" mechanism that retrieves static embeddings directly with O(1) time complexity, freeing computational resources for higher-order reasoning tasks [3][4]
- A significant finding is the "sparsity distribution law": allocating approximately 20% to 25% of the sparse parameter budget to the Engram module significantly reduces validation loss while maintaining computational cost [4]

Group 2: Efficiency and Scalability
- The Engram model (Engram-27B) outperformed a baseline MoE model (MoE-27B) on a range of knowledge-intensive and logic-intensive tasks, demonstrating its effectiveness in enhancing model intelligence [4][5]
- Engram's deterministic retrieval mechanism allows large embedding tables to be offloaded into host memory, significantly reducing dependency on GPU memory and enabling the deployment of ultra-large models on limited hardware resources [6][7]
- Because natural-language knowledge follows a Zipfian distribution, a multi-level cache structure can serve most lookups from a small hot tier, which can greatly benefit cloud service providers and enterprises aiming to reduce deployment costs [7]

Group 3: Long Context Processing
- Engram shows structural advantages in handling long contexts: by resolving many local dependencies through direct addressing, it frees the Transformer to focus on capturing global long-range dependencies [8]
- In long-text benchmarks, Engram-27B improved accuracy on multi-query retrieval tasks from 84.2% to 97.0%, indicating greater efficiency and better attention allocation [8]

Group 4: Future Implications
- The research signals a shift in large-model design philosophy from merely increasing computational depth to a dual-sparsity approach that combines computation with memory [9]
- Conditional memory is expected to become a standard component of the next generation of sparse models, providing high-performance, low-cost options for trillion-parameter models [9]
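The Zipfian-cache argument in Group 2 can be illustrated with a small simulation: if lookup frequencies follow a Zipf law, a small hot tier absorbs the bulk of the traffic, so most retrievals never touch host DRAM. The tier size, table size, and Zipf exponent below are illustrative assumptions, not figures from the report.

```python
import numpy as np

# Tier size, table size, and the Zipf exponent are illustrative.
TABLE_ROWS = 1_000_000
HOT_ROWS = 50_000            # top 5% of slots kept in the fast tier (HBM)

ranks = np.arange(1, TABLE_ROWS + 1)
weights = 1.0 / ranks                   # Zipf with exponent s = 1
probs = weights / weights.sum()

rng = np.random.default_rng(0)
queries = rng.choice(TABLE_ROWS, size=100_000, p=probs)
print(f"hot-tier hit rate: {np.mean(queries < HOT_ROWS):.1%}")
# ~80%: most lookups are served from the small fast tier and never
# reach host DRAM.
```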
Express | DeepSeek Publishes Another Paper, Possibly a Core Preview of V4: 3 Opportunities for Ordinary People?
未可知人工智能研究院 · 2026-01-14 03:02
Core Insights
- DeepSeek has introduced a new module called Engram that addresses a significant limitation of the Transformer architecture: by enabling direct memory retrieval, it improves efficiency on knowledge-retrieval and reasoning tasks [9][10][12]

Group 1: Core Problem
- The Transformer architecture mixes tasks that should be simple retrieval with tasks that genuinely require computation, leading to inefficiencies [14][20]
- DeepSeek's Engram module acts as a "quick reference manual", allowing the AI to retrieve fixed knowledge instantly rather than computing it through multiple neural-network layers [21][22]

Group 2: Key Discoveries
- A critical finding from DeepSeek's research is that a balance between memory and computation enhances performance, as demonstrated by a U-shaped curve in their experiments [30][32]
- The introduction of the Engram module not only improves knowledge retrieval but also enhances reasoning capability by freeing up neural-network resources for complex tasks [36]

Group 3: Industry Impacts
- The AI industry is entering a "dual-axis era" with the introduction of conditional memory, which may require companies that invested heavily in MoE architectures to redesign their systems [38][39]
- The hardware ecosystem will change as Engram's deterministic retrieval allows for pre-fetching and overlapping computations, potentially reducing costs for startups while impacting GPU manufacturers negatively [40][44]
- Engram significantly improves long-context capability, enhancing performance on tasks involving lengthy documents, which is crucial for industries like law and medicine [46][48]

Group 4: Opportunities for Individuals
- Demand for knowledge-intensive applications is surging, particularly in healthcare and law, where Engram's efficient retrieval can drastically reduce costs and improve response times [51][52]
- Opportunities exist in providing multilingual and specialized services, leveraging Engram's ability to compress semantic tokens and reduce barriers for small-language applications [54][55]
- The long-context application market is expanding, with significant potential in contract review, medical diagnosis, and legal consulting, where Engram's capabilities address previous limitations [56][59]
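The pre-fetching point under Group 3 follows directly from determinism: the table slots an Engram-style layer will read are fully determined by the input token IDs, so the gather can be issued before the forward pass begins and overlapped with earlier layers. A toy sketch follows, using a thread pool as a stand-in for a real pinned-memory/CUDA-stream pipeline; the table size and bigram addressing are assumptions.

```python
import concurrent.futures as cf
import numpy as np

# The thread pool is an illustrative stand-in for a real pinned-memory /
# CUDA-stream prefetch pipeline; table size and bigram addressing are
# our assumptions.
rng = np.random.default_rng(0)
TABLE = rng.standard_normal((1 << 16, 128), dtype=np.float32)  # host-side table

def slots_for(tokens):
    """Addresses depend only on input token IDs, so they are known up front."""
    return [hash((a, b)) % TABLE.shape[0] for a, b in zip(tokens, tokens[1:])]

def fetch(slot_ids):
    return TABLE[slot_ids]        # host-memory gather

tokens = [5, 9, 2, 9, 5, 1]
with cf.ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(fetch, slots_for(tokens))   # issue the prefetch early
    # ... earlier transformer layers would run here, hiding the fetch ...
    mem_embs = future.result()    # ready by the time the memory layer needs it
print(mem_embs.shape)             # (5, 128): one vector per bigram
```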
New DeepSeek Paper Signed by Liang Wenfeng Proposes a Method to Break Through GPU Memory Limits
Xin Lang Cai Jing · 2026-01-13 12:33
Core Viewpoint
- DeepSeek, a Chinese AI startup, has developed a new model-training technique that bypasses GPU memory limitations, enhancing cost efficiency and performance in AI model training [1][3]

Group 1: Technology and Innovation
- DeepSeek and researchers from Peking University introduced a "conditional memory" technique called "Engram" to address the limits that high-bandwidth memory (HBM) places on scaling AI models [3][4]
- The Engram technique decouples computation from storage, allowing foundational information to be retrieved more efficiently and improving the model's handling of long contexts [4][6]
- In a model with 27 billion parameters, the new technique improved performance on key industry benchmarks by several percentage points while preserving capacity for complex reasoning tasks [4][6]

Group 2: Competitive Landscape
- The HBM gap between China and the US is significant, with Chinese memory-chip manufacturers lagging behind their US and South Korean counterparts [4]
- DeepSeek's previous model, DeepSeek-R1, was trained in two months at a cost of $5.5 million, significantly lower than the spending of US companies like OpenAI, while achieving comparable performance [6][7]
- Microsoft President Brad Smith highlighted that Chinese companies like DeepSeek are rapidly gaining ground in the global AI market, particularly in emerging markets, due to their low-cost open-source models [7]

Group 3: Future Developments
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to possess strong programming capabilities [7]
On the Eve of DeepSeek V4? New Paper Signed by Liang Wenfeng Released
华尔街见闻 · 2026-01-13 11:01
Core Viewpoint
- The article discusses a groundbreaking paper by DeepSeek and Peking University that introduces a new module called Engram, which separates memory from computation in AI models, delivering a significant increase in reasoning capability [3][12]

Group 1: Introduction of the Engram Module
- DeepSeek's Engram module represents a supply-side reform of AI model architecture: static knowledge is stored separately from computational tasks, enhancing AI's reasoning ability [3][14]
- The Engram module is inspired by the classic N-gram concept from natural language processing, modernized to allow efficient retrieval of static knowledge with O(1) time complexity [15][16]

Group 2: Technical Innovations
- Engram uses a large, scalable embedding table to store static knowledge for direct retrieval without complex computation, in contrast to traditional Transformer models, where knowledge is embedded in the weights [18]
- Three technical barriers were addressed, illustrated in the sketch at the end of this summary:
  - A. Vocabulary compression reduced the effective vocabulary size by 23% through normalization of semantically similar terms [19]
  - B. Multi-head hashing resolves hash collisions when many N-grams map to limited memory slots, enhancing robustness [20]
  - C. Context-aware gating acts as a referee, filtering out static knowledge that is irrelevant to the current context [21][22]

Group 3: Resource Allocation and Model Performance
- A large-scale ablation study revealed a U-shaped scaling law for resource allocation, indicating that the optimal split of sparse parameters is approximately 75%-80% for MoE and 20%-25% for Engram, minimizing loss [30][31]
- The introduction of Engram not only improved knowledge tasks but also unexpectedly enhanced performance in logic, coding, and mathematics, with significant score increases across various benchmarks [39][40]

Group 4: Engineering Breakthroughs
- Engram's architecture separates memory from computation, enabling large models to offload memory to cheaper, scalable CPU-side resources and reducing reliance on expensive GPU memory [46][49]
- This separation allows memory data to be prefetched, maintaining high throughput even with large parameter sizes, which is a significant advantage for future AI model development [51][52]

Group 5: Future Implications
- The upcoming DeepSeek V4 model is expected to integrate Engram technology, balancing computation and memory to enhance both knowledge capacity and reasoning capability while reducing inference costs [61][64]
- The paper signals a shift in the AI industry toward architectural innovation, away from merely increasing computational power and parameter counts, redefining the competitive standards of AI development [65]
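Points B and C in Group 2 lend themselves to a compact sketch: several salted hashes read disjoint sub-tables, so two N-grams that collide in one head rarely collide in every head, and a sigmoid gate driven by the current hidden state decides how much of the retrieved static embedding to admit. The constants and the exact gate form below are our assumptions, not the paper's design.

```python
import numpy as np

# Table sizes, head count, and the sigmoid gate form are our assumptions.
ROWS, DIM, HEADS = 1 << 16, 128, 4
rng = np.random.default_rng(0)
tables = rng.standard_normal((HEADS, ROWS, DIM // HEADS), dtype=np.float32)

def multi_head_lookup(ngram):
    """Each head hashes the same N-gram with a different salt; two N-grams
    that collide in one head are unlikely to collide in every head."""
    parts = [tables[h][hash((h,) + ngram) % ROWS] for h in range(HEADS)]
    return np.concatenate(parts)                       # (DIM,)

def gated_memory(ngram, hidden_state, w_gate):
    """Context-aware gate: the current hidden state decides how much of
    the retrieved static embedding to admit."""
    gate = 1.0 / (1.0 + np.exp(-hidden_state @ w_gate))  # sigmoid in [0, 1]
    return gate * multi_head_lookup(ngram)

h = rng.standard_normal(DIM).astype(np.float32)        # mock hidden state
w = (rng.standard_normal(DIM) / np.sqrt(DIM)).astype(np.float32)
print(gated_memory((17, 42), h, w).shape)  # (128,)
```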
DeepSeek Open-Sources a Large-Model Memory Module; New Paper Signed by Liang Wenfeng Previews the Next Generation of Sparse Models
36Kr · 2026-01-13 07:14
Core Insights
- DeepSeek has introduced a new paradigm called "Conditional Memory" to supply the knowledge-retrieval capability the Transformer model previously lacked [1][4][31]
- The Engram module brings significant efficiency gains, enabling simpler tasks to be completed with fewer layers and freeing up resources for more complex reasoning tasks [4][21]

Group 1: Conditional Memory and the Engram Module
- The paper presents Conditional Memory as an essential modeling primitive for the next generation of sparse models [1][4]
- Engram enables the model to perform in just one or two layers tasks that previously required six layers of attention, optimizing resource allocation [4][21]
- The Engram design incorporates a large vocabulary for static knowledge retrieval, allowing information to be fetched at O(1) speed [4][6]

Group 2: Performance and Efficiency
- The optimal share of sparse parameters allocated to Engram memory (versus MoE experts) was found to be around 20% to 25%, leading to a reduction in model validation loss [17][21]
- In experiments, the Engram-27B model outperformed the MoE-27B model on a range of knowledge-intensive tasks, with notable improvements in general reasoning and code/mathematics [21][22]
- The Engram-40B model further increased memory parameters and showed sustained performance improvements, indicating that memory capacity had not yet saturated [25][31]

Group 3: Hardware Optimization
- The Engram module allows large parameter tables to be offloaded to CPU memory, minimizing inference delays and maintaining high throughput [29][30]
- The design principle of "hardware-aware efficiency" decouples storage from computation, making massive parameter tables usable without significant performance cost [31]
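The 20%-25% split reported above is an iso-parameter reallocation: the total sparse budget stays fixed while a fraction of it moves from MoE experts into the Engram table. A toy bookkeeping loop makes the accounting explicit; the sizes are illustrative, loosely modeled on the 27B setting rather than taken from the paper.

```python
# Sizes are illustrative; the point is the iso-parameter accounting.
TOTAL_SPARSE_PARAMS = 27e9   # fixed budget, loosely modeled on Engram-27B
for frac in (0.00, 0.10, 0.20, 0.25, 0.40):
    engram = frac * TOTAL_SPARSE_PARAMS
    moe = TOTAL_SPARSE_PARAMS - engram
    print(f"Engram {frac:4.0%}: {engram / 1e9:4.1f}B table, "
          f"{moe / 1e9:4.1f}B in MoE experts")
```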
DeepSeek Releases a New Paper Signed by Liang Wenfeng
券商中国 · 2026-01-13 06:25
Group 1
- The article discusses a new paper released by DeepSeek on January 12, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models", co-authored with Peking University [1]
- The paper introduces the concept of conditional memory, which significantly enhances model performance on knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter counts and computational cost [1]
- DeepSeek has open-sourced the related memory module, Engram, as part of the advances discussed in the paper [1]