New DeepSeek Paper: Next-Generation Large Models Achieve "Memory Separation". Is V4 Close?
Di Yi Cai Jing Zi Xun· 2026-01-13 03:32
Core Insights
- DeepSeek has released a new paper focusing on the conditional memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [1][4]

Group 1: Research Findings
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" and highlights the need for a native knowledge retrieval mechanism in existing Transformer architectures [4]
- The research distinguishes two task types in large models, deep dynamic computation for combinatorial reasoning and static knowledge retrieval, and finds that current models simulate the retrieval process inefficiently [4][5]
- DeepSeek introduces conditional memory as a complementary axis of sparsity, balancing capacity between mixture-of-experts (MoE) computation and static memory (Engram) [4][6]

Group 2: Performance Improvements
- The team discovered a U-shaped scaling law: a mixed allocation of sparse capacity between MoE experts and Engram memory significantly outperforms pure MoE baseline models (restated schematically below) [5]
- The memory module not only aids knowledge retrieval but also yields notable improvements in general reasoning, coding, and mathematical tasks [5][6]
- The paper essentially proposes a "division of labor" for large models, letting specialized modules handle specific task types and thereby improving efficiency and resource allocation [6]

Group 3: Future Developments
- Industry speculation suggests the proposed conditional memory may be integral to the architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected around February [6]
- Initial tests indicate V4 may surpass other leading models in programming capability; the previous model, V3, had already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro on various benchmarks [6]
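The U-shaped result admits a compact schematic statement. The notation below is ours, not the paper's: hold the total sparse-parameter budget fixed and vary the fraction allocated to Engram memory.

```latex
% Schematic restatement of the reported U-shaped scaling law (our notation).
% N: total sparse-parameter budget; rho: fraction allocated to Engram.
\[
N = N_{\mathrm{MoE}} + N_{\mathrm{Engram}}, \qquad
\rho = \frac{N_{\mathrm{Engram}}}{N}
\]
\[
\rho^{*} = \arg\min_{\rho \in [0,1)} \mathcal{L}(\rho), \qquad
0 < \rho^{*} < 1, \qquad \mathcal{L}(\rho^{*}) < \mathcal{L}(0)
\]
% rho = 0 is the pure-MoE baseline; an interior minimum means a mixed
% allocation of experts and memory beats pure MoE at a fixed budget N.
```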
DeepSeek Releases New Paper Bylined by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:27
Core Viewpoint
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models", which introduces conditional memory to enhance model performance across tasks at equal parameter counts and compute [1]

Group 1
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- Conditional memory is proposed to significantly improve model performance in knowledge retrieval, reasoning, coding, and mathematical tasks [1]
- DeepSeek has open-sourced a related memory module called Engram [1]
DeepSeek Releases New Paper Bylined by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:02
Core Insights
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" on the evening of the 12th [1]
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- The paper introduces conditional memory, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter counts and compute [1]
- DeepSeek has also open-sourced a related memory module named Engram [1]

Company and Industry Summary
- The collaboration between DeepSeek and Peking University highlights the growing trend of academia-industry partnerships in advancing AI technologies [1]
- The introduction of scalable lookup structures in large language models represents a significant innovation, potentially improving the efficiency and effectiveness of AI applications [1]
- Open-sourcing the Engram memory module may encourage further research on conditional memory systems, fostering a more collaborative environment for AI advances [1]
AI Applications Stay Red-Hot! Yonyou Network Hits the Daily Limit; Software 50 ETF (159590) Stages a Deep-V Rebound, Up Over 1.3%, with Over 120 Million Yuan in Net Subscriptions This Morning After Pulling In 143 Million Yuan Yesterday! Institutions: 2026 Is the First Year of AI Application Investment
Sou Hu Cai Jing· 2026-01-13 02:49
Group 1
- The core of the news is continued enthusiasm for the A-share software sector, with significant inflows into the Software 50 ETF, which staged a deep-V rebound of over 1.3% after attracting 143 million yuan yesterday and over 120 million yuan in net subscriptions this morning [1][4]
- The AGI-Next summit discussed the shift in large-model competition from the "Chat" phase to the "Agent" phase, emphasizing the execution of complex tasks in real environments, with 2026 predicted to be the year commercial value is realized [4][6]
- Major holdings of the Software 50 ETF performed strongly, with Zhongke Xingtou rising over 10%, Yonyou Network hitting its daily limit, and Wanxing Technology gaining over 9% [4][5]

Group 2
- The AI industry is experiencing a surge of interest across capital, applications, and technology, as evidenced by the strong market performance of leading general large-model companies that recently went public [6]
- The upcoming release of DeepSeek's flagship model V4 is expected to intensify competition among first-tier general large models, with AI-assisted programming tools anticipated to reach large-scale commercialization [6][7]
- Analysts predict that 2026 will be the inaugural year for AI application investment, driven by continuously improving model capabilities, falling computing costs, and accelerating monetization of AI applications [7][8]
Wang Xing, Zhang Yiming, and Liang Wenfeng Share One Common Trait
Sou Hu Cai Jing· 2026-01-13 02:48
Group 1
- DeepSeek has launched a new open-source architecture module called Engram, which is speculated to be the core technology of its next-generation model V4 [2]
- Founder Liang Wenfeng maintains a low-profile approach, focusing on product and technology rather than public appearances [2]
- Liang Wenfeng is compared to other successful tech entrepreneurs such as Wang Xing and Zhang Yiming, who also exhibit a humble demeanor despite their achievements [2][4]

Group 2
- Wang Xing, the leader of Meituan, does not have an independent office and prefers to work alongside employees, reflecting a down-to-earth attitude [4]
- Zhang Yiming, despite being based in Singapore, remains engaged with AI research and maintains a student-like curiosity toward technology [6]
- The article highlights the common trait among these young entrepreneurs of staying grounded and practical in their fields, showing resilience against competition [6]
The New "Yi Zhong Tian" Trio Stays Strong: Yidian Tianxia and Tianlong Group Notch Three Consecutive Limit-Ups, Chinese Online Rises Over 15%
Ge Long Hui· 2026-01-13 01:53
Group 1
- The A-share AI application sector continues to perform strongly, with Di'an Diagnostics, Yidian Tianxia, and Tianlong Group hitting the 20% daily limit up for a third consecutive session [1]
- The AGI-Next summit, initiated by a key laboratory at Tsinghua University, highlights the shift in large-model competition from "Chat" to "Agent", focusing on executing complex tasks in real environments [1]
- AI applications are accelerating in healthcare: Ant Group's "Antifufu" has transformed into an AI health partner and quickly entered the top 3 of the Apple App Store, indicating strong consumer demand for integrated healthcare services [1]

Group 2
- CITIC Construction Investment Securities emphasizes that as model capabilities improve and the costs of reasoning and long-window tasks fall, downstream AI application scenarios are rapidly entering the commercialization verification phase, particularly in search & marketing, coding, multimodal, Agent, and AI for Science [2]
- Key AI companies show significant year-to-date stock gains, with Yidian Tianxia up 87.36%, Di'an Diagnostics up 97.29%, and Tianlong Group up 82.26% [3]
New Paper Bylined by Liang Wenfeng: First Look at the DeepSeek V4 Architecture? Taking Aim at a Fatal Flaw of the Transformer
36 Ke· 2026-01-13 01:24
Core Insights
- DeepSeek's new paper addresses the memory limitations of Transformer models by proposing a complementary "conditional memory" sparsity axis via the Engram module, which enables efficient knowledge retrieval at near-O(1) complexity [1][6][11]

Group 1: Memory and Model Architecture
- While MoE (Mixture of Experts) has become the mainstream architecture for large models, it still fundamentally relies on Transformers, which lack a native knowledge retrieval mechanism, leading to inefficient computation [9][11]
- Engram is designed to offload static, repetitive patterns in language modeling to a scalable lookup module, allowing the Transformer backbone to focus on tasks requiring combination and reasoning [11][15]
- The authors categorize language modeling tasks into two types, those requiring combination and reasoning and those resembling pattern retrieval, emphasizing the need for a dedicated mechanism for the latter [12][13]

Group 2: Engram Architecture and Functionality
- Engram is conceptualized as a modernized version of the classic hashed N-gram, functioning as a scalable lookup module integrated within the Transformer architecture [18][20]
- The architecture uses a two-stage process for handling input sequences, retrieval followed by fusion, which enhances the model's efficiency in processing static patterns [20][21]
- A context-aware gating mechanism allows the model to dynamically weight the retrieved embeddings, improving overall expressiveness and reducing noise from hash collisions (sketched in code below) [25][27]

Group 3: Performance and Scaling
- The paper presents a U-shaped scaling law indicating that an optimal resource allocation between MoE and Engram enhances model performance, suggesting that a balance between dynamic computation and static memory is crucial [3][33]
- Experimental results show that Engram, scaled to 27 billion parameters, outperforms the MoE baseline at equal parameter counts and FLOPs, demonstrating its effectiveness across various benchmarks [5][38]
- Engram not only improves knowledge retrieval but also enhances reasoning, mathematics, and coding capabilities, indicating a significant leap in performance across multiple tasks [39][48]

Group 4: Future Implications
- The findings suggest a paradigm shift in model architecture toward a dual-axis approach of computation plus memory, with potential integration into future iterations of large language models such as V4 [46][50]
- The paper posits that integrating Engram could lead to substantial improvements in model efficiency and capability, paving the way for more advanced applications in natural language processing [51][52]
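The hashed N-gram lookup with context-aware gating described above can be illustrated with a minimal PyTorch sketch. Every name here (HashedNGramMemory, num_slots, the rolling hash, the sigmoid gate) is an illustrative assumption of ours, not the open-sourced Engram code.

```python
# Minimal sketch of the idea described above: a hashed N-gram lookup table
# with a context-aware gate. All names, dimensions, and the rolling hash
# are illustrative assumptions, not DeepSeek's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HashedNGramMemory(nn.Module):
    def __init__(self, num_slots: int, dim: int, n: int = 2):
        super().__init__()
        self.n = n
        self.num_slots = num_slots
        # Static memory: retrieval is a single O(1) embedding lookup.
        self.table = nn.Embedding(num_slots, dim)
        # Context-aware gate: decides how much retrieved memory to inject,
        # which also damps noise from hash collisions.
        self.gate = nn.Linear(2 * dim, dim)
        # Fixed random multipliers for a simple rolling N-gram hash.
        self.register_buffer(
            "mults", torch.randint(1, 2**31 - 1, (n,), dtype=torch.long)
        )

    def hash_ngrams(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq). Hash each token with its n-1 predecessors
        # (left-padded) to get a deterministic slot address per position.
        padded = F.pad(token_ids, (self.n - 1, 0), value=0)
        acc = torch.zeros_like(token_ids)
        for i in range(self.n):
            acc = acc * self.mults[i] + padded[:, i : i + token_ids.size(1)]
        return acc % self.num_slots

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # Stage 1, retrieval: O(1) table lookup per position.
        retrieved = self.table(self.hash_ngrams(token_ids))
        # Stage 2, fusion: gate the retrieved embedding on the hidden state.
        g = torch.sigmoid(self.gate(torch.cat([hidden, retrieved], dim=-1)))
        return hidden + g * retrieved
```

In this toy version, enlarging num_slots adds capacity without adding FLOPs per token, which mirrors the "new axis of sparsity" framing.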
Hong Kong Stock Movers | MiniMax Briefly Jumps 11%! AI Concept Stocks Rally in Unison as Kuaishou Hits a Near Three-Month High
Xin Lang Cai Jing· 2026-01-13 01:16
Group 1
- AI-related stocks surged in the Hong Kong market, with notable gains in Kuaishou, Weimob, and MiniMax indicating strong investor interest in the AI sector [1]
- Kuaishou's stock rose 4% to a three-month high, Weimob gained over 13%, and MiniMax climbed 11%, bringing its cumulative gain since listing above 130% [1]
- The upcoming release of DeepSeek's new flagship model V4, which reportedly outperforms major models such as Claude and ChatGPT in code generation, is expected to drive further excitement in the AI market [1]

Group 2
- CITIC Securities believes that current AI industry dynamics, including financing activities by overseas companies such as xAI and Anthropic along with domestic policies promoting "AI + manufacturing", will lead to a new wave of AI application enthusiasm [2]
- Continuous improvement in model capabilities, particularly in reasoning, together with lower costs for long-window applications, is accelerating AI commercialization across search and marketing, coding, multimodal applications, agents, and AI for science [2]
- Companies involved in these areas are expected to see their commercialization processes accelerate as the AI landscape evolves [2]
DeepSeek Releases New Paper Bylined by Liang Wenfeng
Cai Lian She· 2026-01-13 01:15
Core Insights
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" on the evening of the 12th, co-authored with Peking University and featuring Liang Wenfeng [1]
- The paper introduces conditional memory, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter counts and compute [1]
- DeepSeek has open-sourced the related memory module called Engram [1]
Just Released: An Open-Source "Memory" Module Bylined by Liang Wenfeng Brings DeepSeek V4 into Sharper Focus
Cheng Xu Yuan De Na Xie Shi· 2026-01-13 00:56
Core Insights
- DeepSeek has introduced a new research paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models", in collaboration with Peking University, focusing on enhancing large language models (LLMs) through conditional memory and a new module called Engram [1][3][4]

Group 1: Research Background and Problem Statement
- Current large language models primarily rely on Mixture of Experts (MoE) for sparsity, but existing Transformer architectures lack native knowledge retrieval mechanisms, leading to inefficient simulation of retrieval behavior [3][9]
- DeepSeek proposes conditional memory as a complement to MoE, introducing the Engram module to address the limitations of current models [4][9]

Group 2: Engram Module and Its Functionality
- The Engram module modernizes classic N-gram embeddings, enabling knowledge retrieval with O(1) time complexity [9]
- Engram separates static knowledge storage from dynamic computation, offloading the reconstruction burden from the model's shallow layers and enhancing its ability to perform complex reasoning [11][13]

Group 3: Performance Improvements
- Engram has been scaled to 27 billion parameters, showing significant performance improvements over a pure MoE baseline at equal parameter counts and FLOPs [11]
- Engram notably strengthens knowledge retrieval, with gains such as MMLU +3.4 and CMMLU +4.0, alongside general reasoning tasks such as BBH +5.0 and ARC-Challenge +3.7 [11][38]

Group 4: System Efficiency and Scalability
- Engram's deterministic addressing supports prefetching from host memory at runtime with minimal performance overhead, allowing for efficient memory management (see the sketch after this summary) [12][19]
- The architecture decouples parameter storage from computational resources, facilitating linear scalability with the number of accelerators [21][22]

Group 5: Experimental Results
- Four models were trained, Dense-4B, MoE-27B, Engram-27B, and Engram-40B, all using the same training data and procedure [35][36]
- The sparse architectures (MoE-27B, Engram-27B/40B) significantly outperformed the dense model (Dense-4B) across various benchmarks, demonstrating superior scaling properties [38]

Group 6: Long Context Training
- The Engram architecture shows significant advantages on long-context tasks by preserving valuable attention capacity for global context processing [41]
- Controlled experiments indicate that Engram outperforms MoE models on complex retrieval tasks, supporting its architectural advantage [46]
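Why deterministic addressing enables host-memory prefetching can be shown with a minimal sketch, under our own assumptions (hypothetical helper, not the Engram repo's API): slot ids for a future chunk depend only on token ids, so the gather and copy can run while the backbone is still computing.

```python
import torch

def prefetch_rows(table_cpu: torch.Tensor, slot_ids: torch.Tensor,
                  device: str = "cuda") -> torch.Tensor:
    """Gather memory rows on the host and start an async copy to the GPU.

    This works because slot ids for an upcoming chunk are a pure function
    of token ids, computable before the backbone needs the embeddings.
    Hypothetical helper for illustration only.
    """
    rows = table_cpu[slot_ids.reshape(-1).cpu()]  # host-side gather
    # Pinned memory lets the host-to-device copy overlap with GPU compute.
    return rows.pin_memory().to(device, non_blocking=True)
```

Because only the rows actually addressed are copied, the full table can stay in host RAM, which is consistent with the summary's point about decoupling parameter storage from accelerator memory.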