DeepSeek
New DeepSeek Paper Co-Authored by Liang Wenfeng Released, Targeting Large Models' "Memory" Weakness
Bei Ke Cai Jing· 2026-01-13 04:41
Core Insights
- The paper published by DeepSeek addresses the memory limitations of current large language models and introduces the concept of "conditional memory" [2]
- DeepSeek proposes a module named Engram, which breaks language modeling down into two branches: "static pattern retrieval" for quick access to deterministic knowledge and "dynamic combinatorial reasoning" for complex logical operations (see the illustrative sketch after this summary) [2]
- The paper suggests that conditional memory is an essential modeling primitive for the next generation of sparse models, and there is speculation that DeepSeek's next model may be released before the Spring Festival [3]

Group 1
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models", was co-authored by Peking University and DeepSeek [1]
- The introduction of "conditional memory" aims to enhance the memory capabilities of large language models [2]
- The Engram module is designed to improve efficiency in language modeling by separating tasks into static and dynamic components [2]

Group 2
- The paper emphasizes the importance of conditional memory for future sparse model development [3]
- There is speculation that DeepSeek's next-generation model will be released around the Spring Festival, potentially replicating the success of previous launches [3]
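The two-branch split described above (cheap static lookup versus expensive dynamic computation) can be made concrete with a minimal sketch. Everything below is hypothetical: the class name, the hashed-bigram addressing, the gating, and the shapes are assumptions chosen for illustration, not the actual Engram design, which these summaries do not describe in detail.

```python
# Illustrative sketch only; module names, the hashed-bigram key, and the gating
# are assumptions, not the Engram design from the paper.
import torch
import torch.nn as nn


class ConditionalMemorySketch(nn.Module):
    """Toy two-branch block: static lookup memory next to a dense compute path."""

    def __init__(self, d_model: int, num_slots: int = 1 << 16):
        super().__init__()
        self.num_slots = num_slots
        # "Static pattern retrieval": a large table addressed by a cheap hash
        # of the local context (here, the current and previous token ids).
        self.memory = nn.Embedding(num_slots, d_model)
        # "Dynamic combinatorial reasoning": stand-in dense branch; a real
        # system would place an MoE feed-forward block here.
        self.compute = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        # Per-token gate deciding how much retrieved memory to mix in.
        self.gate = nn.Linear(d_model, 1)

    def forward(self, hidden: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, d_model); token_ids: (batch, seq) integer ids
        prev_ids = torch.roll(token_ids, shifts=1, dims=1)
        slot_ids = (token_ids * 1_000_003 + prev_ids) % self.num_slots  # hashed bigram
        retrieved = self.memory(slot_ids)      # one table read per token, no big matmul
        computed = self.compute(hidden)        # the expensive computation path
        g = torch.sigmoid(self.gate(hidden))   # (batch, seq, 1) mixing weight
        return hidden + g * retrieved + (1.0 - g) * computed
```

The point of the sketch is the asymmetry between the branches: the memory path is a single table read per token, while the compute path (an MoE block in a real model) performs the heavy matrix multiplications.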
Is the DeepSeek V4 Roadmap Emerging? Major Paper Co-Authored by Liang Wenfeng Released, Focusing on a Conditional Memory Module for Large Models
Jin Rong Jie· 2026-01-13 04:38
Core Insights
- DeepSeek has released a significant research paper focusing on the conditional memory module for large models, indicating it will be a core modeling primitive in the next generation of sparse large models [1][4]
- The upcoming flagship model V4 is expected to be unveiled around the Spring Festival, with the recent research results potentially outlining its core research roadmap [1][4]

Summary by Sections

Research Findings
- The paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" was co-authored by DeepSeek and Peking University, with DeepSeek's founder Liang Wenfeng among the authors [4]
- The core insight of the paper is that large models handle two distinct types of tasks: deep dynamic computation for combinatorial reasoning and static knowledge retrieval [4]
- Existing Transformer architectures lack a native knowledge retrieval mechanism, leading to inefficient computation when simulating retrieval processes [4]

Proposed Solutions
- To address these inefficiencies, DeepSeek proposes the use of conditional memory as a supplementary dimension of sparsity, implemented through a module called Engram [5]
- The team discovered a "U-shaped scaling law" (restated in notation after this summary), indicating that a mixed sparse capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [5]
- The Engram module is designed to optimize the balance between neural computation (MoE) and static memory, allowing for improved efficiency and performance in various domains, including general reasoning, coding, and mathematics [5]

Future Developments
- DeepSeek plans to release the next-generation flagship model V4 in February, with preliminary internal tests showing its programming capabilities surpass existing top models [6]
- The V4 model is anticipated to be a focal point in the industry, especially following the success of the V3 model released at the end of 2024, which outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in several benchmark tests [6]
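The "U-shaped scaling law" mentioned above can be restated compactly. The notation below is a sketch under assumptions, since the summaries do not reproduce the paper's formulation; P, ρ, and L(ρ) are symbols introduced here for illustration only.

```latex
% Illustrative notation, not the paper's own. Split a fixed sparse-parameter
% budget P between MoE experts and Engram memory by a fraction \rho:
\[
  P_{\mathrm{MoE}} = (1-\rho)\,P, \qquad P_{\mathrm{mem}} = \rho\,P .
\]
% A U-shaped scaling law then says that, at fixed P and fixed activated
% compute, the loss L(\rho) is minimized at an interior allocation:
\[
  \rho^{*} = \arg\min_{0 \le \rho \le 1} L(\rho), \qquad 0 < \rho^{*} < 1,
  \qquad L(\rho^{*}) < \min\{\, L(0),\; L(1) \,\},
\]
% i.e. a mixed allocation beats both the pure-MoE endpoint (\rho = 0) and the
% pure-memory endpoint (\rho = 1), which is the sense in which the mix
% "significantly outperforms pure MoE baselines."
```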
DeepSeek Releases New Paper Co-Authored by Liang Wenfeng
Xin Hua Wang Cai Jing· 2026-01-13 03:52
Core Insights
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" on the evening of the 12th, co-authored with Peking University and with Liang Wenfeng listed among the authors [1]
- The paper introduces conditional memory, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks under equal parameter and computational conditions [1]
- DeepSeek has also open-sourced a related memory module called Engram [1]
Co-Authored by Liang Wenfeng: DeepSeek Releases a New Paper
Di Yi Cai Jing Zi Xun· 2026-01-13 03:41
Core Insights
- DeepSeek has released a new paper focusing on the conditional memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [2][5][7]

Group 1: Research and Development
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" [5]
- The research identifies two distinct tasks within large models: deep dynamic computation for combinatorial reasoning and static knowledge retrieval, highlighting inefficiencies in the current Transformer architecture [5][6]
- DeepSeek introduces conditional memory as a supplementary sparse dimension to optimize the balance between neural computation (MoE) and static memory (Engram) [6][7]

Group 2: Performance and Implications
- The team discovered a U-shaped scaling law indicating that a mixed sparse capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [6]
- The introduction of the memory module not only aids knowledge retrieval but also shows significant improvements in general reasoning, coding, and mathematical tasks [6][7]
- The paper essentially proposes a "division of labor" optimization for large models, allowing specialized modules to handle specific tasks more efficiently [6][7]

Group 3: Future Developments
- Industry speculation suggests that the proposed conditional memory may be part of the technical architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [7]
- Initial tests indicate that V4 may surpass other leading models in programming capabilities, with the previous V3 model having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [7]
A New DeepSeek Paper! Next-Generation Large Models Get "Memory Separation"; Is V4 Near?
Di Yi Cai Jing Zi Xun· 2026-01-13 03:32
Core Insights
- DeepSeek has released a new paper focusing on the conditional memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [1][4]

Group 1: Research Findings
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" and highlights the need for a native knowledge retrieval mechanism in existing Transformer architectures [4]
- The research identifies two distinct tasks in large models: deep dynamic computation for combinatorial reasoning and static knowledge retrieval, indicating that current models inefficiently simulate retrieval processes [4][5]
- DeepSeek introduces conditional memory as a supplementary dimension of sparsity, optimizing the trade-off between mixture of experts (MoE) and static memory (Engram) [4][6]

Group 2: Performance Improvements
- The team discovered a U-shaped scaling law, showing that a mixed sparse capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [5]
- The introduction of the memory module not only aids knowledge retrieval but also yields notable improvements in general reasoning, coding, and mathematical tasks [5][6]
- The paper essentially proposes a "division of labor" optimization for large models, allowing specialized modules to handle specific tasks and thereby enhancing efficiency and resource allocation (see the usage example after this summary) [6]

Group 3: Future Developments
- Industry speculation suggests that the proposed conditional memory may be integral to the architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [6]
- Initial tests indicate that V4 may surpass other leading models in programming capabilities, with the previous model, V3, having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [6]
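To illustrate the "division of labor" point above, the hypothetical ConditionalMemorySketch from the earlier code sketch can be run on dummy inputs. The numbers are arbitrary; the takeaway is only that almost all parameters live in the lookup table while each token reads a single slot, which is one plausible reading of how a memory branch adds capacity without adding much per-token compute.

```python
# Usage of the hypothetical ConditionalMemorySketch defined earlier; all sizes
# and numbers are arbitrary examples, not figures from the paper.
import torch

torch.manual_seed(0)
model = ConditionalMemorySketch(d_model=256, num_slots=1 << 16)
hidden = torch.randn(2, 8, 256)               # (batch, seq, d_model) activations
token_ids = torch.randint(0, 50_000, (2, 8))  # (batch, seq) token ids
out = model(hidden, token_ids)
print(out.shape)                              # torch.Size([2, 8, 256])

# Parameter split: the table dominates the count, yet each token touches one slot.
mem_params = sum(p.numel() for p in model.memory.parameters())
total_params = sum(p.numel() for p in model.parameters())
print(f"memory share of parameters: {mem_params / total_params:.1%}")
```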
DeepSeek Releases New Paper Co-Authored by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:27
Core Viewpoint
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," which introduces conditional memory to enhance model performance in various tasks under equal parameters and computational conditions [1]

Group 1
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- Conditional memory is proposed to significantly improve model performance in knowledge retrieval, reasoning, coding, and mathematical tasks [1]
- DeepSeek has open-sourced a related memory module called Engram [1]
DeepSeek Releases New Paper Co-Authored by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:02
Core Insights
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" on the evening of the 12th [1]
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- The concept of conditional memory is introduced, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks under equal parameters and computational conditions [1]
- DeepSeek has also open-sourced a related memory module named Engram [1]

Company and Industry Summary
- The collaboration between DeepSeek and Peking University highlights the growing trend of partnerships between academia and industry in advancing AI technologies [1]
- The introduction of scalable lookup structures in large language models represents a significant innovation in the field, potentially leading to improved efficiency and effectiveness in AI applications [1]
- The open-sourcing of the Engram memory module may encourage further research and development in conditional memory systems, fostering a more collaborative environment in AI advancements [1]
AI Applications Stay Hot! Yonyou Network Hits the Daily Limit; Software 50 ETF (159590) Stages a Deep-V Rebound of Over 1.3%, Draws Net Subscriptions of Over 120 Million in Morning Trading After Taking In 143 Million Yuan Yesterday! Institutions: 2026 Is the First Year of AI Application Investment
Sou Hu Cai Jing· 2026-01-13 02:49
Group 1
- The core viewpoint of the news highlights the ongoing enthusiasm in the A-share software sector, with significant inflows into the Software 50 ETF, which rebounded over 1.3%, attracted 143 million yuan in funds yesterday, and drew net subscriptions of over 120 million in this morning's session [1][4]
- The AGI-Next summit discussed the shift in large model competition from the "Chat" phase to the "Agent" phase, emphasizing the importance of executing complex tasks in real environments, with 2026 predicted to be the year of commercial value realization [4][6]
- Major holdings within the Software 50 ETF showed strong performance, with notable gains such as Zhongke Xingtou rising over 10%, Yonyou Network hitting the daily limit, and Wanxing Technology increasing over 9% [4][5]

Group 2
- The AI industry is experiencing a surge in interest, with significant developments in capital, applications, and technology, as evidenced by the strong market performance of leading general large model companies that recently went public [6]
- The upcoming release of DeepSeek's flagship model V4 is expected to intensify competition among first-tier general large models, with a focus on AI-assisted programming tools that are anticipated to achieve large-scale commercialization [6][7]
- Analysts predict that 2026 will be the first year of AI application investment, citing continuous improvements in model capabilities, decreasing computing costs, and accelerating monetization of AI applications [7][8]
Wang Xing, Zhang Yiming, and Liang Wenfeng Share One Common Trait
Sou Hu Cai Jing· 2026-01-13 02:48
Group 1
- DeepSeek has launched a new open-source architecture module called Engram, which is speculated to be the core technology for its next-generation model V4 [2]
- Founder Liang Wenfeng maintains a low-profile approach, focusing on product and technology rather than public appearances [2]
- Liang Wenfeng is compared to other successful tech entrepreneurs like Wang Xing and Zhang Yiming, who also exhibit a humble demeanor despite their achievements [2][4]

Group 2
- Wang Xing, the leader of Meituan, does not have an independent office and prefers to work alongside employees, reflecting a down-to-earth attitude [4]
- Zhang Yiming, despite being based in Singapore, remains engaged with AI research and maintains a student-like curiosity towards technology [6]
- The article highlights the common trait among these young entrepreneurs of staying grounded and practical in their respective fields, showing resilience against competition [6]
The New "Yi Zhong Tian" Trio Stays Strong: Yidian Tianxia and Tianlong Group Notch a Third Consecutive Limit-Up, Chinese Online Up Over 15%
Ge Long Hui· 2026-01-13 01:53
Group 1
- The A-share market's AI application sector continues to perform strongly, with companies such as Di'an Diagnostics, Yidian Tianxia, and Tianlong Group hitting the 20% daily limit up and marking their third consecutive limit-up session [1]
- The AGI-Next summit, initiated by a key laboratory at Tsinghua University, highlights a shift in large model competition from "Chat" to "Agent," focusing on executing complex tasks in real environments [1]
- The adoption of AI in the healthcare sector is accelerating, with Ant Group's "Antifufu" transforming into an AI health partner and quickly entering the top 3 of the Apple App Store, indicating strong consumer demand for integrated healthcare services [1]

Group 2
- Citic Construction Investment Securities emphasizes that as model capabilities improve and the costs of reasoning and long-context tasks decrease, downstream AI application scenarios are rapidly entering the commercialization verification phase, particularly in search & marketing, coding, multimodal, Agent, and AI for Science [2]
- Key AI-sector stocks show significant year-to-date gains, with Yidian Tianxia up 87.36%, Di'an Diagnostics up 97.29%, and Tianlong Group up 82.26% [3]