Is the DeepSeek V4 Roadmap Taking Shape? Heavyweight Paper Co-Signed by Liang Wenfeng Released, Focusing on a Conditional Memory Module for Large Models
Jin Rong Jie· 2026-01-13 04:38
Core Insights
- DeepSeek has released a significant research paper focusing on the conditional memory module for large models, indicating it will be a core modeling primitive in the next generation of sparse large models [1][4]
- The upcoming flagship model V4 is expected to be unveiled around the Spring Festival, with the recent research results potentially outlining its core research roadmap [1][4]

Summary by Sections

Research Findings
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," was co-authored by DeepSeek and Peking University, with DeepSeek's founder Liang Wenfeng among the authors [4]
- The paper's core insight is that large models handle two distinct types of tasks: deep dynamic computation for combinatorial reasoning, and static knowledge retrieval [4]
- Existing Transformer architectures lack a native knowledge retrieval mechanism, leading to inefficient computation when they simulate retrieval processes [4]

Proposed Solutions
- To address these inefficiencies, DeepSeek proposes conditional memory as a supplementary dimension of sparsity, implemented through a module called Engram [5]
- The team discovered a "U-shaped scaling law," indicating that a mixed sparse-capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [5]
- The Engram module is designed to optimize the balance between neural computation (MoE) and static memory, improving efficiency and performance across domains including general reasoning, coding, and mathematics [5]

Future Developments
- DeepSeek plans to release the next-generation flagship model V4 in February, with preliminary internal tests showing its programming capabilities surpassing existing top models [6]
- The V4 model is anticipated to be a focal point in the industry, especially following the success of the V3 model released at the end of 2024, which outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in several benchmark tests [6]
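The reported "U-shaped scaling law" concerns how a fixed sparse-parameter budget is split between MoE experts and Engram memory. The toy Python sketch below illustrates only the bookkeeping of such a split; the budget, model width, per-expert size, and the split_sparse_budget helper are invented for illustration, and the optimal ratio is the paper's empirical finding, not something this arithmetic derives.

```python
def split_sparse_budget(total_params: int, engram_fraction: float,
                        d_model: int, params_per_expert: int) -> tuple[int, int]:
    """Divide a fixed sparse-parameter budget between Engram rows and MoE experts."""
    engram_params = int(total_params * engram_fraction)
    engram_rows = engram_params // d_model                 # embedding-table rows
    num_experts = (total_params - engram_params) // params_per_expert
    return engram_rows, num_experts

# Sweep the split for a hypothetical 20B sparse budget; the paper's reported
# sweet spot sits near a 20%-25% Engram share.
for frac in (0.0, 0.10, 0.20, 0.25, 0.50):
    rows, experts = split_sparse_budget(20_000_000_000, frac, 4096, 100_000_000)
    print(f"Engram {frac:>4.0%}: {rows:>12,} memory rows, {experts:>3} experts")
```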
Co-Signed by Liang Wenfeng: DeepSeek Publishes a New Paper
Di Yi Cai Jing Zi Xun· 2026-01-13 03:41
Core Insights
- DeepSeek has released a new paper focusing on the conditional memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [2][5][7]

Group 1: Research and Development
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" [5]
- The research identifies two distinct tasks within large models: deep dynamic computation for combinatorial reasoning, and static knowledge retrieval, highlighting inefficiencies in the current Transformer architecture [5][6]
- DeepSeek introduces conditional memory as a supplementary sparse dimension to optimize the balance between neural computation (MoE) and static memory (Engram) [6][7]

Group 2: Performance and Implications
- The team discovered a U-shaped scaling law indicating that a mixed sparse-capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [6]
- The memory module not only aids knowledge retrieval but also delivers significant improvements in general reasoning, coding, and mathematical tasks [6][7]
- The paper essentially proposes a "division of labor" optimization for large models, allowing specialized modules to handle specific tasks more efficiently [6][7]

Group 3: Future Developments
- Industry speculation suggests that the proposed conditional memory may be part of the technical architecture for DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [7]
- Initial tests indicate that V4 may surpass other leading models in programming capabilities, with the previous V3 model having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [7]
New DeepSeek Paper! Next-Generation Large Models Achieve "Memory Separation"; Is V4 Close?
Di Yi Cai Jing Zi Xun· 2026-01-13 03:32
Core Insights
- DeepSeek has released a new paper focusing on the conditional memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [1][4]

Group 1: Research Findings
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" and highlights the need for a native knowledge retrieval mechanism in existing Transformer architectures [4]
- The research identifies two distinct tasks in large models: deep dynamic computation for combinatorial reasoning, and static knowledge retrieval, noting that current models simulate retrieval processes inefficiently [4][5]
- DeepSeek introduces conditional memory as a supplementary dimension of sparsity, optimizing the trade-off between mixture of experts (MoE) and static memory (Engram) [4][6]

Group 2: Performance Improvements
- The team discovered a U-shaped scaling law showing that a mixed sparse-capacity allocation between MoE experts and Engram memory significantly outperforms pure MoE baseline models [5]
- The memory module not only aids knowledge retrieval but also yields notable improvements in general reasoning, coding, and mathematical tasks [5][6]
- The paper essentially proposes a "division of labor" optimization for large models, allowing specialized modules to handle specific tasks, thereby enhancing efficiency and resource allocation [6]

Group 3: Future Developments
- Industry speculation suggests that the proposed conditional memory may be integral to the architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [6]
- Initial tests indicate that V4 may surpass other leading models in programming capabilities, with the previous model, V3, having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [6]
DeepSeek-V4 Coming Soon, with Dual Upgrades in Compute Efficiency and Performance! Low-Fee Huaxia Cloud Computing ETF and Huaxia ChiNext AI ETF Attract Capital Inflows
Xin Lang Cai Jing· 2026-01-13 03:32
Group 1
- The three major indices turned negative again, with the technology sector adjusting alongside the market. The communication ETF Huaxia (515050) saw its decline widen to 2.39%, with mixed performance among its holdings [1]
- The low-fee ChiNext AI ETF Huaxia (159381) dropped by 1.64%, with trading volume quickly surpassing 300 million yuan, indicating active trading [1]
- The low-fee cloud computing ETF Huaxia (516630) fell by 0.64%, but has seen continuous net inflows of over 130 million yuan over the past three trading days, suggesting accelerated capital allocation [1]

Group 2
- Ping An Securities noted that global AI computing platform capabilities are continuously improving, with major chip manufacturers such as NVIDIA and AMD showcasing advances in AI computing chips at CES 2026 [2]
- NVIDIA announced full production of the NVIDIA Rubin platform, with products based on Rubin expected to be available through partners in the second half of 2026 [2]
- AMD unveiled the "Helios" platform and previewed the next-generation MI500 series GPU, indicating a significant enhancement of global AI computing infrastructure [2]

Group 3
- The cloud computing ETF Huaxia (516630) tracks the cloud computing index (930851) and has the lowest fee rate, focusing on domestic AI software and hardware computing, with a combined weight of 83.7% in computer software, cloud services, and computer equipment [3]
- The ChiNext AI ETF Huaxia (159381) offers exposure to AI-focused companies listed on the ChiNext board, with half of its weight in AI hardware computing and the other half in AI software applications, showcasing high elasticity and representativeness [3]
- The communication ETF Huaxia (515050) tracks the CSI 5G communication theme index, focusing deeply on the supply chains of NVIDIA, Apple, and Huawei, with its top five holdings including Zhongji Xuchuang, Xinyi Sheng, Lixun Precision, Industrial Fulian, and Zhaoyi Innovation [3]
DeepSeek Publishes New Paper Co-Signed by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:02
Core Insights
- On the evening of January 12, DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" [1]
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- The paper introduces the concept of conditional memory, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks under equal parameter and computational budgets [1]
- DeepSeek has also open-sourced a related memory module named Engram [1]

Company and Industry Summary
- The collaboration between DeepSeek and Peking University highlights the growing trend of partnerships between academia and industry in advancing AI technologies [1]
- The introduction of scalable lookup structures in large language models represents a significant innovation in the field, potentially leading to improved efficiency and effectiveness in AI applications [1]
- The open-sourcing of the Engram memory module may encourage further research and development in conditional memory systems, fostering a more collaborative environment for AI advancement [1]
Were Eight Major Products, Including DeepSeek, All Accidents?! World-Changing Projects Were Never "Taken Seriously" at First
Sou Hu Cai Jing· 2026-01-13 01:47
Core Insights
- Many groundbreaking products initially started as side projects that were not considered significant at their inception [1][2][3][5][6]
- Side projects are defined as non-core, non-KPI-driven initiatives that are not part of a company's strategic plan [1]
- The success of side projects can be attributed to their ability to operate without the constraints typically placed on mainline projects, allowing for greater innovation and flexibility [2][3][6]

Group 1: Examples of Successful Side Projects
- DeepSeek, a side project of the quant fund High-Flyer, emerged from internal technical research and grew far beyond its origins in quantitative trading [2]
- Qwen, developed by Alibaba, was initially a side project, which allowed for more autonomy and faster iteration, ultimately leading to its integration into the company's main offerings [3]
- Claude Code, initially a simple experimental project by an engineer, evolved into a key product for Anthropic, demonstrating how side projects can gain traction unexpectedly [5]

Group 2: Impact of AI on Project Development
- The integration of AI into software engineering has lowered the cost of experimentation, enabling individuals to validate ideas more quickly and easily [7][8]
- Side projects often begin by addressing specific problems and evolve through real-world usage, which enhances their maturity and relevance [8]
- The shift toward AI-driven development suggests that early signals of future trends may increasingly emerge from projects that were initially overlooked [10]

Group 3: Strategic Considerations
- While AI enhances execution efficiency, it does not necessarily improve the accuracy of strategic judgments, highlighting a potential limitation of mainline projects [10]
- The evolving landscape indicates that side projects may play a crucial role in validating directions before efforts scale up into mainline initiatives [10]
New Paper Co-Signed by Liang Wenfeng: First Glimpse of the DeepSeek V4 Architecture? Taking Aim at a Fatal Flaw of the Transformer
36Ke· 2026-01-13 01:24
Core Insights
- DeepSeek's new paper introduces a novel approach to addressing the memory limitations of Transformer models: a complementary "conditional memory" sparse axis implemented through the Engram module, which enables efficient knowledge retrieval with near-O(1) complexity [1][6][11]

Group 1: Memory and Model Architecture
- The paper highlights that while MoE (Mixture of Experts) has become a mainstream architecture for large models, it fundamentally still relies on Transformers, which lack a native knowledge retrieval mechanism, leading to inefficient computation [9][11]
- Engram is designed to offload static, repetitive patterns in language modeling to a scalable lookup module, allowing the Transformer backbone to focus on more complex tasks requiring combination and reasoning [11][15]
- The authors categorize language modeling tasks into two types: those requiring combination and reasoning, and those resembling pattern retrieval, emphasizing the need for a dedicated mechanism for the latter [12][13]

Group 2: Engram Architecture and Functionality
- Engram is conceptualized as a modernized version of the classic hashed N-gram, functioning as a scalable lookup module integrated into the Transformer architecture [18][20]
- The architecture uses a two-stage process for handling input sequences, retrieval followed by fusion, which improves the model's efficiency in processing static patterns [20][21]
- A context-aware gating mechanism lets the model dynamically adjust its use of the retrieved embeddings, improving overall expressiveness and reducing noise from hash collisions [25][27]

Group 3: Performance and Scaling
- The paper presents a U-shaped scaling law indicating that an optimal resource allocation between MoE and Engram enhances model performance, suggesting that a balance between dynamic computation and static memory is crucial [3][33]
- Experimental results show that Engram, when scaled to 27 billion parameters, outperforms the MoE baseline under equivalent parameter and FLOPs budgets, demonstrating its effectiveness across various benchmarks [5][38]
- Engram's architecture not only improves knowledge retrieval but also enhances reasoning, mathematics, and coding capabilities, indicating a significant leap in performance metrics across multiple tasks [39][48]

Group 4: Future Implications
- The findings suggest a paradigm shift in model architecture toward a dual-axis approach of computation plus memory, with potential integration into future iterations of large language models, such as V4 [46][50]
- The paper posits that integrating Engram could lead to substantial improvements in model efficiency and capability, paving the way for more advanced applications in natural language processing [51][52]
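As a concrete illustration of the mechanism summarized above (hashed N-gram lookup plus context-aware gated fusion), here is a minimal PyTorch sketch. It is not DeepSeek's released Engram code; the module name, hash function, table size, and gating shape are all assumptions made for readability.

```python
import torch
import torch.nn as nn

class EngramSketch(nn.Module):
    """Minimal sketch of a hashed N-gram memory with context-aware gating.

    Names, sizes, and the hash are illustrative assumptions, not DeepSeek's
    released code. Each position hashes its trailing n-gram into a fixed
    embedding table (one O(1) lookup per token); a gate conditioned on the
    hidden state decides how much retrieved memory to mix back in.
    """

    def __init__(self, d_model: int, table_size: int = 1 << 16, n: int = 2):
        super().__init__()
        self.n = n
        self.table_size = table_size
        self.table = nn.Embedding(table_size, d_model)  # static memory bank
        self.gate = nn.Linear(d_model, 1)               # context-aware fusion gate

    def _hash_ngrams(self, ids: torch.Tensor) -> torch.Tensor:
        # Rolling multiplicative hash over the trailing n-gram at each position.
        h = torch.zeros_like(ids)
        for k in range(self.n):
            shifted = torch.roll(ids, shifts=k, dims=-1)
            shifted[..., :k] = 0  # positions before the sequence start
            h = (h * 1000003 + shifted) % self.table_size
        return h

    def forward(self, ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        mem = self.table(self._hash_ngrams(ids))    # [B, T, D] retrieved rows
        g = torch.sigmoid(self.gate(hidden))        # [B, T, 1] per-token gate
        return hidden + g * mem                     # gated residual injection

# Toy usage: inject n-gram memory into a batch of hidden states.
ids = torch.randint(0, 32000, (2, 16))
hidden = torch.randn(2, 16, 512)
print(EngramSketch(d_model=512)(ids, hidden).shape)  # torch.Size([2, 16, 512])
```

The gate is the piece that addresses collision noise: when a retrieved row is unhelpful in the current context, the sigmoid output can shrink toward zero and the layer degenerates to a plain residual pass-through.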
Just In: Liang Wenfeng Co-Signs an Open-Sourced "Memory" Module, Bringing DeepSeek V4 into Sharper Detail
36Ke· 2026-01-13 00:42
Core Insights
- DeepSeek has released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," in collaboration with Peking University, introducing a new module called Engram to enhance the efficiency of large language models [1][3]

Group 1: Research Overview
- Current approaches to sparsity in large language models rely primarily on Mixture of Experts (MoE) for conditional computation, but existing Transformer architectures lack a native knowledge retrieval mechanism [3][8]
- DeepSeek proposes conditional memory as a dimension complementary to MoE, introducing the Engram module to enable efficient knowledge retrieval with O(1) time complexity [8][9]

Group 2: Engram Module Implementation
- The Engram module has been implemented and released on GitHub, enabling community engagement and further development [4][5]
- Engram separates static memory storage from dynamic computation within the Transformer architecture, enhancing overall model performance [10][12]

Group 3: Performance Metrics
- Engram shows significant improvements across benchmarks, including a +3.4% gain in MMLU accuracy and a +4.0% gain in CMMLU accuracy, as well as notable gains on general reasoning tasks [9][28]
- The architecture also improves long-context retrieval: accuracy on Multi-Query NIAH rises from 84.2 to 97.0 [9]

Group 4: Experimental Results
- DeepSeek trained four models under identical training conditions: Dense-4B (4.1 billion parameters), MoE-27B (26.7 billion), Engram-27B (26.7 billion), and Engram-40B (39.5 billion) [25][27]
- The sparse architectures (MoE-27B, Engram-27B/40B) outperformed the dense model (Dense-4B) across all benchmarks, demonstrating superior scalability [28][30]

Group 5: Memory and Computation Decoupling
- Engram's deterministic retrieval mechanism decouples parameter storage from computational resources, enabling efficient scaling of memory capacity without increasing compute cost [15][17]
- The architecture supports a multi-level cache hierarchy, optimizing memory access and reducing latency [18]

Group 6: U-Shaped Scaling Law
- DeepSeek identified a U-shaped scaling law for the optimal allocation between MoE and Engram, suggesting that a balanced distribution of sparse parameters yields the best performance [19][24]
- The optimal allocation was found to be around 20%-25% of the sparse-parameter budget for Engram, confirming the structural complementarity between the two modules [23][24]
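Group 5's point about deterministic retrieval can be made concrete with a small sketch: because the n-gram hash is a pure function of the input token ids, the memory rows a batch needs are known before the forward pass starts, so the table can live in host memory and be gathered ahead of time. This is a hypothetical illustration under those assumptions (the table size, prefetch_memory helper, and pinned-memory detail are not from the paper), not DeepSeek's actual cache hierarchy.

```python
import torch

# Host-resident memory bank; size is an illustrative assumption.
table = torch.randn(1 << 16, 512)  # lives in CPU RAM, not on the accelerator

def prefetch_memory(ngram_hashes: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    """Gather the rows a batch will need ahead of the forward pass.

    The hashes are deterministic in the input ids, so this gather can be
    issued before any Transformer layer runs, overlapping the transfer
    with compute instead of stalling on it.
    """
    rows = table.index_select(0, ngram_hashes.reshape(-1))  # O(1) per token
    rows = rows.to(device, non_blocking=True)  # async if table is pinned
    return rows.reshape(*ngram_hashes.shape, -1)            # [B, T, D]

# Toy usage with random hash ids for a batch of 2 sequences of 16 tokens.
hashes = torch.randint(0, 1 << 16, (2, 16))
print(prefetch_memory(hashes).shape)  # torch.Size([2, 16, 512])
```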
DeepSeek Releases Another Liang Wenfeng-Signed Paper Late at Night / Dreame CEO Says He Will Build the Ecosystem of the First $100 Trillion Company / iPhone Officially Announces Gemini Integration
Sou Hu Cai Jing· 2026-01-13 00:34
Group 1
- Apple and Google announced a multi-year partnership under which the next-generation Apple foundation model will be built on Google's Gemini model and cloud technology, enhancing the AI capabilities of Siri and Apple Intelligence [3][4]
- Apple plans to pay approximately $1 billion annually for the use of Gemini technology, which is expected to significantly improve its AI functionality while maintaining user data privacy [3][5]
- The collaboration is seen as a strategic move for Apple to buy time in the competitive large-model landscape, with Google benefiting from deeper integration into billions of Apple devices [4][5]

Group 2
- Counterpoint Research reported 2% growth in global smartphone shipments in 2025, with Apple regaining the top position at 20% market share, driven by strong sales of the iPhone 17 series [33][34]
- The report highlighted that growth was primarily fueled by recovering demand in emerging markets and an improved economic environment [33]

Group 3
- The storage market has entered a "super bull market," with prices expected to rise 50% this year on increased demand from AI servers, significantly impacting the cost structure for smartphone and server manufacturers [85][86]
- Counterpoint's forecast indicates that storage prices surged 40%-50% in Q4 of last year and are projected to keep rising in Q1 and Q2 of this year [86][88]

Group 4
- Bill Gates expressed optimism about AI driving key innovations over the next decade, particularly in climate, healthcare, and education, while emphasizing the need for governance and regulation [94][95]
- Elon Musk suggested that advances in AI, energy, and robotics will lead to a future where saving for retirement may become irrelevant, envisioning a world of abundant resources [97][98]
DeepSeek's Financial Backstop: Liang Wenfeng's High-Flyer Quant 2025 Returns Revealed
Feng Huang Wang· 2026-01-12 10:23
Phoenix Tech News, Beijing time, January 12: According to Bloomberg, the quantitative hedge fund owned by DeepSeek founder Liang Wenfeng posted a return of more than 50% last year, further strengthening DeepSeek's potential funding reserves. Despite spending far less than its competitors, DeepSeek has shaken up the global technology landscape.

High-Flyer Quant (幻方量化) has become Liang Wenfeng's "cash cow." He still holds a majority stake in the asset manager, which stopped taking outside capital for its funds several years ago. High-Flyer's strong performance is expected to provide more funding support for DeepSeek, the AI company High-Flyer incubated in 2023, in which Liang likewise holds a majority stake.

"Liang Wenfeng now undoubtedly has more money to expand his team and buy more computing power and hardware for DeepSeek. When your first venture is going well, you are in a better position to incubate a second one," said Li Minghong, investment director at Shanghai Yitai Private Fund Management Co.

Li Minghong estimates that, assuming a 1% management fee and a 20% performance fee, the fund's standout performance last year may have generated more than $700 million in revenue for the firm. That figure is several orders of magnitude higher than the sub-$6 million budget DeepSeek reportedly spent developing the AI model that shook the market.

Liang Wenfeng has previously said that DeepSeek's research funding comes from High-Flyer Quant's R&D budget. (Author: Xiao Yu)

According to Shenzhen-based Paipai...