Sparse Large Models
Liang Wenfeng Credited as Author on New DeepSeek Paper
Di Yi Cai Jing Zi Xun· 2026-01-13 03:41
2026.01.13 | Word count: 1,017 | Reading time: about 2 minutes | Author: Liu Xiaojie, Yicai

Following a new paper released at the end of last year, DeepSeek published another paper on the evening of January 12, this time focused on a conditional memory module for large models. In its conclusion, DeepSeek argues that this module will become an indispensable core modeling primitive for the next generation of sparse large models.

The paper, completed in collaboration with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," and DeepSeek founder Liang Wenfeng again appears in the author list.

The paper's core observation is that large models face two fundamentally different kinds of tasks: compositional reasoning, which requires deep dynamic computation, and retrieval of static knowledge. The existing Transformer architecture lacks a native knowledge-lookup mechanism and can only simulate retrieval inefficiently through computation. When a model looks up knowledge that never changes, for example, it must waste compute re-deriving it, costing both time and resources.

To address this, the DeepSeek team introduces conditional memory as a complementary axis of sparsity, implemented through the Engram conditional-memory module ...
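The "look it up instead of re-deriving it" idea can be sketched in a few lines. This is a toy illustration of hashed N-gram lookup in the spirit the article describes, not DeepSeek's implementation; the table size, n-gram order, embedding width, and function names are assumptions for illustration only.

```python
import hashlib
from collections import defaultdict

# Toy sketch of conditional memory via hashed N-gram lookup.
# TABLE_SIZE, EMBED_DIM, and the n-gram order are illustrative
# assumptions, not values from the paper.
TABLE_SIZE = 1 << 20  # number of memory slots (assumed)
EMBED_DIM = 8         # width of each stored vector (assumed)

# lazily materialized embedding table: slot index -> vector
table = defaultdict(lambda: [0.0] * EMBED_DIM)

def ngram_slot(tokens, n=3):
    """Hash the trailing n tokens to a fixed slot: a deterministic,
    O(1) lookup with no attention or matrix multiplies involved."""
    key = "\x1f".join(tokens[-n:])
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SIZE

def lookup(tokens, n=3):
    return table[ngram_slot(tokens, n)]

# the same trailing n-gram always resolves to the same slot
s1 = ngram_slot(["the", "capital", "of", "france"])
s2 = ngram_slot(["visit", "capital", "of", "france"])
assert s1 == s2 and 0 <= s1 < TABLE_SIZE
```

The contrast with attention is the point: retrieving a static fact here costs one hash and one array index, independent of context length or model depth.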
New DeepSeek Paper: Next-Generation Models Achieve "Memory Separation". Is V4 Near?
Di Yi Cai Jing Zi Xun· 2026-01-13 03:32
Earlier reports claimed that DeepSeek's next-generation model V4 would be released around the Spring Festival; taken together with these recent studies, industry observers speculate this may be the research roadmap for DeepSeek V4. The paper, completed by DeepSeek in collaboration with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," with DeepSeek founder Liang Wenfeng among the credited authors. The team also found a U-shaped scaling law, showing that MoE experts and Engram mem ...
New Paper Signed by Liang Wenfeng: First Glimpse of the DeepSeek V4 Architecture? Taking Aim at a Fatal Flaw of the Transformer
36Kr · 2026-01-13 01:24
Just now, DeepSeek released a new paper, with Liang Wenfeng credited as an author. This time, working with Peking University, the team took direct aim at "memory," the Transformer's most critical unsolved problem.

MoE has become the mainstream architecture for large models, but it is still a Transformer at heart: lacking a native "knowledge lookup" mechanism, much of its retrieval ability has to be simulated with heavy computation.

In the 33-page paper, the team proposes a "conditional memory" axis of sparsity complementary to MoE and realizes it with a new Engram module: a modernized version of classic hashed N-gram embeddings that provides near-O(1) deterministic knowledge lookup.

Paper: https://github.com/deepseek-ai/Engram/blob/main/Engram_paper.pdf

Through "Sparsity Allocation" modeling, they unexpectedly found a U-shaped scaling law between MoE and Engram. This means the resource ratio between the two must be tuned to find the optimal trade-off between computation and static memory. Following this law, they scaled Engram to 27B parameters, where it beats an MoE baseline under strictly equal parameter counts and equal FLOPs.

Put plainly: MoE only answers "how to compute less," while Engram answers "don't compute what can be looked up." It hands lookups to O(1) memory, freeing attention from local trivia, and the result is not just more ...
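The "sparsity allocation" trade-off reads naturally as a one-dimensional sweep: give some fraction of the sparse-parameter budget to memory, the rest to MoE experts, and find the bottom of the U. The loss curve below is entirely synthetic (the real law comes from training runs); it is shaped only to echo the roughly 20%-25% memory share the coverage reports.

```python
# Toy sweep over the fraction of the sparse-parameter budget given to
# the memory module (the remainder goes to MoE experts). toy_loss is a
# made-up convex stand-in; only the "find the bottom of the U"
# selection logic is the point.

def toy_loss(memory_fraction):
    # synthetic U-shaped curve with its minimum near 0.22,
    # echoing the ~20%-25% optimum the summaries report
    return (memory_fraction - 0.22) ** 2 + 1.0

fractions = [i / 20 for i in range(21)]  # 0.0, 0.05, ..., 1.0
best = min(fractions, key=toy_loss)
assert abs(best - 0.20) < 1e-9  # grid point closest to the minimum
```

Either extreme of the sweep does worse: all-MoE (fraction 0) wastes compute re-deriving static facts, all-memory (fraction 1) starves dynamic reasoning, which is exactly what a U-shaped law expresses.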
Just Now: Liang Wenfeng-Credited Open-Source "Memory" Module Brings DeepSeek V4 Into Sharper Detail
36Kr · 2026-01-13 00:42
Core Insights
- DeepSeek has released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," in collaboration with Peking University, introducing a new module called Engram to enhance the efficiency of large language models [1][3].

Group 1: Research Overview
- The current approach to sparsity in large language models primarily relies on Mixture of Experts (MoE) for conditional computation, but existing Transformer architectures lack a native knowledge retrieval mechanism [3][8].
- DeepSeek proposes conditional memory as a complementary dimension to MoE, introducing the Engram module to facilitate efficient knowledge retrieval with O(1) time complexity [8][9].

Group 2: Engram Module Implementation
- The Engram module has been implemented and made available on GitHub, allowing for community engagement and further development [4][5].
- Engram separates static memory storage from dynamic computation processes within the Transformer architecture, enhancing overall model performance [10][12].

Group 3: Performance Metrics
- Engram has shown significant improvements in various benchmarks, including a +3.4% increase in MMLU accuracy and a +4.0% increase in CMMLU accuracy, as well as notable gains in general reasoning tasks [9][28].
- The architecture allows for better long-context retrieval capabilities, with accuracy in Multi-Query NIAH increasing from 84.2 to 97.0 [9].

Group 4: Experimental Results
- DeepSeek trained four models: Dense-4B (4.1 billion parameters), MoE-27B (26.7 billion), Engram-27B (26.7 billion), and Engram-40B (39.5 billion), all under the same training conditions [25][27].
- The sparse architectures (MoE-27B, Engram-27B/40B) outperformed the dense model (Dense-4B) across all benchmarks, demonstrating superior scalability [28][30].
Group 5: Memory and Computation Decoupling
- Engram's deterministic retrieval mechanism allows for the decoupling of parameter storage from computational resources, enabling efficient scaling without increasing computational costs [15][17].
- The architecture supports a multi-level cache hierarchy, optimizing memory access and reducing latency [18].

Group 6: U-Shaped Scaling Law
- DeepSeek identified a U-shaped scaling law for optimal allocation between MoE and Engram, suggesting that a balanced distribution of sparse parameters leads to improved performance [19][24].
- The optimal allocation ratio was found to be around 20%-25% of the sparse parameter budget for Engram, confirming the structural complementarity between the two modules [23][24].
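The "multi-level cache hierarchy" point in Group 5 is the familiar systems pattern of a small fast store in front of a large slow one. A minimal sketch, assuming an LRU policy and a host-memory backing table; the class name, sizes, and policy are illustrative assumptions, not details from the paper:

```python
from collections import OrderedDict

class TwoLevelMemory:
    """Toy two-level lookup: a small LRU cache (standing in for
    on-device memory) in front of a large slow table (standing in
    for host memory). Sizes and policy are assumptions."""

    def __init__(self, cache_size=2):
        self.slow = {}              # large backing store
        self.cache = OrderedDict()  # small fast cache, LRU order
        self.cache_size = cache_size

    def get(self, slot):
        if slot in self.cache:
            self.cache.move_to_end(slot)  # mark as recently used
            return self.cache[slot]
        vec = self.slow.setdefault(slot, [0.0])  # slow-path fetch
        self.cache[slot] = vec
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return vec

m = TwoLevelMemory()
m.get(1); m.get(2); m.get(1); m.get(3)
assert list(m.cache) == [1, 3]  # slot 2 was evicted, 1 was kept hot
```

Because Engram-style lookups are deterministic functions of the token context, hot slots are predictable, which is what makes a small cache in front of a large parameter store effective.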
Huawei Releases "Genius Youth" AI Challenge Topics, Gathering Global Talent to Explore the Technology Frontier
Sou Hu Cai Jing· 2025-06-17 19:01
Core Insights
- Huawei has launched the "Genius Challenge" to attract global talent in five key areas: intelligent connectivity & computing, fundamental research and innovation, intelligent terminals, cloud computing, and intelligent vehicles [3][4][5][6]

Group 1: Intelligent Connectivity & Computing
- The challenge includes research on autonomous intelligent wireless communication architecture and key technologies to meet future communication demands [3]
- It also focuses on the key technologies of the Ascend reinforcement learning system to enhance performance [3]
- Research on AI cluster all-optical switching networks aims to improve data transmission speed and efficiency for large-scale AI computing [3]

Group 2: Fundamental Research & Innovation
- Key technologies for large model security are being explored to address safety risks in current applications [4]
- Research on intelligent imaging/editing technology aims to achieve breakthroughs for enhanced user visual experiences [4]
- The design and optimization of training cluster architecture will improve the efficiency and quality of model training [4]

Group 3: Intelligent Terminals
- The challenge includes research on world models to help intelligent terminals better understand and simulate physical laws [5]
- It aims to enhance personalization and memory capabilities for intelligent terminals [5]
- Research on multimedia algorithms based on computer vision and multimodal understanding is also included [5]

Group 4: Cloud Computing
- Research on generalizable embodied intelligent operation technology seeks to enable cloud AI to control physical devices [6]
- The challenge includes exploring core technologies for the digital-native era [6]
- AI-based next-generation cloud network infrastructure research aims to build advanced cloud network systems [6]

Group 5: Intelligent Vehicles
- The challenge focuses on training and optimizing large models for intelligent vehicles [6]
- Research on advanced autonomous driving models is part of the initiative [6]
- The development of collaborative control technologies for vehicle chassis aims to enhance safety and comfort [6]

Group 6: R&D Investment and Talent Development
- Huawei's R&D expenditure for 2024 is projected to reach 179.7 billion yuan, accounting for approximately 20.8% of total revenue [7]
- Over the past decade, Huawei has invested more than 1.249 trillion yuan in R&D [7]
- The "Genius Challenge" reflects Huawei's commitment to fundamental research and innovation, emphasizing the importance of active participation in basic research [7]