DeepSeek: Conditional Memory via Scalable Lookup, a New Dimension of Sparsity for Large Language Models (2026 Technical Report)
Core Insights
- The article discusses a new architecture called "Engram," proposed by a research team from Peking University and DeepSeek-AI, which aims to enhance large language models (LLMs) by introducing a complementary dimension of "conditional memory" alongside existing mixture-of-experts (MoE) models [2][3]

Group 1: Model Architecture and Performance
- The report's core argument is that language modeling involves two distinct sub-tasks, combinatorial reasoning and knowledge retrieval, with the latter often being static and local [3]
- The Engram architecture modernizes the N-gram concept into a "conditional memory" mechanism, allowing direct retrieval of static embeddings with O(1) time complexity and freeing computational resources for higher-order reasoning [3][4]
- A significant finding is the "sparsity distribution law": allocating roughly 20% to 25% of the sparse parameter budget to the Engram module significantly reduces validation loss at unchanged computational cost [4]

Group 2: Efficiency and Scalability
- The Engram model (Engram-27B) outperformed a baseline MoE model (MoE-27B) across a range of knowledge-intensive and logic-intensive tasks, demonstrating its effectiveness in enhancing model intelligence [4][5]
- Engram's deterministic retrieval mechanism allows large embedding tables to be offloaded to host memory, sharply reducing GPU-memory requirements and enabling deployment of ultra-large models on limited hardware [6][7]
- Because knowledge access in natural language follows a Zipfian distribution, a multi-level cache structure can keep hot entries in fast storage, which benefits cloud service providers and enterprises aiming to reduce deployment costs [7]
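The O(1) retrieval idea can be illustrated with a toy sketch: hash the trailing N-gram of the context into a fixed-size embedding table, so lookup cost is independent of table size. The class and method names (`EngramTable`, `lookup`) and the hashing scheme are illustrative assumptions, not details from the paper.

```python
import numpy as np

class EngramTable:
    """Toy conditional-memory table: trailing n-gram -> static embedding."""

    def __init__(self, num_slots: int, dim: int, n: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.table = rng.standard_normal((num_slots, dim)).astype(np.float32)
        self.num_slots = num_slots
        self.n = n  # n-gram order that conditions the lookup

    def lookup(self, token_ids: list[int]) -> np.ndarray:
        # Hash the trailing n-gram to a slot index: O(1) regardless of table size.
        key = tuple(token_ids[-self.n:])
        slot = hash(key) % self.num_slots
        return self.table[slot]

table = EngramTable(num_slots=1 << 20, dim=64)
emb = table.lookup([17, 923, 4, 88])
print(emb.shape)  # (64,)
```

Because the mapping is a deterministic function of the last few tokens, the same n-gram always resolves to the same embedding, which is what makes ahead-of-time prefetching and host-memory offloading feasible.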
Group 3: Long Context Processing
- Engram shows structural advantages on long contexts: by directly resolving many local dependencies, it frees the Transformer to focus on capturing global long-range dependencies [8]
- In long-text benchmarks, Engram-27B improved accuracy on multi-query retrieval tasks from 84.2% to 97.0%, indicating higher efficiency and better-allocated attention [8]

Group 4: Future Implications
- The research signals a shift in large-model design philosophy, from merely increasing computational depth to a dual-sparsity approach spanning both computation and memory [9]
- Conditional memory is expected to become a standard component of the next generation of sparse models, offering high-performance, low-cost solutions for trillion-parameter models [9]
DeepSeek Paper Discloses a New Model Mechanism; Demand for SSDs and Other Storage Expected to Take Another Step Up; a Leading Vendor Also Posts Blowout Results
Xuan Gu Bao · 2026-01-13 23:24
Group 1
- DeepSeek published a new paper proposing "conditional memory" as a new dimension of sparsity, optimizing large language models through the Engram module [1]
- The existing Transformer architecture lacks a native knowledge-retrieval mechanism and must simulate retrieval behavior inefficiently [1]
- Conditional memory complements the MoE (Mixture of Experts) approach and, at equal parameter counts and compute, significantly improves performance on knowledge retrieval, reasoning, coding, and mathematical tasks [1]

Group 2
- The Engram module is a large, scalable embedding table that acts as external memory for the Transformer, enabling efficient retrieval of nearby content [2]
- Engram caches frequently accessed embeddings in faster storage media while keeping rarely accessed data in larger, slower storage, maintaining low access latency [2]
- NAND industry capital expenditure is expected to stay limited over the next two years, with leading manufacturers likely to prioritize HBM over NAND, while AI applications are anticipated to drive SSD demand [2]

Group 3
- Baiwei Storage forecasts a net profit of 850 million to 1 billion yuan for the year, representing year-on-year growth of 427.19% to 520.22% [2]
- Jiangbolong has launched several high-speed enterprise-grade eSSD products covering mainstream capacities from 480GB to 7.68TB [3]
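The tiered-caching behavior described above can be sketched as a small two-level store: a fast LRU tier standing in for GPU/DRAM absorbs the Zipfian head of accesses, while the bulk of the table stays in a slow tier standing in for host memory or SSD. The class name and tier layout are assumptions for illustration, not the paper's implementation.

```python
from collections import OrderedDict

class TieredEmbeddingStore:
    """Two-tier embedding store: small fast LRU cache over a large slow store."""

    def __init__(self, slow_store: dict, fast_capacity: int):
        self.slow = slow_store          # large, slow tier (e.g. host memory / SSD)
        self.fast = OrderedDict()       # small LRU tier (e.g. GPU memory)
        self.capacity = fast_capacity
        self.hits = self.misses = 0

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh LRU position
            self.hits += 1
            return self.fast[key]
        self.misses += 1
        value = self.slow[key]          # fall back to the slow tier
        self.fast[key] = value
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)  # evict least recently used entry
        return value

store = TieredEmbeddingStore({i: [float(i)] for i in range(10_000)}, fast_capacity=100)
# Zipf-like workload: a handful of hot keys dominate the access stream.
for _ in range(1000):
    store.get(7)
print(store.hits, store.misses)  # 999 1
```

With a skewed access distribution, nearly all lookups land in the fast tier, which is why average latency stays low even when the full table lives on slower storage.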
Breaking the GPU Memory Wall: Saining Xie's Team Proposes CLM, with a Single RTX 4090 "Leveraging" 100 Million Gaussian Points
机器之心 · 2025-11-11 08:40
Core Insights
- 3D Gaussian Splatting (3DGS) is an emerging method for novel view synthesis that uses a set of posed images to iteratively train a scene representation composed of numerous anisotropic 3D Gaussians, capturing the scene's appearance and geometry [2][4]
- The CLM system proposed by the team lets 3DGS render large scenes on a single consumer-grade GPU, such as the RTX 4090, by working around GPU memory limitations [6][8]

Group 1: 3DGS Overview
- 3DGS has shown revolutionary application potential in fields such as 3D modeling, digital twins, visual effects (VFX), VR/AR, and robot vision reconstruction (SLAM) [5]
- Rendering quality depends on the fidelity of the trained scene representation; larger and more complex scenes require more Gaussians, driving up memory usage [5]

Group 2: CLM System Design
- CLM builds on the insight that 3DGS computation is inherently sparse: each training iteration accesses only a small subset of the Gaussians [8][20]
- The system employs a novel offloading strategy that minimizes performance overhead and scales to large scenes by dynamically loading only the needed Gaussians into GPU memory while keeping the rest in CPU memory [8][11]

Group 3: Performance and Efficiency
- The CLM implementation can render a large scene requiring 102 million Gaussians on a single RTX 4090 while achieving top-tier reconstruction quality [8]
- Each view typically accesses only 0.39% of the Gaussian points, with a maximum of 1.06% for any single view, highlighting the sparsity of the access pattern [23]

Group 4: Optimization Techniques
- The team exploited several unique characteristics of 3DGS to significantly reduce the communication overhead of offloading, including pre-computing each view's accessed Gaussian set and leveraging spatial locality to optimize CPU-GPU data transfer [12][17]
- Microbatch scheduling overlaps the access patterns of consecutive batches, improving cache hit rates and reducing redundant data transfers [24][25]

Group 5: Results and Impact
- CLM increases the trainable model capacity of 3DGS by up to 6.1x over a pure GPU training baseline, enabling larger models that improve scene reconstruction accuracy while lowering communication and offloading overhead [27]
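The per-view sparsity that makes this offloading viable can be sketched in a few lines: keep all Gaussian centers host-resident, compute the subset a view can see, and copy only that slice to the device. The box-shaped "frustum" test below is a deliberately crude stand-in for CLM's real visibility computation; all names here are illustrative.

```python
import numpy as np

def visible_indices(positions: np.ndarray, lo: np.ndarray, hi: np.ndarray) -> np.ndarray:
    """Return indices of Gaussians whose centers fall inside the box [lo, hi]."""
    mask = np.all((positions >= lo) & (positions <= hi), axis=1)
    return np.flatnonzero(mask)

rng = np.random.default_rng(0)
# One million Gaussian centers, kept in host (CPU) memory.
positions = rng.uniform(-100, 100, size=(1_000_000, 3)).astype(np.float32)

# A small axis-aligned box standing in for one view's frustum.
idx = visible_indices(positions, np.array([0, 0, 0]), np.array([15, 15, 15]))
device_batch = positions[idx]  # only this slice would be copied to the GPU
print(f"{len(idx) / len(positions):.2%} of Gaussians accessed")
```

Because each view touches well under 1% of the points, the per-iteration CPU-to-GPU transfer is a tiny fraction of the full model, mirroring the 0.39% average access rate reported above.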