DeepSeek's parent company took in 5 billion yuan last year, enough to fund 2,380 R1 training runs
36Ke · 2026-01-13 13:02
Core Insights
- DeepSeek has not engaged in new financing or significant commercialization activity despite the buzz around large-model players in the market [1]
- DeepSeek continues to produce high-quality research papers, indicating a stable output of academic contributions [2]
- The financial success of its parent company, Huanfang Quantitative, which earned roughly $700 million (about 5 billion yuan) last year, provides substantial funding for DeepSeek's research [6][8]

Group 1: Financial Performance
- Huanfang Quantitative's funds are posting impressive returns, with nearly all of them projected to yield over 55% in 2025 [3]
- The average return for quantitative funds in China last year was 30.5%, significantly outperforming global competitors [4]
- Huanfang Quantitative manages more than $7 billion in assets, contributing to its substantial earnings [7]

Group 2: Research and Development
- DeepSeek's research spending is comparatively low: the latest V3 training run cost about $5.576 million and R1 about $294,000, so the available funds could pay for a large number of such models [6]
- DeepSeek has kept its focus on AGI research without pressure for immediate financial returns, as it has accepted no external funding and is not tied to any major tech company [11][15]
- The company has consistently shipped significant research, including recent advances in OCR and V3.2, while open-sourcing components such as the memory module [9][10]

Group 3: Market Position and Strategy
- DeepSeek operates under a unique business model that lets it focus solely on AGI, free of monetization pressure [10][12]
- It benefits from a stable, committed research team with minimal turnover and even some returning members, indicating a strong internal culture [28][30]
- DeepSeek's research output has become valuable to investors, as its technical papers provide insights that move the stocks of related hardware companies [34][39]

Group 4: Competitive Landscape
- Compared with major players like OpenAI, DeepSeek's approach is marked by the absence of aggressive monetization, focusing instead on pure research [26][9]
- The market often underestimates the ability to cross-subsidize AI research with a mature business [19][20]
- DeepSeek's model combines the strengths of established companies and pure AI startups, positioning it uniquely in the competitive landscape [26]
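The headline arithmetic checks out: 5 billion yuan is roughly USD 700 million, and DeepSeek-R1's widely reported training cost was about USD 294,000, which funds on the order of 2,380 R1-scale training runs. A quick sanity check (the dollar conversion is a rounded assumption):

```python
# Sanity-checking the headline: parent-company earnings of ~5 billion yuan
# (roughly USD 700 million) against R1's widely reported ~USD 294,000
# training cost. The exchange-rate rounding here is an assumption.
earnings_usd = 700_000_000      # ~5 billion yuan at ~7.1 yuan/USD
r1_training_cost_usd = 294_000  # figure reported for DeepSeek-R1
runs = earnings_usd // r1_training_cost_usd
print(runs)  # 2380, matching the headline's "2,380 R1s"
```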
Liang Wenfeng co-signs DeepSeek's latest paper, proposing a new method to break through GPU memory limits
Xin Lang Cai Jing· 2026-01-13 12:33
Core Viewpoint
- DeepSeek, a Chinese AI startup, has developed a new model-training technique that works around GPU memory limitations, improving cost efficiency and performance in AI model training [1][3]

Group 1: Technology and Innovation
- DeepSeek and researchers from Peking University introduced a "conditional memory" technique called "Engram" to address the limits that high-bandwidth memory (HBM) places on scaling AI models [3][4]
- The Engram technology enables more efficient retrieval of foundational information by decoupling computation from storage, improving the model's performance on long contexts [4][6]
- In a model with 27 billion parameters, the new technique lifted performance on key industry benchmarks by several percentage points while preserving capacity for complex reasoning tasks [4][6]

Group 2: Competitive Landscape
- The HBM gap between China and the US is significant, with Chinese memory-chip manufacturers lagging their US and South Korean counterparts [4]
- DeepSeek's earlier model, DeepSeek-R1, was trained in two months for $5.5 million, far below the spending of US companies like OpenAI, while achieving comparable performance [6][7]
- Microsoft President Brad Smith highlighted that Chinese companies like DeepSeek are rapidly gaining ground in the global AI market, particularly in emerging markets, thanks to their low-cost open-source models [7]

Group 3: Future Developments
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to possess strong programming capabilities [7]
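The deterministic, compute-free retrieval described above can be sketched in a few lines. This is a hypothetical illustration, not DeepSeek's implementation: an N-gram of recent tokens is hashed to a slot in a fixed-size static embedding table, so lookup is an O(1) index rather than a neural computation, and the table itself can live outside GPU HBM.

```python
import hashlib

TABLE_SIZE = 1 << 20  # slots in the static embedding table (illustrative)
EMBED_DIM = 8         # kept tiny for the sketch

def ngram_slot(tokens, n=2, table_size=TABLE_SIZE):
    """Deterministically map the last n tokens to a table slot (O(1))."""
    key = "\x1f".join(tokens[-n:]).encode("utf-8")
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "big") % table_size

# Stand-in for a large trained embedding table kept in host memory.
memory_table = {}

def retrieve(tokens):
    """Fetch the memory vector for the current context's trailing N-gram."""
    return memory_table.get(ngram_slot(tokens), [0.0] * EMBED_DIM)

# The same trailing N-gram always resolves to the same slot, so no
# neural computation is needed to decide what to retrieve:
assert ngram_slot(["the", "language", "model"]) == ngram_slot(["a", "language", "model"])
```

The design choice to make the key purely a function of the input tokens is what decouples storage from compute: nothing about the lookup depends on activations produced during the forward pass.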
Liang Wenfeng co-signs new DeepSeek paper on "breaking through GPU memory limits"
Guan Cha Zhe Wang· 2026-01-13 12:28
Core Insights
- DeepSeek, a Chinese AI startup, has published a technical paper introducing a new model-training technique that works around GPU memory limitations, highlighting its focus on cost efficiency despite the remaining gap with leading US firms [1][2]
- The new technique, termed "Engram," addresses the bottleneck of limited high-bandwidth memory (HBM) in scaling AI models, a significant gap between China and the US in AI hardware [3][4]
- The paper has drawn attention from industry professionals in both China and the US, underscoring DeepSeek's role as a leader in AI innovation over the past year [1][2]

Technical Developments
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," presents the "conditional memory" technology aimed at making AI models more efficient at processing long contexts, a major challenge for AI chatbots [2][3]
- The Engram technique decouples computation from storage, enhancing the model's ability to retrieve foundational information efficiently [3][4]
- The technology was validated on a model with 27 billion parameters, showing performance improvements on key industry benchmarks [3]

Market Position and Competition
- DeepSeek's earlier model, DeepSeek-R1, was trained in two months for $5.5 million, far less than competitors like OpenAI spent, while achieving comparable performance [6][7]
- Microsoft President Brad Smith has noted that US AI companies are being overtaken by Chinese competitors like DeepSeek, particularly in emerging markets, because Chinese open-source models are low-cost and user-friendly [7]
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to possess strong programming capabilities [8]
DeepSeek open-sources Engram: how does it keep inference loss to just 3%?
Tai Mei Ti APP · 2026-01-13 08:44
Core Insights
- DeepSeek has launched a new module called Engram, focused on conditional memory for large language models, aiming to enhance efficiency and reduce computational costs [1][4]
- The company emphasizes innovation in architecture and methodology to break through compute-cost constraints, with Engram representing a restructuring of memory storage at the architectural level [4][6]

Group 1: Engram Module
- Engram is designed as a differentiable, trainable component that separates the memory load from the main computation, allowing efficient retrieval of frequently occurring knowledge [4][6]
- The module uses deterministic retrieval based on N-grams and hash mapping to fetch vectors from a large static embedding table, which is much faster than complex neural computation [4][6]

Group 2: Memory Functionality
- Engram incorporates a lightweight gating mechanism that decides whether a retrieved memory suits the current context, enhancing both memory retention and output coherence [6]
- The architecture divides the model's capabilities into three independent yet collaborative dimensions: model depth for logical reasoning, the computational sparsity of MoE, and the storage sparsity introduced by Engram [6][7]

Group 3: Performance and Future Developments
- Testing indicates that even with a memory bank of up to 100 billion parameters, the inference throughput loss remains below 3% [7]
- DeepSeek plans to release its latest V4 model around Chinese New Year; it is expected to significantly improve performance on complex tasks and coding, potentially surpassing competitors like Anthropic [7]
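The lightweight gating idea above can be sketched as follows. The sigmoid-of-a-dot-product gate is an assumed parameterization chosen for illustration, not the paper's exact design:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(hidden, retrieved, gate_weights):
    """Blend a retrieved memory vector into the hidden state via a scalar gate.

    The gate is computed from the current hidden state, so the model can
    suppress a retrieved memory that does not fit the context.
    """
    score = sum(h * w for h, w in zip(hidden, gate_weights))
    g = sigmoid(score)  # near 0: ignore the memory; near 1: mix it in fully
    return [h + g * r for h, r in zip(hidden, retrieved)]

hidden = [0.5, -0.2, 0.1]
retrieved = [1.0, 1.0, 1.0]
closed = gated_mix(hidden, retrieved, gate_weights=[-100.0, 0.0, 0.0])  # ~hidden
opened = gated_mix(hidden, retrieved, gate_weights=[+100.0, 0.0, 0.0])  # ~hidden + retrieved
```

Because the gate is a cheap scalar function of the hidden state, rejecting an irrelevant memory costs almost nothing, which is consistent with the small throughput loss reported above.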
DeepSeek open-sources a large-model memory module; Liang Wenfeng co-signs the new paper, previewing the next generation of sparse models
36Ke · 2026-01-13 07:14
Core Insights
- DeepSeek has introduced a new paradigm called "conditional memory" to give the Transformer the knowledge-retrieval capability it previously lacked [1][4][31]
- The Engram module yields significant efficiency gains, letting simpler tasks complete with fewer layers and freeing resources for more complex reasoning [4][21]

Group 1: Conditional Memory and Engram Module
- The paper presents conditional memory as an essential modeling primitive for the next generation of sparse models [1][4]
- Engram lets the model do in one or two layers what previously required six layers of attention, optimizing resource allocation [4][21]
- The Engram design uses a large vocabulary for static knowledge retrieval, enabling O(1) information lookup [4][6]

Group 2: Performance and Efficiency
- The optimal allocation of parameters between MoE (Mixture of Experts) and Engram memory was found to be around 20% to 25% for memory, reducing model validation loss [17][21]
- In experiments, the Engram-27B model outperformed the MoE-27B baseline on a range of knowledge-intensive tasks, with notable gains in general reasoning and in code and mathematics [21][22]
- The Engram-40B model further increased the memory parameters and showed sustained improvement, indicating that memory capacity had not yet saturated [25][31]

Group 3: Hardware Optimization
- The Engram module allows large parameter tables to be offloaded to CPU memory, minimizing inference latency and maintaining high throughput [29][30]
- The design principle of "hardware-aware efficiency" decouples storage from computation, making massive parameter tables usable without significant performance cost [31]
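The "hardware-aware" point above follows from the lookup being deterministic: Engram's table indices depend only on the input token IDs, not on activations, so every slot a sequence will touch can be computed up front and prefetched from CPU memory before the forward pass reaches the memory layer. A hypothetical sketch (names, sizes, and the hash scheme are illustrative):

```python
def ngram_slots_for_sequence(token_ids, n=2, table_size=1 << 20):
    """All table slots a sequence will touch -- computable before any forward pass."""
    return [hash(tuple(token_ids[i - n + 1:i + 1])) % table_size
            for i in range(n - 1, len(token_ids))]

class HostMemoryTable:
    """Stand-in for a huge parameter table living in CPU RAM, not GPU HBM."""
    def __init__(self, dim=4):
        self.dim = dim
        self.staged = {}  # "on-device" staging buffer

    def prefetch(self, slots):
        # In a real system this would be one bulk, overlapped host-to-device copy.
        for s in slots:
            self.staged.setdefault(s, [0.0] * self.dim)

    def lookup(self, slot):
        return self.staged[slot]  # served from the staged buffer, no host round-trip

table = HostMemoryTable()
seq = [11, 42, 42, 7]
needed = ngram_slots_for_sequence(seq)  # known from the inputs alone
table.prefetch(needed)                  # can overlap with earlier layers' compute
vectors = [table.lookup(s) for s in needed]
```

Since the prefetch can be issued while earlier layers are still computing, the host-memory round-trip is hidden, which is one plausible reading of how a 100-billion-parameter memory bank can cost under 3% of throughput.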
New Liang Wenfeng-signed DeepSeek paper takes aim at large models' "memory" shortcoming
Bei Ke Cai Jing· 2026-01-13 04:41
Core Insights
- The paper published by DeepSeek addresses the memory limitations of current large language models and introduces the concept of "conditional memory" [2]
- DeepSeek proposes a module named Engram, which splits language modeling into two branches: "static pattern retrieval" for quick access to deterministic knowledge, and "dynamic combinatorial reasoning" for complex logical operations [2]
- The paper argues that conditional memory is an essential modeling primitive for the next generation of sparse models, with speculation that DeepSeek's next model may be released before the Spring Festival [3]

Group 1
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," was co-authored by Peking University and DeepSeek [1]
- The introduction of "conditional memory" aims to enhance the memory capabilities of large language models [2]
- The Engram module improves language-modeling efficiency by separating tasks into static and dynamic components [2]

Group 2
- The paper emphasizes the importance of conditional memory for future sparse-model development [3]
- There is speculation that DeepSeek's next-generation model will be released around the Spring Festival, potentially repeating the success of previous launches [3]
A DeepSeek V4 roadmap emerging? Major Liang Wenfeng-signed paper published, focusing on a conditional-memory module for large models
Jin Rong Jie· 2026-01-13 04:38
Core Insights
- DeepSeek has released a significant research paper on a conditional-memory module for large models, indicating it will be a core modeling primitive in the next generation of sparse large models [1][4]
- The upcoming flagship model V4 is expected to be unveiled around the Spring Festival, and the new results may outline its core research roadmap [1][4]

Summary by Sections

Research Findings
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," was co-authored by DeepSeek and Peking University, with DeepSeek founder Liang Wenfeng among the authors [4]
- Its core insight is that large models handle two distinct kinds of work: deep dynamic computation for combinatorial reasoning, and static knowledge retrieval [4]
- Existing Transformer architectures lack a native knowledge-retrieval mechanism, so they simulate retrieval with inefficient computation [4]

Proposed Solutions
- To address this inefficiency, DeepSeek proposes conditional memory as a supplementary dimension of sparsity, implemented through a module called Engram [5]
- The team discovered a "U-shaped scaling law": a mixed allocation of sparse capacity between MoE experts and Engram memory significantly outperforms a pure-MoE baseline [5]
- The Engram module optimizes the balance between neural computation (MoE) and static memory, improving efficiency and performance across domains including general reasoning, coding, and mathematics [5]

Future Developments
- DeepSeek plans to release the next-generation flagship model V4 in February, with preliminary internal tests showing programming capabilities that surpass existing top models [6]
- V4 is anticipated to be an industry focal point, following the success of the V3 model released at the end of 2024, which outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in several benchmark tests [6]
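The "U-shaped scaling law" described above says that, at a fixed total sparse-parameter budget, validation loss dips when capacity is split between MoE experts and Engram memory rather than given entirely to either. The toy curve below is purely illustrative: the quadratic shape and the ~22.5% optimum are assumptions standing in for the paper's measured curve.

```python
def toy_val_loss(mem_fraction, best=0.225):
    """U-shaped stand-in: loss is minimized near a ~20-25% memory allocation."""
    return 2.0 + (mem_fraction - best) ** 2

grid = [i / 20 for i in range(21)]       # memory fractions 0.0, 0.05, ..., 1.0
best_frac = min(grid, key=toy_val_loss)  # lands in the 20-25% band
# Both extremes -- pure MoE (0% memory) and pure memory (100%) -- do worse:
assert toy_val_loss(0.0) > toy_val_loss(best_frac)
assert toy_val_loss(1.0) > toy_val_loss(best_frac)
```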
Signed by Liang Wenfeng: DeepSeek publishes a new paper
Di Yi Cai Jing Zi Xun· 2026-01-13 03:41
Core Insights
- DeepSeek has released a new paper focusing on the conditional-memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [2][5][7]

Group 1: Research and Development
- The new paper, co-authored with Peking University, is titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" [5]
- The research identifies two distinct tasks within large models: deep dynamic computation for combinatorial reasoning and static knowledge retrieval, highlighting inefficiencies in the current Transformer architecture [5][6]
- DeepSeek introduces conditional memory as a supplementary sparse dimension to optimize the balance between neural computation (MoE) and static memory (Engram) [6][7]

Group 2: Performance and Implications
- The team discovered a U-shaped scaling law indicating that mixed sparse-capacity allocation between MoE experts and Engram memory significantly outperforms pure-MoE baselines [6]
- The memory module not only aids knowledge retrieval but also brings significant gains in general reasoning, coding, and mathematical tasks [6][7]
- The paper in effect proposes a "division of labor" optimization for large models, letting specialized modules handle specific tasks more efficiently [6][7]

Group 3: Future Developments
- Industry speculation suggests the proposed conditional memory may be part of the technical architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [7]
- Initial tests indicate V4 may surpass other leading models in programming capabilities, with the previous V3 model having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [7]
New DeepSeek paper! Next-generation large models achieve "memory separation"; is V4 close?
Di Yi Cai Jing Zi Xun· 2026-01-13 03:32
Core Insights
- DeepSeek has released a new paper focusing on the conditional-memory module of large models, suggesting it will be a core modeling primitive in the next generation of sparse large models [1][4]

Group 1: Research Findings
- The new paper, co-authored with Peking University and titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," highlights the need for a native knowledge-retrieval mechanism in existing Transformer architectures [4]
- The research identifies two distinct tasks in large models: deep dynamic computation for combinatorial reasoning and static knowledge retrieval, noting that current models simulate retrieval inefficiently [4][5]
- DeepSeek introduces conditional memory as a supplementary dimension of sparsity, optimizing the trade-off between mixture of experts (MoE) and static memory (Engram) [4][6]

Group 2: Performance Improvements
- The team discovered a U-shaped scaling law showing that mixed sparse-capacity allocation between MoE experts and Engram memory significantly outperforms pure-MoE baselines [5]
- The memory module not only aids knowledge retrieval but also yields notable improvements in general reasoning, coding, and mathematical tasks [5][6]
- The paper in effect proposes a "division of labor" optimization for large models, assigning specific tasks to specialized modules to enhance efficiency and resource allocation [6]

Group 3: Future Developments
- Industry speculation suggests the proposed conditional memory may be integral to the architecture of DeepSeek's upcoming flagship model, DeepSeek V4, expected to be released around February [6]
- Initial tests indicate V4 may surpass other leading models in programming capabilities, with the previous model, V3, having already outperformed OpenAI's GPT-5 and Google's Gemini 3.0 Pro in various benchmarks [6]
DeepSeek-V4 is coming soon, with upgrades to both compute efficiency and performance! Low-fee Huaxia Cloud Computing ETF and Huaxia ChiNext AI ETF draw inflows
Xin Lang Cai Jing· 2026-01-13 03:32
Group 1
- The three major indices turned negative again, with the technology sector pulling back alongside the broader market. The Huaxia Communication ETF (515050) widened its decline to 2.39%, with mixed performance among its holdings [1]
- The low-fee Huaxia ChiNext AI ETF (159381) dropped 1.64%, with turnover quickly passing 300 million yuan, indicating active trading [1]
- The low-fee Huaxia Cloud Computing ETF (516630) fell 0.64% but has taken in more than 130 million yuan of net inflows over the past three trading days, suggesting accelerating capital allocation [1]

Group 2
- Ping An Securities noted that global AI computing-platform capabilities are continuously improving, with major chipmakers like NVIDIA and AMD showcasing advances in AI computing chips at CES 2026 [2]
- NVIDIA announced full production of the NVIDIA Rubin platform, with Rubin-based products expected to be available through partners in the second half of 2026 [2]
- AMD unveiled the "Helios" platform and previewed the next-generation MI500-series GPU, indicating a significant build-out of global AI computing infrastructure [2]

Group 3
- The Huaxia Cloud Computing ETF (516630) tracks the Cloud Computing Index (930851) with the lowest fee rate, focusing on domestic AI software and hardware computing, with computer software, cloud services, and computer equipment carrying a combined weight of 83.7% [3]
- The Huaxia ChiNext AI ETF (159381) offers exposure to AI-focused ChiNext companies, with about half its weight in AI hardware computing and half in AI software applications, showing high elasticity and representativeness [3]
- The Huaxia Communication ETF (515050) tracks the CSI 5G Communication Theme Index, with deep exposure to the NVIDIA, Apple, and Huawei supply chains; its top five holdings are Zhongji Xuchuang, Xinyi Sheng, Lixun Precision, Industrial Fulian, and Zhaoyi Innovation [3]