DeepSeek
New DeepSeek Paper Signed by Liang Wenfeng "Breaks Through GPU Memory Limits"
Guan Cha Zhe Wang· 2026-01-13 12:28
Core Insights
- DeepSeek, a Chinese AI startup, has published a technical paper introducing a new model training technique that bypasses GPU memory limitations, underscoring its focus on cost efficiency despite the gaps that remain with leading US firms [1][2]
- The new technique, termed "Engram," addresses the bottleneck of limited high-bandwidth memory (HBM) in scaling AI models, an area of AI hardware where China still trails the US significantly [3][4]
- The paper has drawn attention from industry professionals in both China and the US, reflecting DeepSeek's role as a leader in AI innovation over the past year [1][2]

Technical Developments
- The paper, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," presents "conditional memory" technology aimed at improving the efficiency of AI models when processing long contexts, a major challenge for AI chatbots [2][3]
- The Engram technique decouples computation from storage, allowing the model to retrieve foundational information more efficiently [3][4]
- The technology was validated on a 27-billion-parameter model, which showed performance improvements on key industry benchmarks [3]

Market Position and Competition
- DeepSeek's previous model, DeepSeek-R1, was trained in two months at a cost of $5.5 million, far less than competitors such as OpenAI spent, while achieving comparable performance [6][7]
- Microsoft President Brad Smith has noted that US AI companies are being overtaken by Chinese competitors such as DeepSeek, particularly in emerging markets, owing to the low cost and ease of use of Chinese open-source models [7]
- Anticipation is building for DeepSeek's upcoming V4 model, expected to launch in mid-February and said to have strong programming capabilities [8]
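The mechanism these summaries describe, fetching static knowledge by a deterministic table lookup rather than by neural computation, can be made concrete with a short sketch. The following Python/PyTorch fragment is illustrative only: the rolling hash, the table size, and the N-gram width are assumptions made for the example, not the paper's actual configuration.

```python
import torch

# Illustrative sizes only; the real memory table reportedly scales far
# larger (assumed values, not taken from the paper).
NUM_SLOTS = 1_000_000   # slots in the static embedding table
EMBED_DIM = 512         # dimension of each stored memory vector
NGRAM = 2               # look up bigrams of token ids

memory_table = torch.nn.Embedding(NUM_SLOTS, EMBED_DIM)

def ngram_slots(token_ids: torch.Tensor) -> torch.Tensor:
    """Deterministically hash each N-gram window of token ids to a slot.

    token_ids: (batch, seq_len) integer tensor.
    Returns:   (batch, seq_len - NGRAM + 1) slot indices.
    """
    # Simple polynomial rolling hash over each window (assumed; the
    # paper's hashing scheme is not specified in these summaries).
    windows = token_ids.unfold(dimension=1, size=NGRAM, step=1)
    h = torch.zeros(windows.shape[:2], dtype=torch.long)
    for i in range(NGRAM):
        h = h * 1_000_003 + windows[..., i]
    return h % NUM_SLOTS

def conditional_memory_lookup(token_ids: torch.Tensor) -> torch.Tensor:
    # O(1) per position: one hash plus one embedding read; no attention
    # or matrix multiply is involved in fetching the stored knowledge.
    return memory_table(ngram_slots(token_ids))
```

Because the lookup is a pure table read, the table itself can live outside scarce GPU HBM, which is the property the headline's "breaking through GPU memory limits" refers to.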
The Eve of DeepSeek V4? New Paper Signed by Liang Wenfeng Released
Wallstreetcn· 2026-01-13 11:01
Core Viewpoint
- The article discusses a groundbreaking paper by DeepSeek and Peking University introducing a new module called Engram, which separates memory from computation in AI models and delivers a significant increase in reasoning capability [3][12].

Group 1: Introduction of the Engram Module
- DeepSeek's Engram module represents a supply-side reform of AI model architecture: static knowledge is stored separately from computational tasks, enhancing the AI's reasoning abilities [3][14].
- The module is inspired by the classic N-gram concept from natural language processing, modernized to allow efficient retrieval of static knowledge with O(1) time complexity [15][16].

Group 2: Technical Innovations
- Engram uses a large, scalable embedding table to store static knowledge, allowing direct retrieval without complex computation; in traditional Transformer models, by contrast, knowledge is embedded in the weights [18].
- Three technical barriers were addressed (the multi-head hashing idea is sketched in code after this summary):
  - A. Vocabulary compression reduced the effective vocabulary size by 23% by normalizing semantically similar terms [19].
  - B. Multi-head hashing resolves the hash collisions that arise when many N-grams map to a limited number of memory slots, improving robustness [20].
  - C. Context-aware gating acts as a referee, filtering out static knowledge that is irrelevant to the current context [21][22].

Group 3: Resource Allocation and Model Performance
- A large-scale ablation study revealed a U-shaped scaling law for resource allocation, with loss minimized when roughly 75%-80% of parameters go to Engram and 20%-25% to MoE [30][31].
- Engram not only improved knowledge tasks but also unexpectedly boosted performance in logic, coding, and mathematics, with significant score increases across various benchmarks [39][40].

Group 4: Engineering Breakthroughs
- Engram's architecture separates memory from computation, enabling large models to offload memory to cheaper, scalable CPU resources and reducing reliance on expensive GPU memory [46][49].
- This separation allows memory data to be prefetched, maintaining high throughput even at large parameter counts, a significant advantage for future AI model development [51][52].

Group 5: Future Implications
- The upcoming DeepSeek V4 model is expected to integrate Engram technology, balancing computation and memory to raise both knowledge capacity and reasoning capability while reducing inference costs [61][64].
- The paper signals an industry shift toward architectural innovation, away from merely piling on computational power and parameters, redefining the competitive standards of AI development [65].
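To illustrate the multi-head hashing of point B above: several independent hash functions index the same N-gram into separate tables, so a collision under one hash is unlikely to recur under the others. Everything below, the table count, sizes, hash construction, and the averaging rule, is an assumption made for the sketch; the paper's actual scheme may differ.

```python
import torch

NUM_SLOTS = 500_000
EMBED_DIM = 256
# Distinct odd multipliers give each head an independent hash function
# (an assumed construction; the paper's hash family is not described here).
HEAD_MULTIPLIERS = (1_000_003, 2_000_029, 3_000_017, 4_000_037)

tables = torch.nn.ModuleList(
    torch.nn.Embedding(NUM_SLOTS, EMBED_DIM) for _ in HEAD_MULTIPLIERS
)

def multi_head_lookup(ngram_keys: torch.Tensor) -> torch.Tensor:
    """ngram_keys: (batch, positions) integer N-gram fingerprints.

    Each head hashes the same key differently, so two N-grams that
    collide in one head rarely collide in all of them; averaging the
    per-head vectors then dilutes the effect of any single collision.
    """
    retrieved = [
        table((ngram_keys * mult) % NUM_SLOTS)
        for table, mult in zip(tables, HEAD_MULTIPLIERS)
    ]
    return torch.stack(retrieved).mean(dim=0)
```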
Microsoft President Hypes It Up: In the Race for AI Users Outside the West, US Companies Are Being Overtaken by Chinese Rivals
Guan Cha Zhe Wang· 2026-01-13 10:51
Core Insights
- Microsoft warns that U.S. AI companies are being surpassed by Chinese competitors, particularly in emerging markets, thanks to the advantages of low-cost open-source models [1][4]
- Microsoft's research indicates that DeepSeek's R1 model has significantly accelerated AI adoption in Global South countries, shifting market share toward China [1][5]

Group 1: Competitive Landscape
- Microsoft President Brad Smith highlights that China now fields multiple competitive open-source models, in contrast with U.S. companies that keep strict control over their most advanced technologies [1][4]
- DeepSeek has captured significant market share in Africa, with 18% in Ethiopia and 17% in Zimbabwe, showcasing its rapid growth in these regions [3][4]
- In countries where U.S. tech products are restricted, DeepSeek holds even larger shares, such as 56% in Belarus and 49% in Cuba [4]

Group 2: Market Dynamics
- AI adoption remains concentrated in developed countries: 25% of the population in Global North countries uses AI, versus 14% in the Global South [4]
- Smith expresses concern over the widening AI gap, warning that it could exacerbate economic disparities between the Global North and South [4]
- He calls for greater investment from international development banks and financial institutions to build data centers and subsidize electricity costs in Africa [3][4]

Group 3: Industry Reactions
- OpenAI CEO Sam Altman acknowledges the competitive threat posed by DeepSeek and admits that OpenAI's closed strategy may have flaws [6]
- Altman praises DeepSeek's latest model as "very good," signaling a potential shift in OpenAI's stance on open-source AI software [6]
Microsoft Gets Anxious: In Markets Outside the West, China Leads
Guan Cha Zhe Wang· 2026-01-13 10:30
Core Insights
- Microsoft warns that U.S. AI companies are being surpassed by Chinese competitors in the race for users outside the West, with China's low-cost open-source models a significant advantage [1][2]
- Microsoft's research indicates that DeepSeek's R1 model has accelerated AI adoption in emerging markets, particularly the Global South, allowing China to overtake the U.S. in global market share for open-source AI models [1][2]
- The competition is intensifying, with DeepSeek achieving significant market shares in countries like Ethiopia (18%) and Zimbabwe (17%) [3]

Group 1
- Microsoft President Brad Smith emphasizes the need for international investment in African data centers to compete with heavily subsidized Chinese firms [3][4]
- DeepSeek has gained substantial market share in countries under U.S. sanctions, such as Belarus (56%), Cuba (49%), and Russia (43%) [4]
- AI use is currently concentrated in developed countries, with only 14% of the population in Global South countries using AI compared with nearly a quarter in the Global North [4][5]

Group 2
- Smith warns that neglecting regions like Africa could allow AI systems to emerge that do not align with democratic values [5]
- DeepSeek's R1 model was trained at a cost of $5.5 million, far below the amounts spent by U.S. companies like OpenAI [5][6]
- OpenAI CEO Sam Altman acknowledges potential flaws in the company's closed strategy and hints at a possible shift toward more open models in response to competition from DeepSeek [6]
Manus and Its "80 Million Employees"
36Kr· 2026-01-13 10:14
Core Insights
- Manus represents a significant paradigm shift in AI applications, moving from content generation to autonomous task completion, marking a "DeepSeek moment" for the industry [5][6].
- The Manus model is characterized by three core claims: it is the first company with over 80 million "employees," it functions as an "artificial intelligence operating system," and it points to a potential leap in human civilization through higher productivity [7][8].

The Manus Model and Its Impact
- Manus has created over 80 million virtual computing instances, the backbone of its operating model, allowing AI to handle complex tasks autonomously [10][11].
- The model is compared to the mobile-internet era: then, cloud computing underpinned vast fleets of virtual machines operated by humans, whereas Manus has AI operate those virtual machines independently [11][12].
- The system shifts the core operator from humans to AI, which the article frames as a potential "0.5-level" leap in human civilization as AI takes over digital-economy jobs [13][14].

AI Applications' "DeepSeek Moment"
- The release of Anthropic's multi-agent system demonstrated a 90.2% performance improvement on complex tasks over single-agent systems, highlighting the importance of collaboration among AIs [15][19].
- The Manus architecture emphasizes division of labor among AI agents, improving efficiency and enabling them to tackle complex problems collaboratively [17][21].
- Manus reached annual recurring revenue (ARR) of over $100 million within a year of launch, indicating strong commercial viability and interest in its offerings [21][22].

Technological Foundations of Multi-Agent Systems
- Manus's multi-agent system relies on several core technologies, including virtual machines for secure execution environments and resource pooling for efficient utilization [25][26].
- The virtual-machine architecture isolates task execution, addressing compatibility issues and ensuring data security [28][29].
- Intelligent orchestration lets Manus dynamically allocate models according to task complexity, significantly reducing token consumption (a hedged sketch of this routing idea follows this summary) [31][32].

Competitive Landscape and Industry Dynamics
- Major tech companies are rapidly adopting multi-agent systems, recognizing their potential to extend existing large models and redefine human-computer interaction [36][37].
- In the domestic market, companies like Alibaba, Tencent, and Baidu are exploring multi-agent systems, signaling a competitive environment for AI development [38][39].
- New players such as Kimi, which has secured significant funding for multi-agent system development, point to growing interest and investment in this area [40].

Evolution of Human Roles in the AI Era
- The relationship between humans and AI is evolving from "operator-tool" to "manager-team": humans focus on task design and oversight while AI handles execution [42][43].
- Automating routine creative tasks with multi-agent systems may reduce demand for lower-level creative jobs while amplifying the value of higher-level creative work [43][44].
- A structural transformation of organizations is anticipated, with multi-agent systems enabling flatter hierarchies and redefining the ownership of production resources [44][45].

Challenges and Considerations
- Data sovereignty and system security become critical concerns as multi-agent systems evolve, necessitating new frameworks for data ownership and quality assurance [46][47].
- Ensuring safety in multi-agent interactions is complex, requiring robust monitoring and validation mechanisms [49][50].
- The trade-off between security and efficiency remains fundamental: pursuing absolute security can compromise system performance [50][51].
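The resource-orchestration point above, routing work to cheaper models when task complexity allows, can be sketched as follows. Manus has not published its routing logic, so the model tiers, the scoring heuristic, and the threshold here are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # illustrative pricing, not real quotes

LIGHT = ModelTier("light-model", 0.0005)
HEAVY = ModelTier("heavy-model", 0.0150)

def estimate_complexity(task: str) -> float:
    """Toy heuristic: longer tasks and planning keywords score higher."""
    keywords = ("plan", "analyze", "multi-step", "debug")
    score = min(len(task) / 500, 1.0)
    score += 0.5 * sum(kw in task.lower() for kw in keywords)
    return score

def route(task: str) -> ModelTier:
    # The 0.8 threshold is arbitrary; a production orchestrator would
    # tune it against measured output quality and token spend.
    return HEAVY if estimate_complexity(task) > 0.8 else LIGHT

if __name__ == "__main__":
    for task in ("Summarize this paragraph.",
                 "Plan a multi-step data pipeline and debug its failures."):
        print(route(task).name, "<-", task)
```

The point of the design is that simple subtasks never touch the expensive model, which is one plausible way an orchestrator could cut aggregate token consumption.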
Share Reduction? Who Cares! With Musk and AI-Application Buffs Stacked, Liou Co., Ltd. Goes on a Rampage with Five Limit-Ups in Eight Days
Sou Hu Cai Jing· 2026-01-13 09:51
Group 1
- The core of the article is the impact of Elon Musk's announcement that the X platform's recommendation algorithm will be open-sourced, which has created a new investment theme in the A-share market and particularly benefited Liou Co., Ltd. [1]
- Liou Co., Ltd. has transformed from a traditional water-pump manufacturer into a digital-marketing player, now focused on AI-driven solutions and AI-based content creation [1]
- The stock's performance has been remarkable, logging five daily limit-ups within eight trading days and attracting substantial buying interest, with peak orders of 5.6 billion yuan [1]

Group 2
- The article stresses the dual tailwind of Liou Co., Ltd. sitting at the intersection of AI and GEO, which has fueled investor enthusiasm and the stock's surge [1]
- The company's investment record is notable, with stakes in various tech firms including DeepSeek and SpaceX, reflecting its strategic investment approach [1]
- Despite the current excitement, the article cautions about roughly 1 billion yuan of potential selling pressure from the announced share reduction; the company's trajectory will hinge on execution of its AI-application and GEO businesses [2]
DeepSeek Open-Sources Engram: How Does It Keep the Inference Loss to Just 3%?
TMTPost APP· 2026-01-13 08:44
Core Insights
- DeepSeek has released a new module called Engram, which implements conditional memory for large language models with the aim of raising efficiency and cutting computational costs [1][4]
- The company emphasizes innovation in architecture and methodology as the way past computational cost constraints; Engram restructures how memory is stored at the architectural level [4][6]

Group 1: Engram Module
- Engram is designed as a differentiable, trainable component that separates the memory load from the main computation, enabling efficient retrieval of frequently occurring knowledge [4][6]
- The module uses deterministic retrieval based on N-grams and hash mapping to fetch vectors from a large static embedding table, which is far faster than complex neural computation [4][6]

Group 2: Memory Functionality
- Engram incorporates a lightweight gating mechanism that judges whether retrieved memory suits the current context, improving both memory retention and output coherence (a hedged sketch of such a gate follows this summary) [6]
- The architecture divides the model's capabilities into three independent yet collaborative dimensions: model depth for logical reasoning, computational sparsity represented by MoE, and storage sparsity introduced by Engram [6][7]

Group 3: Performance and Future Developments
- Testing indicates that even with a memory bank of up to 100 billion parameters, the loss in inference throughput stays below 3% [7]
- DeepSeek plans to release its V4 model around the Chinese New Year; it is expected to handle complex tasks and coding markedly better, potentially surpassing competitors like Anthropic [7]
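A minimal version of the lightweight gate described in Group 2 might look like the following: a learned sigmoid gate, conditioned jointly on the hidden state and the retrieved memory, scales how much of the retrieval is blended in. The dimensions and the gate's exact form are assumptions; the paper's formulation is not given in this summary.

```python
import torch

class ContextGate(torch.nn.Module):
    """Blend a retrieved memory vector into the hidden stream, gated by
    how well it fits the current context (illustrative sketch only)."""

    def __init__(self, hidden_dim: int, memory_dim: int):
        super().__init__()
        self.proj = torch.nn.Linear(memory_dim, hidden_dim)    # memory -> hidden space
        self.gate = torch.nn.Linear(hidden_dim + memory_dim, 1)

    def forward(self, hidden: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
        # g near 0 discards the retrieval; g near 1 admits it fully.
        g = torch.sigmoid(self.gate(torch.cat([hidden, memory], dim=-1)))
        return hidden + g * self.proj(memory)

# Usage: blend (batch, seq, 256) retrievals into a (batch, seq, 1024) stream.
gate = ContextGate(hidden_dim=1024, memory_dim=256)
out = gate(torch.randn(2, 16, 1024), torch.randn(2, 16, 256))  # same shape as hidden
```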
DeepSeek's Parent Company Took In 5 Billion Yuan Last Year, Enough to Fund 2,380 R1s
QbitAI· 2026-01-13 07:21
Core Viewpoint
- DeepSeek remains focused on AGI research without significant commercialization efforts, supported by substantial funding from its parent company, Huanfang Quantitative (High-Flyer) [2][35][41].

Group 1: Financial Performance of Huanfang Quantitative
- Huanfang Quantitative earned approximately 5 billion RMB last year, indicating strong financial health [4][10].
- The average return of Huanfang Quantitative's funds in 2025 is projected at over 55%, far above the 30.5% average for quantitative funds in China [6][8].
- Huanfang Quantitative manages over 70 billion RMB in assets, underpinning its profitability [9].

Group 2: DeepSeek's Research and Development
- DeepSeek has maintained a steady output of high-level research papers, with the latest R1 paper showing a stable list of contributors [3][52].
- Development costs for DeepSeek's V3 and R1 models were comparatively low, at $5.576 million and $294,000 respectively, leaving ample research funding from Huanfang Quantitative [15][16].
- With Huanfang Quantitative's income, DeepSeek can afford to develop numerous models without financial constraints; the arithmetic behind the headline's "2,380 R1s" is worked through after this summary [16][59].

Group 3: Competitive Landscape and Positioning
- Unlike major players such as OpenAI, DeepSeek has not pursued aggressive monetization, focusing instead on pure AGI research [25][26].
- This approach contrasts with competitors' commercialization drives and gives DeepSeek a distinctive position in the AI landscape [24][49].
- The company benefits from a stable, committed research team with minimal turnover, a crucial asset in the competitive AI sector [51][57].

Group 4: Market Impact and Investor Sentiment
- DeepSeek's technical papers have become valuable resources for investors, moving the stock prices of related companies in the semiconductor industry [60][66].
- Releases of new models and technical reports have triggered significant stock-price moves, demonstrating the market's responsiveness to DeepSeek's advances [70][72].
- Investors have found opportunities in DeepSeek's insights, treating its research as a guide for investment decisions [61][72].
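The headline's "enough to fund 2,380 R1s" checks out as back-of-the-envelope arithmetic against the figures quoted above; the exchange rate below is an assumption.

```python
annual_profit_rmb = 5_000_000_000   # ~5 billion RMB (50亿), per the article
rmb_per_usd = 7.1                   # assumed exchange rate
r1_training_cost_usd = 294_000      # R1 training cost cited above

profit_usd = annual_profit_rmb / rmb_per_usd
# ~2395 at this rate, consistent with the headline's ~2,380 at a
# slightly weaker yuan.
print(round(profit_usd / r1_training_cost_usd))
```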
DeepSeek Open-Sources a Large-Model Memory Module; New Paper Signed by Liang Wenfeng Offers an Early Look at Next-Generation Sparse Models
36Kr· 2026-01-13 07:14
Core Insights
- DeepSeek has introduced a new paradigm called "conditional memory" to give the Transformer the knowledge-retrieval capability it previously lacked [1][4][31]
- The Engram module delivers significant gains in model efficiency, letting simpler tasks complete with fewer layers and freeing resources for more complex reasoning [4][21]

Group 1: Conditional Memory and the Engram Module
- The paper presents conditional memory as an essential modeling primitive for the next generation of sparse models [1][4]
- Engram lets the model do in one or two layers what previously required six layers of attention, optimizing resource allocation [4][21]
- The design incorporates a large vocabulary for static knowledge retrieval, enabling O(1) lookup of information [4][6]

Group 2: Performance and Efficiency
- The optimal allocation of parameters between MoE (Mixture of Experts) and Engram memory was found to be around 20% to 25%, reducing the model's validation loss [17][21]
- In experiments, the Engram-27B model outperformed the MoE-27B model on a range of knowledge-intensive tasks, with notable gains in general reasoning, code, and mathematics [21][22]
- The Engram-40B model further increased memory parameters and showed sustained performance improvements, indicating that memory capacity had not yet saturated [25][31]

Group 3: Hardware Optimization
- The Engram module allows large parameter tables to be offloaded to CPU memory while keeping inference delays minimal and throughput high (a hedged prefetching sketch follows this summary) [29][30]
- The "hardware-aware efficiency" design principle decouples storage from computation, making massive parameter tables usable without significant performance cost [31]
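The offloading idea in Group 3 can be sketched with standard PyTorch primitives: keep the table in pinned host memory and copy the rows the next step will need on a side CUDA stream, overlapping the transfer with compute. The sizes and the two-step pipeline below are illustrative assumptions, not DeepSeek's actual prefetch engine, and the snippet needs a CUDA device to run.

```python
import torch

NUM_SLOTS, EMBED_DIM = 2_000_000, 512

# Pinned (page-locked) host memory enables asynchronous H2D copies.
cpu_table = torch.randn(NUM_SLOTS, EMBED_DIM).pin_memory()
copy_stream = torch.cuda.Stream()

def prefetch(slot_ids: torch.Tensor) -> torch.Tensor:
    """Start copying the rows for slot_ids to the GPU on a side stream,
    returning the in-flight GPU tensor without blocking compute."""
    rows = cpu_table[slot_ids].pin_memory()  # gather, then re-pin for async copy
    with torch.cuda.stream(copy_stream):
        return rows.to("cuda", non_blocking=True)

# Two-step pipeline: while step t computes, step t+1's memory transfers.
ids_t = torch.randint(0, NUM_SLOTS, (4096,))
ids_t1 = torch.randint(0, NUM_SLOTS, (4096,))

mem_t = prefetch(ids_t)
torch.cuda.current_stream().wait_stream(copy_stream)  # mem_t is ready
mem_t1 = prefetch(ids_t1)            # overlaps with the compute below
hidden = mem_t.sum()                 # stand-in for the model's GPU compute
torch.cuda.current_stream().wait_stream(copy_stream)  # mem_t1 ready for step t+1
```

Overlapping the copy with compute is what would keep the throughput penalty small even when the table itself never fits in GPU memory.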
DeepSeek Releases New Paper Signed by Liang Wenfeng
Brokerage China (券商中国)· 2026-01-13 06:25
Group 1
- The article covers a new paper released by DeepSeek on the 12th, titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," co-authored with Peking University [1]
- The paper introduces the concept of conditional memory, which significantly enhances model performance on knowledge retrieval, reasoning, coding, and mathematical tasks at equal parameter counts and computational budgets [1]
- DeepSeek has open-sourced the related memory module, Engram, as part of the work described in the paper [1]