DeepSeek
Manus and Its "80 Million Employees"
虎嗅APP· 2026-01-13 00:49
Core Viewpoint
- Manus represents a significant paradigm shift in AI applications, transitioning from merely generating content to autonomously completing tasks, marking a "DeepSeek moment" for the industry [6][7]

Group 1: Manus's Unique Model
- Manus has created over 80 million virtual computer instances, which are crucial to its operational model, allowing AI to autonomously handle complex tasks [9][10]
- This model shifts the core operator from humans to AI, establishing Manus as an "artificial intelligence operating system" [11]
- The Manus model is expected to drive a 0.5-level leap in human civilization as AI takes over digital-economy jobs [12]

Group 2: AI Applications' "DeepSeek Moment"
- Manus achieved annual recurring revenue (ARR) of over $100 million within a year, indicating strong market performance [20]
- Multi-agent systems have shown a 90.2% performance improvement on complex tasks compared with single-agent systems, underscoring the importance of collaboration among AI agents [14][17]
- The transition from AI as a tool to AI as a worker marks a major evolution in AI applications, moving beyond the "toy" and "assistant" phases [20]

Group 3: Technological Foundations of Multi-Agent Systems
- Manus's multi-agent system relies on several core technologies, including virtual machines for secure execution environments and resource pooling for efficient resource utilization [22][24]
- The virtual machine architecture allows tasks to execute independently, addressing safety and reliability issues in AI applications [25]
- Intelligent orchestration ensures optimal resource allocation and task management, improving overall system efficiency [26][27]

Group 4: Competitive Landscape and Industry Dynamics
- Major tech companies are rapidly advancing multi-agent systems, with Meta, Google, Microsoft, and Amazon all integrating these capabilities into their platforms [30][32]
- In the domestic market, companies such as Alibaba, Tencent, and Baidu are also making significant strides in multi-agent technologies [31]
- The emergence of new players such as Kimi, which has raised $500 million for multi-agent system development, points to an increasingly competitive landscape [33]

Group 5: Evolution of Human Roles
- The human-AI relationship is shifting from operator-tool to manager-team dynamics, where humans define tasks and AI executes them [35]
- This evolution will likely reduce demand for lower- and mid-level creative jobs while amplifying the value of high-level creative work [37]
- Traditional organizational hierarchies may flatten as multi-agent systems can handle the entire workflow from strategy to execution [38]

Group 6: Underestimated Risks
- Data ownership and system security are critical concerns in multi-agent systems, as data becomes a currency for AI collaboration and system evolution [40][41]
- The complexity of multi-agent systems introduces new security challenges, including process safety, collaboration safety, and evolution safety [42][43]
- Balancing security and efficiency remains a fundamental challenge: overly secure systems may hinder performance, while highly efficient systems may expose vulnerabilities [44]

Group 7: Irreversible Development Path
- The proliferation of Manus's 80 million virtual machines signals a new era of productivity, redefining the nature of work itself [47]
- In the short term, vertical applications of multi-agent systems are expected to proliferate across industries, intensifying market competition [48]
- Over the long term, human-AI collaboration will evolve into a more integrated system, blurring the line between human and machine contributions [49]
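The orchestration pattern described in Group 3 (sandboxed per-task workers plus a scheduler drawing on a resource pool) can be illustrated with a minimal sketch. Manus's actual implementation is not public, so every class and method name here is hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    payload: str

@dataclass
class SandboxVM:
    """Stand-in for an isolated virtual-machine execution environment."""
    vm_id: int

    def run(self, task: Task) -> str:
        # A real sandbox would execute tool calls here; this sketch just echoes.
        return f"vm-{self.vm_id} completed {task.name}"

class Orchestrator:
    """Dispatches tasks across a fixed pool of sandboxed workers."""

    def __init__(self, pool_size: int = 4):
        self.pool = [SandboxVM(i) for i in range(pool_size)]
        self.executor = ThreadPoolExecutor(max_workers=pool_size)

    def dispatch(self, tasks: list) -> list:
        # Round-robin assignment keeps every VM utilized (resource pooling).
        futures = [
            self.executor.submit(self.pool[i % len(self.pool)].run, task)
            for i, task in enumerate(tasks)
        ]
        return [f.result() for f in futures]

orchestrator = Orchestrator(pool_size=2)
results = orchestrator.dispatch([Task("research", "q1"), Task("report", "q2")])
print(results)
```

The point of the sketch is the separation of concerns: the orchestrator owns scheduling, while each sandbox owns execution, which is the split the article credits for safety and efficiency.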
Just Released: Liang Wenfeng-Signed Open-Source "Memory" Module Offers More Detail on DeepSeek V4
36Kr· 2026-01-13 00:42
Core Insights
- DeepSeek has released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," in collaboration with Peking University, introducing a new module called Engram to enhance the efficiency of large language models [1][3]

Group 1: Research Overview
- Current sparsity in large language models relies primarily on Mixture of Experts (MoE) for conditional computation, but existing Transformer architectures lack a native knowledge-retrieval mechanism [3][8]
- DeepSeek proposes conditional memory as a complementary dimension to MoE, introducing the Engram module for efficient knowledge retrieval with O(1) time complexity [8][9]

Group 2: Engram Module Implementation
- The Engram module has been implemented and released on GitHub, enabling community engagement and further development [4][5]
- Engram separates static memory storage from dynamic computation within the Transformer architecture, improving overall model performance [10][12]

Group 3: Performance Metrics
- Engram shows significant improvements across benchmarks, including a +3.4% increase in MMLU accuracy and a +4.0% increase in CMMLU accuracy, as well as notable gains on general reasoning tasks [9][28]
- The architecture improves long-context retrieval, with Multi-Query NIAH accuracy rising from 84.2 to 97.0 [9]

Group 4: Experimental Results
- DeepSeek trained four models under identical conditions: Dense-4B (4.1 billion parameters), MoE-27B (26.7 billion), Engram-27B (26.7 billion), and Engram-40B (39.5 billion) [25][27]
- The sparse architectures (MoE-27B, Engram-27B/40B) outperformed the dense model (Dense-4B) across all benchmarks, demonstrating superior scalability [28][30]

Group 5: Memory and Computation Decoupling
- Engram's deterministic retrieval mechanism decouples parameter storage from computational resources, enabling efficient scaling without increasing compute cost [15][17]
- The architecture supports a multi-level cache hierarchy, optimizing memory access and reducing latency [18]

Group 6: U-Shaped Scaling Law
- DeepSeek identified a U-shaped scaling law for the optimal allocation between MoE and Engram, suggesting that a balanced distribution of sparse parameters improves performance [19][24]
- The optimal allocation was found to be roughly 20%-25% of the sparse-parameter budget for Engram, confirming the structural complementarity of the two modules [23][24]
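Why does deterministic retrieval enable the decoupling described in Group 5? Because the retrieved slots depend only on the input tokens, not on any hidden state, they can be computed and fetched from a slow tier before the forward pass begins. A minimal sketch of that prefetch idea, with all names and sizes hypothetical (the paper's actual cache hierarchy is more elaborate):

```python
# Sketch: deterministic ids let rows of a large host-side table be copied
# into a small fast cache ahead of computation.
HOST_TABLE = {i: [float(i)] * 4 for i in range(1000)}  # stands in for a huge CPU-side embedding table

def ngram_ids(tokens, n=2, table_size=1000):
    """Ids are a pure function of the input tokens (deterministic retrieval)."""
    return [hash(tuple(tokens[i:i + n])) % table_size
            for i in range(len(tokens) - n + 1)]

def prefetch(ids):
    """Copy only the needed rows into the fast tier before computation starts."""
    return {i: HOST_TABLE[i] for i in set(ids)}

def forward(tokens, cache):
    # During the forward pass, every lookup hits the prefetched cache.
    return [cache[i] for i in ngram_ids(tokens)]

tokens = [5, 9, 9, 2]
ids = ngram_ids(tokens)
cache = prefetch(ids)        # in a real system this overlaps with other work
embeddings = forward(tokens, cache)
print(len(embeddings))
```

Contrast this with attention or MoE routing, whose access pattern depends on hidden states and therefore cannot be known before the layer runs; that is the structural reason the memory can live off-device without adding latency.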
DeepSeek Open-Sources a Memory Module for Large Models! Liang Wenfeng Signs New Paper, Previewing the Next Generation of Sparse Models
量子位· 2026-01-13 00:39
Core Insights
- The article discusses the introduction of "conditional memory" into Transformer models, adding the knowledge-retrieval mechanism the original architecture lacked [1][2][9]

Group 1: Introduction of Conditional Memory
- Conditional memory is viewed as an essential modeling primitive for the next generation of sparse models [2]
- The research team, led by Liang Wenfeng in collaboration with Peking University, has proposed a new paradigm and implementation plan called the Engram module [3][5]

Group 2: Performance Improvements
- The Engram module allows a 27B-parameter model to outperform a pure MoE model of the same size, compressing tasks that previously required 6 layers of attention down to 1-2 layers and freeing resources for more complex reasoning [5][13]
- Allocating sparse parameters between MoE and Engram memory traces a U-shaped curve: giving about 20%-25% of sparse parameters to Engram memory minimizes validation loss [34][36]

Group 3: Technical Implementation
- Engram's design incorporates a large vocabulary of static entities and phrases, enabling O(1) information retrieval [7][14]
- The team addresses traditional N-gram model issues, such as semantic redundancy and storage explosion, by compressing tokens and using multiple hash functions to map N-grams onto a fixed-size embedding table [22][25]

Group 4: Experimental Results
- The Engram-27B model shows significant improvements across benchmarks, with notable gains on BBH, ARC-Challenge, and DROP [47]
- The architecture enables efficient memory management, allowing a 100-billion-parameter table to be offloaded to CPU memory without significant latency impact during inference [63][66]

Group 5: Future Developments
- DeepSeek's next generation of sparse models is expected before the Spring Festival, signaling ongoing advances in AI model architecture [67]
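The multi-hash N-gram lookup described in Group 3 can be sketched as follows. This is an illustrative reconstruction, not DeepSeek's released code: the hash scheme, table size, and averaging step are assumptions, and a real implementation indexes learned embedding rows rather than the deterministic pseudo-embeddings used here:

```python
import zlib

TABLE_SIZE = 1 << 16   # fixed-size embedding table (assumed)
DIM = 8                # embedding width (assumed)
NUM_HASHES = 3         # multiple hash functions spread out collisions

def table_row(index):
    """Stand-in for a learned table row: a deterministic pseudo-embedding."""
    return [((index * 2654435761 + d) % 997) / 997.0 for d in range(DIM)]

def hashed_indices(ngram, num_hashes=NUM_HASHES):
    """Map one N-gram to several table slots via seeded hashes, O(1) each."""
    key = " ".join(ngram).encode()
    return [zlib.crc32(key, seed) % TABLE_SIZE for seed in range(num_hashes)]

def ngram_embedding(tokens, n=2):
    """Average the rows selected by every hash of every N-gram in the input."""
    rows = []
    for i in range(len(tokens) - n + 1):
        for idx in hashed_indices(tuple(tokens[i:i + n])):
            rows.append(table_row(idx))
    return [sum(col) / len(rows) for col in zip(*rows)]

emb = ngram_embedding(["deep", "seek", "engram"])
print(len(emb))
```

Hashing into a fixed-size table bounds storage regardless of how many distinct N-grams occur (avoiding the storage explosion the article mentions), and using several hash functions makes it unlikely that two N-grams collide on every slot at once.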
Just Released: Liang Wenfeng-Signed Open-Source "Memory" Module Offers More Detail on DeepSeek V4
机器之心· 2026-01-13 00:12
Core Insights
- DeepSeek has introduced a new research paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," in collaboration with Peking University, focused on enhancing large language models (LLMs) through a novel approach to memory and computation [1][2]

Group 1: Research Background and Problem Statement
- Current large language models rely primarily on Mixture of Experts (MoE) for sparsity, known as "conditional computation," but lack an inherent knowledge-retrieval mechanism, forcing inefficient simulation of retrieval behavior [2][8]
- DeepSeek proposes "conditional memory" as a complement to MoE, introducing a new module called Engram to address this limitation [3][8]

Group 2: Engram Module and Its Implementation
- The Engram module has been released on GitHub, enabling community engagement and further development [4]
- Engram modernizes classic N-gram embeddings to achieve knowledge retrieval in O(1) time, improving memory-access efficiency [8][10]
- The module separates static knowledge storage from dynamic computation, strengthening the overall Transformer architecture [12][14]

Group 3: Performance and Efficiency
- DeepSeek scaled Engram to 27 billion parameters, demonstrating significant gains over pure MoE baselines under equal parameter and FLOPs budgets [10][37]
- Engram shows notable gains on knowledge-retrieval tasks, including +3.4 on MMLU and +4.0 on CMMLU, along with stronger general reasoning [10][37]
- The architecture supports runtime prefetching from host memory, keeping memory access free of additional performance overhead [11][18]

Group 4: Sparsity Distribution and Optimal Allocation
- DeepSeek formalized a U-shaped scaling rule characterizing the optimal trade-off between neural computation (MoE) and static memory (Engram) [9][22]
- Allocating roughly 20%-25% of the sparse-parameter budget to Engram yields optimal performance, confirming the structural complementarity of the two modules [27][29]

Group 5: Experimental Results
- Four models were trained under identical conditions: Dense-4B, MoE-27B, Engram-27B, and Engram-40B [34][35]
- Sparse architectures consistently outperformed the dense model across benchmarks, with Engram-27B achieving significant improvements over MoE-27B on multiple tasks [37]
- Engram-40B further reduced pre-training loss and improved performance on most benchmarks, indicating that memory capacity has not yet saturated [38]

Group 6: Long-Context Training
- Engram's architecture shows structural advantages on long-context tasks, with significant gains in global context retention [40][41]
- Controlled experiments found Engram outperforming MoE on complex retrieval tasks, pointing to an inherent architectural advantage [45]
Breaking: DeepSeek Suddenly Releases a New Liang Wenfeng-Signed Paper: V4's New Architecture Revealed Early?
AI前线· 2026-01-12 22:41
Core Insights
- DeepSeek has open-sourced a new paper and module called Engram, introducing a "lookup-computation separation" mechanism that improves the performance of large language models across a range of tasks [2][5]

Summary by Sections

Introduction of Engram
- Engram is a scalable, lookup-based memory module designed to improve language-model efficiency by separating memory retrieval from computation [10][18]

Need for Engram
- Traditional large language models rely on Transformer and Mixture-of-Experts (MoE) architectures that entangle memory with computation, which can cause inefficiencies; Engram lets models handle factual memory and logical reasoning separately [8][9]

Core Technology of Engram
- Engram uses modernized hashed N-gram embeddings, achieving O(1) memory retrieval and significantly reducing computational cost while maintaining high retrieval speed [11][13]

Relationship with MoE
- Engram provides a new axis of sparsity that complements MoE with static memory retrieval, improving parameter efficiency; in a 27-billion-parameter model, Engram can devote a large share of parameters to memory while consuming minimal compute at inference [15][16]

Performance Metrics
- Engram improves performance across benchmarks, achieving a loss of 1.950 on the Pile dataset and 60.4% 5-shot accuracy on MMLU, outperforming both the Dense and MoE baselines [17]

Community Reception
- Engram has drawn positive community feedback, with users highlighting its separation of memory-pattern retrieval from neural computation as a new direction in model architecture design [18][19][21]

Future Implications
- Observers speculate that Engram will be a core component of DeepSeek's upcoming V4 model, signaling a significant architectural advance in memory-reasoning collaboration [22][23]
US Stock Movers | Kingsoft Cloud (KC.US) Up Over 15% Pre-Market as DeepSeek-V4 Is Expected to Spark a New Wave of AI Applications
Zhi Tong Cai Jing· 2026-01-12 14:04
Core Viewpoint
- Kingsoft Cloud (KC.US) rose over 15% pre-market to $12.71, driven by news that DeepSeek will release its next-generation flagship model V4, reported to have coding capabilities superior to mainstream models such as Claude and ChatGPT [1]

Group 1: Company Developments
- DeepSeek is set to launch its new flagship model V4 in the coming weeks, with significantly enhanced programming capabilities expected [1]
- DeepSeek's initial test results indicate that V4 outperforms leading models in code generation [1]

Group 2: Industry Trends
- CITIC Securities highlights dynamic developments across the AI industry, noting recent financing activity by overseas companies such as xAI and Anthropic [1]
- New domestic policies promoting "AI + manufacturing" are expected to stimulate growth in the sector [1]
- Post-listing stock surges at companies such as Zhipu AI and MiniMax point to growing interest and investment in AI applications, and the upcoming DeepSeek-V4 launch is likely to trigger a new wave of AI application enthusiasm [1]
High-Flyer Quant Raked In 57% Last Year, Ranking Second Among 10-Billion-RMB Quant Funds!
Hua Er Jie Jian Wen· 2026-01-12 11:20
Group 1
- The articles highlight the strong performance of High-Flyer Quant (幻方量化), which ranks second among China's large quantitative funds with an average 2025 return of 56.6%, providing substantial financial backing for DeepSeek, the AI company it incubated [1][2]
- High-Flyer's performance is attributed to a strategic transformation: shifting focus from market-neutral strategies to a pure long-only product line benchmarked against stock indexes, now the core driver of its growth [2]
- China's quantitative industry overall performed strongly in 2025, averaging a 30.5% return, well above the global average, with a notable rise in the number of quant fund companies managing over 5 billion RMB [3]

Group 2
- Under founder Liang Wenfeng, the firm has stopped accepting external funds, maintaining majority ownership while using its strong financial position to support DeepSeek's research and development [1][2]
- Products managed by co-founder Xu Jin averaged a 58.6% return, while CEO Lu Zheng's products averaged 56%; Lu Zheng's stock strategy achieved a Sharpe ratio of 2.8, first among leading quant institutions [2]
- The industry's rapid expansion is evidenced by the number of firms managing over 5 billion RMB rising from 63 to 91 within a year, reflecting growing concentration of management scale [3]
Stock Indexes Extend Gains: The Logic Behind the "Spring Rally"
李迅雷金融与投资· 2026-01-12 11:15
Group 1
- The market's core narrative has shifted from "growth" to "competitiveness," driven by external factors such as US-China tech competition and the need for self-sufficiency in key industries [17][33]
- The A-share market has shown resilience despite economic pressures; the performance of leading companies in global competition, rather than domestic consumption or income growth, is the key driver of market valuation [18][27]
- Current investment logic emphasizes sectors such as AI, power, and critical resources, where capital expenditure is growing rapidly, while traditional consumer sectors face challenges [26][28]

Group 2
- The divergence between corporate competitiveness and household income growth reflects a broader economic restructuring in which companies optimize costs to enhance global competitiveness [28][31]
- Historical examples show that market performance can diverge from economic fundamentals, as in the US during WWII and China's market in the early 2000s, where investor sentiment and risk premiums played significant roles [9][14][16]
- In the current environment, the valuation of leading companies is increasingly decoupled from traditional economic indicators, focusing instead on long-term competitive advantages [5][8][18]

Group 3
- Rising valuations in commercial aerospace, AI, and semiconductors reflect confidence in China's ability to compete and innovate in critical areas despite short-term economic challenges [18][19]
- Shifting demand dynamics, particularly around AI and energy infrastructure, are driving an investment cycle that differs from traditional recovery patterns [19][24]
- The market's concentration on a few core assets that contribute an outsized share of total capitalization indicates value creation is clustering in leading firms rather than a broad-based economic recovery [5][8][18]

Group 4
- Ongoing adjustments to corporate cost structures and labor compensation models are a strategic response to global competition and may increase income volatility for workers [28][31]
- Institutional differences between China's centralized policy approach and the more fragmented Western model highlight the advantage of sustained support for key industries in building long-term competitiveness [33][34]
- The macro landscape combines geopolitical uncertainty, technological competition, and evolving consumer behavior, calling for a nuanced investment strategy [35][36]
A-Share Turnover Hits a Record High! All Three Major Indexes Rise Over 1%
Jin Rong Shi Bao· 2026-01-12 10:47
Market Performance
- The A-share market continued its rally on January 12, with the Shanghai Composite Index, Shenzhen Component Index, and ChiNext Index rising 1.09%, 1.75%, and 1.82% respectively [1]
- Total market turnover reached 3.64 trillion yuan, a new historical record and the second consecutive trading day above 3 trillion yuan [1][2]

Sector Performance
- The AI application sector surged, with the Wind Internet Index and Software Index up 9.81% and 7.75% respectively, and over 20 stocks, including Tianrun Technology and Zhongcheng Technology, hitting their daily limits [2][5]
- The commercial aerospace sector was also strong, with stocks such as Guobo Electronics and Ligong Navigation hitting daily limits [6]

AI Industry Developments
- AI activity is intensifying at home and abroad, with major financings reported, including xAI's completed $20 billion (approximately 140 billion yuan) Series E round [5]
- Domestic large-model companies Zhipu and MiniMax recently listed on the Hong Kong Stock Exchange, with share prices up 80% and 141% respectively [5]
- AI applications are in an accelerated penetration phase, supported by government policies aimed at strengthening digital infrastructure and lifting intelligent-agent penetration to 70% by 2027 [6]

Technological Advancements
- Recent breakthroughs in domestic large models are expected to improve programming capabilities and support longer context windows, enabling agent deployment in complex scenarios [6]
- Upcoming AI platforms, such as NVIDIA's "Rubin" platform and AMD's "Helios" platform, are set to advance AI computing capabilities [5]
DeepSeek's Financial Backer: Liang Wenfeng's High-Flyer Quant 2025 Returns Revealed
Feng Huang Wang· 2026-01-12 10:23
Group 1
- DeepSeek founder Liang Wenfeng's quantitative hedge fund returned over 50% last year, bolstering DeepSeek's potential funding reserves [1]
- According to data from Shenzhen Paipai Network Investment Management Co., High-Flyer Quant's funds averaged a 56.6% return in 2025, with over 70 billion RMB (approximately 10 billion USD) under management [1]
- High-Flyer ranks second among Chinese quant funds managing over 10 billion RMB, behind only Ningbo Lingjun Investment Management, which leads with a return above 70% [1]

Group 2
- Liang Wenfeng's strong performance at High-Flyer is expected to provide further funding support for DeepSeek, which High-Flyer incubated in 2023 [1][2]
- At a 1% management fee and 20% performance fee, the fund's performance could generate over 700 million USD in revenue, far exceeding DeepSeek's reported budget of under 6 million USD for developing its AI model [2]
- DeepSeek's research funding comes from High-Flyer's R&D budget, as Liang Wenfeng has previously stated [3]
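The fee estimate in Group 2 can be reproduced with back-of-the-envelope arithmetic. The AUM, return, and fee structure below come from the article; the exchange rate is a rough assumption:

```python
# Back-of-the-envelope check of the "over 700 million USD" revenue claim.
aum_rmb = 70e9          # assets under management: 70 billion RMB (from the article)
annual_return = 0.566   # 2025 average return (from the article)
mgmt_fee = 0.01         # 1% management fee (from the article)
perf_fee = 0.20         # 20% performance fee (from the article)
rmb_per_usd = 7.0       # rough FX assumption

gains_rmb = aum_rmb * annual_return
revenue_rmb = aum_rmb * mgmt_fee + gains_rmb * perf_fee
revenue_usd = revenue_rmb / rmb_per_usd

print(f"management fee:  {aum_rmb * mgmt_fee / 1e9:.2f}B RMB")
print(f"performance fee: {gains_rmb * perf_fee / 1e9:.2f}B RMB")
print(f"total:           ~{revenue_usd / 1e9:.2f}B USD")
```

Under these figures the performance fee alone (roughly 7.9 billion RMB) dwarfs the management fee, so the article's "over 700 million USD" reads as a conservative lower bound.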