DeepSeek Releases New Paper Co-Signed by Liang Wenfeng
Zheng Quan Shi Bao· 2026-01-13 03:02
Core Insights
- DeepSeek released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models" on the evening of the 12th [1]
- The paper was co-authored by Peking University and DeepSeek, with Liang Wenfeng listed as a co-author [1]
- The concept of conditional memory is introduced, which significantly enhances model performance in knowledge retrieval, reasoning, coding, and mathematical tasks under equal parameter and compute budgets [1]
- DeepSeek has also open-sourced a related memory module named Engram [1]

Company and Industry Summary
- The collaboration between DeepSeek and Peking University highlights the growing trend of academia-industry partnerships in advancing AI technologies [1]
- The introduction of scalable lookup structures in large language models is a significant innovation, potentially improving the efficiency and effectiveness of AI applications [1]
- Open-sourcing the Engram memory module may encourage further research and development on conditional memory systems, fostering a more collaborative environment for AI advancement [1]
DeepSeek and 8 Other Major Products Were All Accidents?! World-Changing Projects That No One Initially "Took Seriously"
Sou Hu Cai Jing· 2026-01-13 01:47
Core Insights
- Many groundbreaking products started as side projects that were not considered significant at their inception [1][2][3][5][6]
- Side projects are defined as non-core, non-KPI-driven initiatives that sit outside a company's strategic plan [1]
- Side projects succeed partly because they operate without the constraints typically placed on mainline projects, allowing greater innovation and flexibility [2][3][6]

Group 1: Examples of Successful Side Projects
- DeepSeek, a side project of High-Flyer Quant (幻方量化), emerged from internal technical research and has become a significant force beyond quantitative trading [2]
- Qwen, developed by Alibaba, was initially a side project that enjoyed more autonomy and faster iteration, ultimately becoming part of the company's main offerings [3]
- Claude Code, initially a simple experiment by an engineer, evolved into a key product for Anthropic, demonstrating how side projects can gain traction unexpectedly [5]

Group 2: Impact of AI on Project Development
- The integration of AI into software engineering has lowered the cost of experimentation, enabling individuals to validate ideas more quickly and easily [7][8]
- Side projects often begin by addressing a specific problem and mature through real-world usage [8]
- As development becomes AI-driven, early signals of future trends may increasingly emerge from projects that were initially overlooked [10]

Group 3: Strategic Considerations
- While AI improves execution efficiency, it does not necessarily improve the accuracy of strategic judgment, a limitation that weighs on mainline projects [10]
- Side projects may increasingly serve to validate directions before they are scaled up into mainline initiatives [10]
New Paper Co-Signed by Liang Wenfeng: First Look at the DeepSeek V4 Architecture? Taking Aim at a Fatal Flaw of the Transformer
36Kr· 2026-01-13 01:24
Core Insights
- DeepSeek's new paper addresses the memory limitations of Transformer models by proposing a complementary "conditional memory" sparsity axis via the Engram module, which enables efficient knowledge retrieval with near-O(1) complexity [1][6][11]

Group 1: Memory and Model Architecture
- While MoE (Mixture of Experts) has become the mainstream architecture for large models, it still fundamentally relies on the Transformer, which lacks a native knowledge-retrieval mechanism, leading to inefficient computation [9][11]
- Engram offloads static, repetitive patterns in language modeling to a scalable lookup module, freeing the Transformer backbone to focus on tasks that require composition and reasoning [11][15]
- The authors split language-modeling work into two kinds: composition and reasoning on one hand, and pattern retrieval on the other, arguing the latter deserves a dedicated mechanism [12][13]

Group 2: Engram Architecture and Functionality
- Engram is conceptualized as a modernized hashed N-gram: a scalable lookup module integrated into the Transformer architecture [18][20]
- Input sequences are handled in a two-stage process, retrieval followed by fusion, improving the model's efficiency on static patterns [20][21]
- A context-aware gating mechanism lets the model dynamically weight retrieved embeddings, improving expressiveness and suppressing noise from hash collisions [25][27]

Group 3: Performance and Scaling
- The paper reports a U-shaped scaling law: an optimal resource split between MoE and Engram maximizes performance, suggesting that a balance between dynamic computation and static memory is crucial [3][33]
- Scaled to 27 billion parameters, Engram outperforms the MoE baseline under equal parameter and FLOPs budgets across various benchmarks [5][38]
- Engram improves not only knowledge retrieval but also reasoning, mathematics, and coding capabilities, a significant leap across multiple tasks [39][48]

Group 4: Future Implications
- The findings suggest a paradigm shift toward a dual-axis architecture of computation plus memory, with potential integration into future large language models such as V4 [46][50]
- Integrating Engram could substantially improve model efficiency and capability, paving the way for more advanced natural-language-processing applications [51][52]
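The article describes Engram as a modernized hashed N-gram with a two-stage retrieve-then-fuse pipeline and a context-aware gate. As a rough illustration of that mechanism only, here is a minimal NumPy sketch; the hash function, table size, and sigmoid gate are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def ngram_hash(tokens, n, table_size):
    """Deterministically hash the trailing n-gram of a token sequence to a table index."""
    h = 0
    for t in tokens[-n:]:
        h = (h * 1000003 + t) % table_size  # simple polynomial rolling hash
    return h

class EngramSketch:
    """Toy conditional-memory module: O(1) n-gram lookup plus a context-aware gate."""
    def __init__(self, table_size=2**16, dim=64, n=3, seed=0):
        rng = np.random.default_rng(seed)
        self.table = rng.standard_normal((table_size, dim)) * 0.02  # static memory
        self.w_gate = rng.standard_normal((dim, 1)) * 0.02          # gate projection
        self.n = n
        self.table_size = table_size

    def __call__(self, tokens, hidden):
        # Stage 1: retrieval — hash the suffix n-gram and fetch its embedding row.
        mem = self.table[ngram_hash(tokens, self.n, self.table_size)]
        # Stage 2: fusion — a sigmoid gate conditioned on the hidden state decides
        # how much retrieved memory to mix in, suppressing hash-collision noise.
        gate = 1.0 / (1.0 + np.exp(-(hidden @ self.w_gate).item()))
        return hidden + gate * mem
```

Note that the lookup cost is independent of table size, which is the sense in which retrieval is near-O(1): growing the memory adds parameters without adding compute per token.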
Just Now: Liang Wenfeng Co-Signs an Open-Source "Memory" Module, and DeepSeek V4 Comes Into Sharper Focus
36Kr· 2026-01-13 00:42
Core Insights
- DeepSeek, in collaboration with Peking University, has released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," introducing a new module called Engram to enhance the efficiency of large language models [1][3]

Group 1: Research Overview
- Sparsity in large language models currently relies primarily on Mixture of Experts (MoE) for conditional computation, but existing Transformer architectures lack a native knowledge-retrieval mechanism [3][8]
- DeepSeek proposes conditional memory as a dimension complementary to MoE, with the Engram module providing efficient knowledge retrieval in O(1) time [8][9]

Group 2: Engram Module Implementation
- The Engram module has been implemented and released on GitHub, allowing for community engagement and further development [4][5]
- Engram separates static memory storage from dynamic computation within the Transformer architecture, enhancing overall model performance [10][12]

Group 3: Performance Metrics
- Engram shows significant benchmark improvements, including +3.4% MMLU accuracy and +4.0% CMMLU accuracy, as well as notable gains on general reasoning tasks [9][28]
- Long-context retrieval also improves, with Multi-Query NIAH accuracy rising from 84.2 to 97.0 [9]

Group 4: Experimental Results
- DeepSeek trained four models under identical training conditions: Dense-4B (4.1 billion parameters), MoE-27B (26.7 billion), Engram-27B (26.7 billion), and Engram-40B (39.5 billion) [25][27]
- The sparse architectures (MoE-27B, Engram-27B/40B) outperformed the dense model (Dense-4B) across all benchmarks, demonstrating superior scalability [28][30]

Group 5: Memory and Computation Decoupling
- Engram's deterministic retrieval mechanism decouples parameter storage from computational resources, enabling efficient scaling without increasing compute cost [15][17]
- The architecture supports a multi-level cache hierarchy, optimizing memory access and reducing latency [18]

Group 6: U-Shaped Scaling Law
- DeepSeek identified a U-shaped scaling law for the optimal allocation between MoE and Engram: a balanced split of sparse parameters yields improved performance [19][24]
- The optimal allocation was found to be roughly 20%-25% of the sparse-parameter budget for Engram, confirming the structural complementarity of the two modules [23][24]
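The decoupling claim above can be made concrete: because an Engram-style lookup index depends only on the input token IDs, never on hidden states, every memory row a sequence will touch is known before the forward pass begins, so rows can be gathered from a cheap, cold storage tier into a small hot cache ahead of time. This is a sketch under that assumption; the hash and the cache layout are hypothetical, not DeepSeek's design:

```python
import numpy as np

TABLE_SIZE, DIM, N = 2**20, 8, 2

def ngram_ids(tokens, n=N, table_size=TABLE_SIZE):
    """All lookup indices for a sequence, computable from token IDs alone."""
    ids = []
    for i in range(n - 1, len(tokens)):
        h = 0
        for t in tokens[i - n + 1 : i + 1]:
            h = (h * 1000003 + t) % table_size
        ids.append(h)
    return ids

# Large table standing in for a colder memory tier (e.g. host RAM vs. accelerator).
cold_table = np.zeros((TABLE_SIZE, DIM), dtype=np.float32)

def prefetch(tokens):
    """Gather only the rows this sequence needs into a small hot cache up front."""
    ids = ngram_ids(tokens)
    hot_cache = {i: cold_table[i] for i in set(ids)}  # one fetch per unique index
    return ids, hot_cache

ids, cache = prefetch([5, 6, 7, 6, 7])
assert all(i in cache for i in ids)  # the forward pass never waits on cold storage
```

Determinism is what makes this work: a learned, state-dependent router could not enumerate its fetches in advance, whereas a token-hash lookup can.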
DeepSeek Publishes Another Liang Wenfeng-Signed Paper Late at Night / Dreame CEO Vows to Build the First $100-Trillion Company Ecosystem / iPhone Officially Adds Gemini
Sou Hu Cai Jing· 2026-01-13 00:34
Group 1
- Apple and Google announced a multi-year partnership under which the next-generation Apple foundation model will be built on Google's Gemini model and cloud technology, enhancing the AI capabilities of Siri and Apple Intelligence [3][4]
- Apple plans to pay approximately $1 billion annually for the use of Gemini technology, which is expected to significantly improve its AI functionality while maintaining user data privacy [3][5]
- The collaboration is seen as a strategic move that buys Apple time in the large-model race, while Google benefits from deeper integration into billions of Apple devices [4][5]

Group 2
- Counterpoint Research reported 2% growth in global smartphone shipments in 2025, with Apple regaining the top market-share position at 20%, driven by strong sales of the iPhone 17 series [33][34]
- The growth was primarily fueled by recovering demand in emerging markets and an improved economic environment [33]

Group 3
- The storage market has entered a "super bull market," with prices expected to rise 50% this year on demand from AI servers, significantly affecting cost structures for smartphone and server manufacturers [85][86]
- Counterpoint estimates that storage prices surged 40%-50% in Q4 of last year and will keep rising through Q1 and Q2 of this year [86][88]

Group 4
- Bill Gates expressed optimism about AI driving key innovations over the next decade, particularly in climate, healthcare, and education, while stressing the need for governance and regulation [94][95]
- Elon Musk suggested that advances in AI, energy, and robotics could make saving for retirement irrelevant, envisioning a future of abundant resources [97][98]
DeepSeek's Financial Backer: 2025 Returns of Liang Wenfeng's High-Flyer Quant Revealed
Feng Huang Wang· 2026-01-12 10:23
Group 1
- DeepSeek founder Liang Wenfeng's quantitative hedge fund returned over 50% last year, bolstering DeepSeek's potential funding reserves [1]
- According to data from Shenzhen Paipai Network Investment Management Co., funds under High-Flyer Quant (幻方量化) averaged a 56.6% return in 2025, with over 70 billion RMB (approximately 10 billion USD) under management [1]
- High-Flyer ranks second among Chinese quantitative funds managing over 10 billion RMB, behind only Ningbo Lingjun Investment Management, which leads with a return of over 70% [1]

Group 2
- High-Flyer's strong performance is expected to provide more funding support for DeepSeek, which the firm incubated in 2023 [1]
- Based on a 1% management fee and a 20% performance fee, the fund's performance may generate over 700 million USD in revenue, far exceeding DeepSeek's reported budget of less than 6 million USD for developing its AI model [2]
- DeepSeek's research funding comes from High-Flyer's R&D budget, as Liang Wenfeng has previously stated [3]
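The ">700 million USD" figure above can be sanity-checked with back-of-the-envelope arithmetic. The exchange rate and the assumption that the stated fee schedule applies to the full reported AUM are mine, not the article's:

```python
aum_rmb = 70e9          # >70 billion RMB under management (from the article)
annual_return = 0.566   # 56.6% average 2025 return (from the article)
cny_per_usd = 7.1       # assumed exchange rate

mgmt_fee_rmb = 0.01 * aum_rmb                  # 1% management fee on AUM
perf_fee_rmb = 0.20 * aum_rmb * annual_return  # 20% performance fee on profits
total_usd = (mgmt_fee_rmb + perf_fee_rmb) / cny_per_usd  # ≈ $1.2B at these assumptions

assert total_usd > 700e6  # comfortably consistent with the article's ">$700M" claim
```

Under these assumptions the estimate actually lands well above $700M, so the article's figure reads as a conservative lower bound.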
2025 AI Year in Review: 200 Papers Later, What AGI Narratives Are DeepMind, Meta, DeepSeek, and the Other Chinese and American Giants Telling?
36Kr· 2026-01-12 08:44
Core Insights
- The article reviews the evolution of artificial intelligence in 2025, highlighting a shift from merely increasing model parameters to enhancing model intelligence through foundational research in fluid reasoning, long-term memory, spatial intelligence, and meta-learning [2][4]

Group 1: Technological Advancements
- Significant progress was observed in fluid reasoning, long-term memory, spatial intelligence, and meta-learning, driven by the diminishing returns of scaling laws [2][3]
- The bottleneck in current AI is that models must not only possess knowledge but also think and remember effectively, revealing a significant imbalance in AI capabilities [2][4]
- Test-time compute revolutionized reasoning, allowing AI to engage in deeper, more deliberate processing during inference [6][10]

Group 2: Memory and Learning Enhancements
- The Titans architecture and Nested Learning emerged as breakthroughs in memory, enabling models to update their parameters in real time during inference and overcoming limitations of traditional Transformer models [19][21]
- Memory can be categorized into three types: context as memory, RAG-processed context as memory, and memory internalized into parameters, with significant advances in RAG and parameter-adjustment methods [19][27]
- Sparse memory fine-tuning and on-policy distillation mitigate catastrophic forgetting, allowing models to retain old knowledge while integrating new information [31][33]

Group 3: Spatial Intelligence and World Models
- Spatial intelligence and world models advanced through video-generation models such as Genie 3, which demonstrated improved physical understanding and consistency in generated environments [35][36]
- The World Labs initiative, led by Stanford professor Fei-Fei Li, focuses on generating 3D environments from multimodal inputs, showcasing a more structured approach to AI-generated content [44][46]
- Meta's V-JEPA 2 model emphasizes predictive learning, letting models grasp physical rules through prediction rather than mere observation and improving their understanding of causal relationships [50][51]

Group 4: Reinforcement Learning Innovations
- Reinforcement learning saw significant advances with verifiable rewards and sparse reward signals, improving performance in areas like mathematics and coding [11][12]
- The GRPO algorithm gained popularity by eliminating the need for a critic model, reducing computational cost while maintaining effectiveness [15][16]
- Exploration of RL's limits revealed a ceiling effect: RL can amplify existing model capabilities, but further breakthroughs will require innovations in foundation models or algorithm architectures [17][18]
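The critic-free simplification attributed to GRPO above works by using the sample group itself as the baseline: draw several completions per prompt, score each with a verifiable reward, and normalize each reward against the group's mean and standard deviation. A minimal sketch of that advantage step (the full objective also adds a clipped policy ratio and a KL penalty, omitted here):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Critic-free advantages: z-score each reward within its sampling group."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 completions for one prompt, scored by a verifiable reward
# (e.g. 1.0 if the final answer checks out, else 0.0).
adv = grpo_advantages([1.0, 0.0, 0.0, 1.0])
# Above-average completions get positive advantage, below-average negative,
# with no value network needed to estimate a baseline.
assert adv[0] > 0 and adv[1] < 0
```

This is why the method cuts compute: the baseline costs only a mean and a standard deviation per group instead of a second trained model.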
DeepSeek's Next-Generation AI Model V4 Expected to Launch; Low-Fee ChinaAMC Cloud Computing ETF (516630) Up Over 6%, AUM at a Record High
Xin Lang Cai Jing· 2026-01-12 06:31
China Merchants Securities recommends that investors position firmly around AI (the clearest industry trend), Xinchuang domestic IT innovation (the most policy-driven theme), and fintech (the biggest beneficiary of bull-market beta). The ChinaAMC Cloud Computing ETF (516630) tracks the Cloud Computing Index (930851) and carries the lowest fee in its category. The index focuses on domestic AI software and hardware computing power: computer software, cloud services, and computer equipment together account for 83.7% of its weight, and both its DeepSeek-related and AI-application content exceed 40%. Off-exchange feeder funds: Class A 019868; Class C 019869.

On January 12, AI+ names rallied across the board. As of 13:35, the low-fee ChinaAMC Cloud Computing ETF (516630) was up 6.47%, heading for a third straight gain, with holdings 拓尔思, 汉得信息, and 易点天下 hitting their 20cm limit-up, and 万兴科技, 中科星图, and other stocks also rising. Over a longer window, as of January 9, 2026, the ETF had gained a cumulative 9.33% over the past week.

On the news front, DeepSeek will release its next-generation flagship AI model, DeepSeek V4, in February. The model is said to have strong programming capabilities and is expected to significantly affect the current AI competitive landscape. V4 is DeepSeek's latest version following the V3 model released in December 2024. People familiar with the matter say DeepSeek's preliminary internal tests show V4 surpassing other top models on the market in programming ability, such as Anthropic's Claude and Op ...
DeepSeek V4 Reportedly Launching Around the Spring Festival! ChinaAMC STAR AI ETF (589010) Surges 4.33% on Heavy Volume as Holdings Hit a Wave of Limit-Ups
Mei Ri Jing Ji Xin Wen· 2026-01-12 06:00
Group 1
- The core takeaway is the strong performance of the Sci-Tech Innovation (STAR) Artificial Intelligence ETF (589010), which surged 4.33%, indicating robust market sentiment toward AI investments [1][2]
- The ETF's trading volume exceeded 252 million yuan with a turnover rate above 8%, reflecting high trading enthusiasm and recognition of long-term value in the AI sector [1]
- Key constituents such as New Point Software and Haitian Ruisheng posted significant gains, with New Point Software hitting its 20% limit-up and several others rising over 15%, showcasing strong profit potential [1]

Group 2
- Open Source Securities noted that AI innovation is continuously evolving, with model capabilities improving and costs decreasing, particularly with the rise of Chinese open-source models like DeepSeek and Qwen [2]
- Multi-modal large models are making rapid breakthroughs, which is expected to further drive application growth in the AI industry [2]
- The ETF closely tracks the Shanghai Stock Exchange STAR Board AI Index, covering high-quality enterprises across the industry chain that benefit from heavy R&D investment and policy support [2]
ETF Intraday News | What Breakthroughs Does DeepSeek V4 Bring? Huabao STAR AI ETF (589520) Gaps Up 5% as AI Applications Rally Hard!
Jin Rong Jie· 2026-01-12 03:50
Core Viewpoint
- The AI application sector continues to show strong momentum, with significant gains across the domestic AI industry chain, highlighted by the Huabao AI ETF (589520) surging 4.55% [1][4]

Group 1: Market Performance
- The Huabao AI ETF gapped up as much as 5% intraday, marking its third consecutive day of gains [1]
- Key AI stocks rose substantially: Zhongke Xingtou gained 16.07%, Hehe Information 15.32%, and Haitian Ruisheng and Xinghuan Technology also performed strongly [2][7]

Group 2: Industry Dynamics
- Activity across the AI industry is surging, with major financing rounds for overseas companies such as xAI and Anthropic and new domestic policies promoting "AI + manufacturing" [2][3]
- The upcoming release of DeepSeek's next-generation V4 model is expected to strengthen programming capabilities and data-pattern understanding, potentially reshaping the global AI competitive landscape [3]

Group 3: Future Outlook
- Analysts predict 2026 will be a "golden year" for AI applications, driven by technological maturity, supportive policies, and market demand [4]
- The domestic large-model industry is moving from a phase of technological catch-up to systematic layout and ecosystem construction, with expectations of leadership in certain areas by 2026 [4]

Group 4: Investment Opportunities
- The Huabao AI ETF balances its allocation across application software, terminal applications, terminal chips, and cloud chips, reflecting the shift from reliance on foreign technology to self-sufficiency [4][5]
- The ETF is positioned as an efficient vehicle for investing in domestic computing power, with a high concentration in semiconductor stocks [5]