Just In: Liang Wenfeng Co-Authors Open-Source "Memory" Module, Revealing More Details of DeepSeek V4
36Kr · 2026-01-13 00:42
Core Insights
- DeepSeek, in collaboration with Peking University, has released a new paper titled "Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models," which introduces a module called Engram to make large language models more efficient [1][3].
Group 1: Research Overview
- Sparsity in large language models currently relies mainly on Mixture of Experts (MoE) for conditional computation, but existing Transformer architectures lack a native knowledge retrieval mechanism [3][8].
- DeepSeek proposes conditional memory as a dimension complementary to MoE and introduces the Engram module to retrieve knowledge with O(1) time complexity [8][9].
Group 2: Engram Module Implementation
- The Engram module has been implemented and published on GitHub, opening it to community use and further development [4][5].
- Engram separates static memory storage from dynamic computation within the Transformer architecture, improving overall model performance [10][12].
Group 3: Performance Metrics
- Engram improves results on several benchmarks, with MMLU accuracy up 3.4% and CMMLU accuracy up 4.0%, along with notable gains on general reasoning tasks [9][28].
- The architecture also improves long-context retrieval: Multi-Query NIAH accuracy rises from 84.2 to 97.0 [9].
Group 4: Experimental Results
- DeepSeek trained four models under the same training conditions: Dense-4B (4.1 billion parameters), MoE-27B (26.7 billion), Engram-27B (26.7 billion), and Engram-40B (39.5 billion) [25][27].
- The sparse architectures (MoE-27B, Engram-27B, and Engram-40B) outperformed the dense model (Dense-4B) on all benchmarks, demonstrating better scalability [28][30].
Group 5: Memory and Computation Decoupling
- Engram's deterministic retrieval mechanism decouples parameter storage from computational resources, so memory can be scaled up without increasing computational cost [15][17].
- The architecture supports a multi-level cache hierarchy that optimizes memory access and reduces latency [18].
Group 6: U-Shaped Scaling Law
- DeepSeek identified a U-shaped scaling law for the allocation of sparse parameters between MoE and Engram: a balanced split performs better than assigning the budget entirely to either module [19][24].
- The optimal allocation was found to be around 20%-25% of the sparse parameter budget for Engram, confirming the structural complementarity of the two modules [23][24].
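The constant-time, deterministic retrieval described above can be illustrated with a toy lookup table. This is a minimal sketch and not the Engram implementation: it assumes, purely for illustration, that memory rows are addressed by hashing the last few token ids. The class name `HashedNgramMemory`, its parameters, and the hash function are all hypothetical.

```python
import random


class HashedNgramMemory:
    """Toy conditional memory: a fixed table addressed by a hash of recent tokens.

    Hypothetical illustration only; not the module described in the paper.
    """

    def __init__(self, num_slots: int, dim: int, ngram: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.num_slots = num_slots
        self.ngram = ngram
        # Static storage. In a trained model these rows would be learned
        # parameters; random values stand in for them here.
        self.table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)
        ]

    def slot(self, token_ids, position):
        """Return the table index for the n-gram ending at `position`."""
        start = max(0, position - self.ngram + 1)
        key = 0
        for tok in token_ids[start:position + 1]:
            # Simple polynomial rolling hash, reduced modulo the table size.
            key = (key * 1_000_003 + tok + 1) % self.num_slots
        return key

    def lookup(self, token_ids, position):
        """One table read; the cost does not grow with the number of slots."""
        return self.table[self.slot(token_ids, position)]


memory = HashedNgramMemory(num_slots=1024, dim=4)
vector = memory.lookup([5, 7, 9], position=2)
```

In this sketch the slot index depends only on the input token ids, so it can be computed before any layer runs. That property is what would let a system fetch rows from slower storage ahead of time, which is one way to read the decoupling of storage from computation described in Group 5.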
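DeepSeek Releases Another Late-Night Paper Co-Authored by Liang Wenfeng / Dreame CEO Says He Will Build the First Hundred-Trillion-Dollar Company Ecosystem / iPhone Officially Announces Gemini Integration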
Sou Hu Cai Jing· 2026-01-13 00:34
Group 1
- Apple and Google announced a multi-year partnership under which the next-generation Apple foundation model will be built on Google's Gemini model and cloud technology, strengthening the AI capabilities of Siri and Apple Intelligence [3][4].
- Apple plans to pay approximately $1 billion annually for the Gemini technology, which is expected to significantly improve its AI features while maintaining user data privacy [3][5].
- The collaboration is seen as a strategic move that buys Apple time in the large-model race, while Google benefits from deeper integration into billions of Apple devices [4][5].
Group 2
- Counterpoint Research reported 2% growth in global smartphone shipments in 2025, with Apple regaining the top market share position at 20%, driven by strong sales of the iPhone 17 series [33][34].
- The report attributed the growth primarily to recovering demand in emerging markets and an improved economic environment [33].
Group 3
- The storage market has entered a "super bull market," with prices expected to rise by 50% this year on demand from AI servers, significantly affecting the cost structure of smartphone and server manufacturers [85][86].
- Counterpoint's forecast indicates that storage prices surged by 40%-50% in Q4 of last year and will continue to rise in Q1 and Q2 of this year [86][88].
Group 4
- Bill Gates expressed optimism that AI will drive key innovations over the next decade, particularly in climate, healthcare, and education, while emphasizing the need for governance and regulation [94][95].
- Elon Musk suggested that advances in AI, energy, and robotics will lead to a future of abundant resources in which saving for retirement may become irrelevant [97][98].
DeepSeek's Financial Backing: 2025 Returns of Liang Wenfeng's High-Flyer Quant Revealed
Feng Huang Wang· 2026-01-12 10:23
Group 1
- The quantitative hedge fund run by DeepSeek founder Liang Wenfeng returned more than 50% last year, increasing DeepSeek's potential funding reserves [1].
- According to data from Shenzhen Paipai Network Investment Management Co., funds under High-Flyer Quant (幻方量化) returned an average of 56.6% in 2025; the firm manages over 70 billion RMB (approximately 10 billion USD) in assets [1].
- High-Flyer Quant ranks second among Chinese quantitative funds managing over 10 billion RMB, behind only Ningbo Lingjun Investment Management, which leads with a return of over 70% [1].
Group 2
- High-Flyer Quant's strong performance is expected to provide more funding for DeepSeek, which High-Flyer Quant incubated in 2023 [1].
- Assuming a 1% management fee and a 20% performance fee, the fund's performance may generate over 700 million USD in revenue, far exceeding DeepSeek's reported budget of less than 6 million USD for developing its AI model [2].
- DeepSeek's research funding comes from High-Flyer Quant's R&D budget, as Liang Wenfeng has previously stated [3].
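The fee estimate above can be roughly reproduced with a short calculation. This is a sketch under stated assumptions, none of which the article confirms: the approximately 10 billion USD figure is treated as year-end assets, there are no inflows or outflows during the year, the management fee is charged on start-of-year assets, and the performance fee applies to the whole gain with no hurdle rate or high-water mark. The function name `estimate_fee_revenue` is hypothetical.

```python
def estimate_fee_revenue(year_end_aum, annual_return, mgmt_fee=0.01, perf_fee=0.20):
    """Estimate one year of fee revenue in the same currency as `year_end_aum`.

    Assumptions (not from the article): no flows during the year, management
    fee on start-of-year assets, performance fee on the full gain.
    """
    # Back out the start-of-year assets from the year-end figure.
    start_aum = year_end_aum / (1.0 + annual_return)
    profit = year_end_aum - start_aum
    management = mgmt_fee * start_aum
    performance = perf_fee * profit
    return management + performance


# Figures from the article: about 10 billion USD in assets, 56.6% average return.
revenue = estimate_fee_revenue(year_end_aum=10e9, annual_return=0.566)
print(f"Estimated fee revenue: {revenue / 1e6:.0f} million USD")
```

Under these assumptions the estimate lands between 700 and 800 million USD, consistent with the "over 700 million USD" figure. A different fee base or a hurdle rate would change the result.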
2025 AI Year in Review: After Reading 200 Papers, Which AGI Narrative Are DeepMind, Meta, DeepSeek, and the US and Chinese Giants Describing?
36Kr · 2026-01-12 08:44
Core Insights
- The article reviews the evolution of artificial intelligence (AI) in 2025, highlighting a shift from simply increasing model parameters to improving model intelligence through foundational research in fluid reasoning, long-term memory, spatial intelligence, and meta-learning [2][4].
Group 1: Technological Advancements
- In 2025, significant progress was made in fluid reasoning, long-term memory, spatial intelligence, and meta-learning, driven by the diminishing returns of scaling laws [2][3].
- The bottleneck in current AI technology is that models must not only hold knowledge but also think and remember effectively, which exposes a significant imbalance in AI capabilities [2][4].
- Test-Time Compute transformed reasoning capabilities by letting models spend more computation on deeper, more deliberate processing during inference [6][10].
Group 2: Memory and Learning Enhancements
- The Titans architecture and Nested Learning emerged as breakthroughs in memory, enabling models to update their parameters in real time during inference and overcoming a limitation of traditional Transformer models [19][21].
- Memory falls into three types: context as memory, RAG-processed context as memory, and memory internalized into parameters, with significant advances in RAG and parameter adjustment methods [19][27].
- Sparse memory fine-tuning and on-policy distillation have mitigated catastrophic forgetting, allowing models to retain old knowledge while integrating new information [31][33].
Group 3: Spatial Intelligence and World Models
- Spatial intelligence and world models advanced through video generation models such as Genie 3, which showed improved physical understanding and consistency in generated environments [35][36].
- World Labs, led by Stanford professor Fei-Fei Li, focused on generating 3D environments from multimodal inputs, a more structured approach to AI-generated content [44][46].
- Meta's V-JEPA 2 model emphasized predictive learning, letting models grasp physical rules through prediction rather than mere observation and improving their understanding of causal relationships [50][51].
Group 4: Reinforcement Learning Innovations
- Reinforcement learning (RL) advanced significantly with the rise of verifiable rewards and sparse reward metrics, improving performance in areas such as mathematics and coding [11][12].
- The GRPO algorithm gained popularity because it simplifies the RL process by eliminating the critic model, reducing computational cost while remaining effective [15][16].
- Studies of RL's limitations revealed a ceiling effect: RL can strengthen existing model capabilities, but further breakthroughs will require innovations in foundation models or algorithm architectures [17][18].
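The critic-free RL algorithm mentioned above, commonly known as GRPO (Group Relative Policy Optimization), replaces a learned value estimate with a baseline computed from a group of sampled answers to the same prompt. The sketch below shows only that group-relative advantage step. It is a simplified illustration, not a full training loop: the function name `group_relative_advantages` is hypothetical, and the choice of population standard deviation is an assumption, since implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt.

    The group mean acts as the baseline, so no critic model is needed.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)  # population std; some implementations use sample std
    # eps guards against division by zero when every reward is identical.
    return [(r - baseline) / (spread + eps) for r in rewards]


# Four sampled answers to one prompt, scored by a verifiable reward (1 = correct).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Answers that beat the group average receive a positive advantage and the rest a negative one. In a full algorithm these values would weight the policy-gradient update for each sampled answer.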
DeepSeek's Next-Generation AI Model V4 Expected to Launch; Low-Fee Cloud Computing ETF Huaxia (516630) Rises Over 6% as Fund Size Hits New High
Xin Lang Cai Jing· 2026-01-12 06:31
Group 1
- The AI sector is showing strong momentum: the low-fee cloud computing ETF Huaxia (516630) was up 6.47% as of 13:35, its third consecutive daily gain [1].
- Key holdings of the ETF, including Tuolisi, Hand Information, and Yidian Tianxia, hit the daily limit-up, while other companies such as Wanxing Technology and Zhongke Xingtou also gained [1].
- Over the week ending January 9, 2026, the Huaxia cloud computing ETF rose a cumulative 9.33% [1].
Group 2
- China Merchants Securities advises investors to focus on sectors with clear industrial trends such as AI, policy-driven sectors such as Xinchuang (IT application innovation), and financial technology, which benefits from a bullish market [2].
- The Huaxia cloud computing ETF (516630) tracks the cloud computing index (930851) and has the lowest fee rate, with a significant focus on domestic AI hardware and software computing power [2].
- Computer software, cloud services, and computer equipment together account for 83.7% of the index weight, and exposure to DeepSeek-related and AI application stocks each exceeds 40% [2].
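DeepSeek V4 Reportedly Set for Release Around the Spring Festival! Sci-Tech Innovation AI ETF Huaxia (589010) Jumps 4.33% on Heavy Volume as Holdings Hit Limit-Up in Waves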
Mei Ri Jing Ji Xin Wen· 2026-01-12 06:00
Group 1
- The Sci-Tech Innovation Artificial Intelligence ETF (589010) surged 4.33%, indicating robust market sentiment toward AI investments [1][2].
- The ETF's trading volume exceeded 252 million yuan, with a turnover rate of over 8%, reflecting active trading and recognition of the AI sector's long-term value [1].
- Constituent stocks such as New Point Software and Haitian Ruisheng gained sharply: New Point Software hit the 20% limit-up, and several others rose over 15% [1].
Group 2
- Kaiyuan Securities noted that AI innovation continues to advance, with model capabilities improving and costs falling, particularly with the rise of Chinese open-source models such as DeepSeek and Qwen [2].
- Multimodal large models are achieving rapid breakthroughs, which is expected to further drive application growth in the AI industry [2].
- The Sci-Tech Innovation Artificial Intelligence ETF closely tracks the Shanghai Stock Exchange Sci-Tech Innovation Board AI Index, which covers high-quality companies across the entire industry chain and benefits from high R&D investment and policy support [2].
ETF Intraday News | What Breakthroughs Does DeepSeek V4 Bring? Sci-Tech Innovation AI ETF Huabao (589520) Gaps Up and Surges 5%! AI Applications Rally Hard!
Jin Rong Jie· 2026-01-12 03:50
Core Viewpoint
- The AI application sector continues to show strong momentum, with significant gains across the domestic AI industry chain, highlighted by the Huabao AI ETF (589520), which surged 4.55% [1][4].
Group 1: Market Performance
- The Huabao AI ETF jumped 5% in intraday trading, its third consecutive day of gains [1].
- Key AI stocks rose substantially, with Zhongke Xingtou up 16.07% and Hehe Information up 15.32%; others such as Haitian Ruisheng and Xinghuan Technology also performed strongly [2][7].
Group 2: Industry Dynamics
- Activity in the AI industry is surging, with significant financing rounds for overseas companies such as xAI and Anthropic and new domestic policies promoting "AI + manufacturing" [2][3].
- The upcoming release of DeepSeek's next-generation V4 model is expected to improve programming capabilities and data pattern understanding, potentially reshaping the global AI competitive landscape [3].
Group 3: Future Outlook
- Analysts predict that 2026 will be a "golden year" for AI applications, driven by technological maturity, supportive policies, and market demand [4].
- The domestic large-model industry is moving from technological catch-up to systematic layout and ecosystem construction, and is expected to take the lead in certain areas by 2026 [4].
Group 4: Investment Opportunities
- The Huabao AI ETF allocates evenly across application software, terminal applications, terminal chips, and cloud chips, reflecting a shift from reliance on foreign technology to self-sufficiency [4][5].
- The ETF is positioned as an efficient tool for investing in domestic computing power, with a high concentration in semiconductor stocks [5].
What Breakthroughs Does DeepSeek V4 Bring? Sci-Tech Innovation AI ETF Huabao (589520) Gaps Up and Surges 5%! AI Applications Rally Hard!
Xin Lang Cai Jing· 2026-01-12 03:14
Core Viewpoint
- The AI application sector continues to show strong momentum, with the Huabao Sci-Tech Innovation Artificial Intelligence ETF (589520) gaining sharply, indicating robust interest in the domestic AI industry chain [1][9].
Group 1: Stock Performance
- Leading stocks include Zhongke Xingtou, which surged over 16%, Hehe Information, up over 15%, and Haitian Ruisheng, up nearly 15% [3][11].
- Among heavily weighted stocks, Kingsoft Office rose over 7% and Cambricon rose more than 2% [3][11].
Group 2: Market Dynamics
- The AI industry is seeing intense activity, with overseas companies such as xAI and Anthropic securing funding and domestic policies promoting "AI + manufacturing" initiatives [4][12].
- The upcoming launch of DeepSeek's V4 model is expected to trigger a new wave of enthusiasm for AI applications, with significant improvements in programming capabilities and data pattern understanding [5][13].
Group 3: Future Outlook
- Analysts predict that 2026 will be a "golden year" for AI applications, driven by technological maturity, supportive policies, and market demand [6][14].
- The domestic large-model industry is moving from a technology catch-up phase to a phase of systematic layout and ecosystem construction, and is expected to lead in certain areas by 2026 [6][14].
Group 4: ETF Characteristics
- The Huabao Sci-Tech Innovation Artificial Intelligence ETF (589520) focuses on the domestic AI industry chain, with a high concentration in semiconductor stocks that gives it an aggressive, growth-oriented profile [7][15].
- The ETF is designed to capture domestic computing power efficiently and is eligible for margin financing and securities lending [7][15].
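AI Applications Take Off! Software 50 ETF (159590) Surges Over 5% on Heavy Volume, With 20 Million Yuan in Real-Time Net Subscriptions in Early Trading! GEO Catches the Tailwind, DeepSeek V4 Release May Be Near, Doubao to Appear at the Spring Festival Gala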
Sou Hu Cai Jing· 2026-01-12 02:19
Group 1
- The software sector surged, with the Software 50 ETF (159590) rising over 5% on heavy trading volume, indicating strong market interest [1].
- Major holdings of the Software 50 ETF performed well, including Zhongke Xingtou, up over 14%, Tuorisi, up nearly 12%, and iFlytek, up over 6% [1].
- Net subscriptions to the Software 50 ETF exceeded 22 million yuan shortly after the market opened [1].
Group 2
- The Software 50 ETF tracks the CSI Software Index, which has 50 constituent stocks, with approximately 67% of its weight in application software and over 15% in AI-related fields [6].
- The index aims to provide comprehensive exposure to the entire AI software industry chain [6].
Group 3
- The upcoming release of DeepSeek V4 is expected to improve programming capabilities and understanding of data patterns, potentially reshaping the global AI competitive landscape [3].
- Volcano Engine has become the exclusive AI cloud partner of the Spring Festival Gala, highlighting the growing importance of multimodal capabilities in AI applications [3].
Group 4
- Companies such as Zhipu and MiniMax recently listed on the Hong Kong stock market, and the added capital is expected to accelerate their growth [4].
- The shift from traditional SEO to GEO (Generative Engine Optimization) is seen as a long-term growth opportunity in AI marketing because it improves content visibility and brand authority [5].
January 12 Breakfast Briefing | DeepSeek May Release New Model; China Files Applications for 200,000 Additional Satellites
Xuan Gu Bao· 2026-01-12 00:14
Group 1: Market Overview
- The S&P 500 index rose 0.65% to a new high, while the Nasdaq 100 gained 1% [1].
- Non-farm payroll data was mixed, reinforcing market expectations that the Federal Reserve will hold interest rates steady in January [3].
- The dollar has appreciated for four consecutive days, reaching a one-month high [4].
Group 2: Technology and AI Developments
- Walmart and Alphabet have partnered to launch AI-supported shopping features on Gemini, a significant step in integrating AI with e-commerce [24].
- DeepSeek plans to release its next-generation AI model, DeepSeek V4, in February; it is expected to outperform existing models in programming capabilities [26].
- SanDisk has proposed a "100% cash prepayment" model for SSD supply agreements, indicating a severe supply-demand imbalance in the storage market [25].
Group 3: Industry-Specific News
- The Chinese government has launched a three-year action plan to support the transformation and upgrading of advanced manufacturing, focusing on AI and innovative products [11].
- The domestic civil drone market is projected to grow significantly, with an estimated market size of approximately 146.8 billion yuan by 2024 [27].
- The China Securities Regulatory Commission has raised the maximum whistleblower reward for reporting securities and futures violations to 1 million yuan [10].
Group 4: Company Announcements
- Defu Technology plans to acquire at least 51% of Huiru Technology, which specializes in high-performance electrolytic copper foil [29].
- YTO Express intends to acquire 100% of Wanjia Gaoke for 305 million yuan [29].
- AirNexis will pay 108 million USD upfront and up to 955 million USD in milestone payments under a licensing agreement with Haise Science [29].