Flash MLA
A New DeepSeek Model Revealed?
Xinhuanet Finance· 2026-01-22 05:00
Core Insights
- DeepSeek has published a new model identifier, "MODEL1", in the open-source community, coinciding with the one-year anniversary of the DeepSeek-R1 launch [1]
- The company unveiled five code repositories over the course of its "Open Source Week" held in February 2025, with Flash MLA being the first project [3]
- Industry analysts suggest that "MODEL1" may denote a new architecture distinct from the existing "V32" model, potentially pointing to a next-generation model (R2 or V4) that has not yet been publicly released [4]

Group 1
- Flash MLA optimizes memory access and computation on Hopper GPUs, significantly improving the efficiency of variable-length sequence processing (see the usage sketch after this summary) [3]
- Its core design combines a dynamic memory allocation mechanism with a parallel decoding strategy, which cuts redundant computation and raises throughput, particularly for large language model inference tasks [3]
- DeepSeek has been active since January 2026, releasing two technical papers: one on a new training method called "optimized residual connections" (mHC) and one on a biologically inspired "AI memory module" (Engram) [4]

Group 2
- On January 12, DeepSeek published a paper in collaboration with Peking University introducing a conditional memory mechanism to address the Transformer architecture's inefficiency in knowledge retrieval [5]
- The Engram module proposed by DeepSeek is said to enhance knowledge retrieval and improve performance on reasoning and code/mathematics tasks [5]
- The private equity firm managed by Liang Wenfeng, known for high returns, has provided substantial support for DeepSeek's research and development [5]
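For context on how Flash MLA is consumed, here is a minimal decode-step usage sketch, paraphrased from the public FlashMLA repository README. The two entry points (get_mla_metadata, flash_mla_with_kvcache) come from that README; all shapes and values below are illustrative assumptions, not prescriptive settings:

```python
import torch
from flash_mla import get_mla_metadata, flash_mla_with_kvcache  # needs a Hopper-GPU build

# Illustrative decode-step setup (assumed values): MLA caches a single compressed
# KV head of width 576 (512 latent + 64 RoPE) per token, with a value width of 512.
b, s_q, h_q, h_kv, d, dv = 4, 1, 128, 1, 576, 512
cache_seqlens = torch.tensor([511, 1023, 255, 767], dtype=torch.int32, device="cuda")
block_size, max_blocks = 64, 16
block_table = torch.arange(b * max_blocks, dtype=torch.int32,
                           device="cuda").view(b, max_blocks)   # paged-KV page table
q = torch.randn(b, s_q, h_q, d, dtype=torch.bfloat16, device="cuda")
blocked_kv = torch.randn(b * max_blocks, block_size, h_kv, d,
                         dtype=torch.bfloat16, device="cuda")   # paged KV cache

# Scheduling metadata is computed once per batch from the variable cache lengths,
# then reused by every layer's decode step.
tile_scheduler_metadata, num_splits = get_mla_metadata(cache_seqlens, s_q * h_q // h_kv, h_kv)

# One decode step of paged MLA attention over variable-length sequences.
o, lse = flash_mla_with_kvcache(
    q, blocked_kv, block_table, cache_seqlens, dv,
    tile_scheduler_metadata, num_splits, causal=True,
)
```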
A New DeepSeek Model Revealed? "MODEL1" Appears in the Open-Source Community
Shang Hai Zheng Quan Bao· 2026-01-21 21:31
Core Insights
- DeepSeek has updated its FlashMLA code on GitHub, revealing a previously undisclosed "MODEL1" identifier, which may indicate a new model distinct from the existing "V32" [3][4]
- The company ran an "Open Source Week" in February 2025, gradually releasing five codebases, with Flash MLA as the first project [4]
- Flash MLA optimizes memory access and computation on Hopper GPUs, significantly improving the efficiency of variable-length sequence processing, particularly for large language model inference tasks [4]

Company Developments
- DeepSeek's upcoming AI model, DeepSeek V4, is expected to be released around the Lunar New Year in February 2026, although the timeline may shift [4]
- The V4 model iterates on the V3 model released in December 2024 and is reported to offer programming capabilities that surpass current leading models such as Anthropic's Claude and OpenAI's GPT series [5]
- Since January 2026, DeepSeek has published two technical papers, introducing a new training method called "optimized residual connections" (mHC) and a biologically inspired "AI memory module" (Engram) [5]

Industry Context
- The Engram module aims to improve knowledge retrieval and general reasoning, addressing inefficiencies in the Transformer architecture (a toy illustration follows this summary) [5]
- Support from Liang Wenfeng's private equity firm, which reported a 56.55% average return in 2025, has bolstered DeepSeek's research and development efforts [5]
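The Engram design itself is described here only at the level of the papers. Purely as a toy illustration of what a gated, conditional key-value memory attached to a Transformer block can look like, here is a generic PyTorch sketch; it is not DeepSeek's actual Engram architecture, and every name and dimension in it is hypothetical:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyConditionalMemory(nn.Module):
    """Generic gated key-value memory lookup (illustrative only, not Engram)."""
    def __init__(self, d_model: int, n_slots: int = 1024):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)    # memory keys
        self.values = nn.Parameter(torch.randn(n_slots, d_model) * 0.02)  # stored knowledge
        self.gate = nn.Linear(d_model, 1)  # decides, per token, how much memory to mix in

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, d_model) hidden states from a Transformer layer
        attn = F.softmax(h @ self.keys.t(), dim=-1)   # similarity over memory slots
        retrieved = attn @ self.values                # weighted sum of stored values
        g = torch.sigmoid(self.gate(h))               # conditional gate in [0, 1]
        return h + g * retrieved                      # residual mix of retrieved knowledge

x = torch.randn(2, 16, 256)
print(ToyConditionalMemory(256)(x).shape)  # torch.Size([2, 16, 256])
```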
DeepSeek's Open Source Week Concludes, Set to Accelerate Adoption of Large Models Across Industries
Ping An Securities· 2025-03-03 09:15
Investment Rating
- The industry investment rating is "Stronger than the Market" (the industry index is expected to outperform the market by more than 5% over the next six months) [32]

Core Views
- The DeepSeek Open Source Week has concluded, accelerating the application of large models across industries [2][3]
- Competition among global large models remains intense, providing strong support for continued growth in AI computing power [8][11]
- NVIDIA's FY25Q4 results are strong, with robust inference-side demand for the Blackwell architecture [13][15]

Summary by Sections

Industry News and Commentary
- The DeepSeek Open Source Week released five open-source software library projects covering computation, communication, and storage, which will make it easier for developers worldwide to replicate DeepSeek-V3/R1 [2][5]
- The release of models such as Grok-3, Claude 3.7 Sonnet, and GPT-4.5 signals continued fierce competition in the global large-model market, which is expected to raise the capability ceiling of these models [9][11]
- NVIDIA reported FY25Q4 revenue of $39.3 billion, up 12% quarter over quarter and 78% year over year, driven primarily by data center growth [13][14]

Investment Recommendations
- The report takes a positive view of the computer industry, anticipating improvements in both performance and valuation as demand recovery accelerates [28]
- Recommended stocks include:
  - IT innovation (Xinchuang) sector: Haiguang Information, Longxin Zhongke, Zhongke Shuguang, Kingsoft Office, Dameng Data, Foxit Software, Taiji Co., Ltd.
  - Huawei supply chain: Digital China, with attention to Tuo Wei Information, Kirin Information Security, Runhe Software, and others
  - AI sector: strong recommendations for Zhongke Chuangda, Shengshi Technology, and Qiming Star, among others
  - Low-altitude economy: Da Tong Technology and others
  - Financial IT sector: strong recommendation for Hengsheng Electronics, plus Tonghuashun and others [28]
Full-Stack Adaptation! JD Cloud Boosts DeepSeek Inference Performance by 50%
Zhong Guo Jing Ji Wang· 2025-03-03 09:10
Core Insights
- DeepSeek's five core technologies (FlashMLA, DeepEP, DeepGEMM, DualPipe & EPLB, and the 3FS file system) were showcased during the five-day "Open Source Week," drawing significant global attention [1]
- JD Cloud announced full-stack adaptation of these technologies, reporting a 50% performance improvement in inference scenarios [1][2]

Group 1: Technology Enhancements
- Flash MLA optimizes GPU memory and computational resources, addressing the resource waste of traditional methods when processing variable-length sequences [1]
- The vGPU AI computing platform supports Flash MLA's FP8 format, cutting per-token KV Cache memory usage to roughly 1/57 of Multi-Head Attention's, sustaining high throughput and low latency under high concurrency (see the arithmetic sketch after this summary) [1]

Group 2: Communication and Performance
- JD Cloud's vGPU AI computing platform fully supports distributed inference using the DeepEP communication library, significantly raising inference throughput [2]
- By integrating DeepEP, JD Cloud uses NVLink for intra-machine communication and NVSHMEM for inter-machine communication, improving GPU utilization and reducing performance bottlenecks [2]

Group 3: Local Deployment and Adaptation
- JD Cloud has helped multiple local governments deploy DeepSeek on existing infrastructure, allowing local enterprises to access the service without additional resource investment [3]
- The platform has achieved comprehensive domestic chip adaptation, covering more than ten domestic AI computing solutions and ensuring end-to-end autonomy from foundational computing to large-model applications [2]
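To make the ~1/57 KV-cache figure concrete, here is a back-of-the-envelope comparison of per-token, per-layer cache footprints. The dimensions are taken from the published DeepSeek-V3 configuration but should be read as illustrative assumptions:

```python
# Per-token, per-layer KV-cache size in elements (the ratio is dtype-independent).
n_heads = 128        # attention heads (DeepSeek-V3 config, assumed here)
head_dim = 128       # per-head dimension
kv_lora_rank = 512   # MLA's compressed KV latent width
rope_dim = 64        # decoupled RoPE key cached alongside the latent

mha_per_token = 2 * n_heads * head_dim   # full K and V: 32768 elements
mla_per_token = kv_lora_rank + rope_dim  # compressed latent + RoPE key: 576 elements

print(mha_per_token / mla_per_token)     # ≈ 56.9, i.e. the ~57x reduction cited above
```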