Core Insights
- DeepSeek has released a new model, "MODEL1", to the open-source community, coinciding with the one-year anniversary of the DeepSeek-R1 launch [1]
- The company plans to gradually unveil five code repositories during the "Open Source Week" starting in February 2025, with Flash MLA being the first project [3]
- Industry analysts suggest that "MODEL1" may represent a new architecture distinct from the existing "V32" model, potentially indicating the next-generation model (R2 or V4) that has not yet been publicly released [4]

Group 1
- Flash MLA optimizes memory access and computation on Hopper GPUs, significantly improving the efficiency of variable-length sequence processing [3]
- The core design of Flash MLA combines a dynamic memory allocation mechanism with a parallel decoding strategy, which reduces redundant computation and increases throughput, particularly for large language model inference tasks (a simplified sketch of this idea follows the summary below) [3]
- DeepSeek has been active since January 2026, releasing two technical papers: one on a new training method called "optimized residual connections" (mHC) and one on a biologically inspired "AI memory module" (Engram) [4]

Group 2
- On January 12, DeepSeek published a new paper in collaboration with Peking University, introducing a conditional memory mechanism to address the inefficiencies of the Transformer architecture in knowledge retrieval [5]
- The Engram module proposed by DeepSeek is said to enhance knowledge retrieval and improve performance on reasoning and code/mathematics tasks [5]
- The private fund managed by Liang Wenfeng, known for high returns, has provided substantial support for DeepSeek's research and development efforts [5]
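As a concrete illustration of the Group 1 claims, the sketch below shows, in plain Python/NumPy, the general idea behind a dynamic, block-based KV-cache allocation scheme serving variable-length sequences during batched decoding. This is a minimal sketch under stated assumptions, not DeepSeek's Flash MLA implementation: the names `PagedKVCache`, `decode_step`, `BLOCK_SIZE`, and `HEAD_DIM`, the block granularity, and the single-head attention math are all illustrative, and the real kernel runs fused on Hopper GPUs rather than in a Python loop.

```python
# Illustrative sketch only: block-based ("paged") KV-cache allocation for
# variable-length sequences, plus a batched decode step over that cache.
# All names and sizes here are assumptions, not DeepSeek's FlashMLA code.
import numpy as np

BLOCK_SIZE = 16   # tokens per KV-cache block (assumed granularity)
HEAD_DIM = 64     # per-head key/value dimension (assumed)


class PagedKVCache:
    """Allocates fixed-size KV blocks on demand instead of one max-length buffer per sequence."""

    def __init__(self, num_blocks: int):
        # One shared pool of blocks used by every sequence in the batch.
        self.k_pool = np.zeros((num_blocks, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
        self.v_pool = np.zeros((num_blocks, BLOCK_SIZE, HEAD_DIM), dtype=np.float32)
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of block indices
        self.seq_lens = {}       # seq_id -> number of cached tokens

    def append(self, seq_id: int, k: np.ndarray, v: np.ndarray) -> None:
        """Store one new token's K/V, grabbing a fresh block only when the last one is full."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:          # current block is full (or this is the first token)
            table.append(self.free_blocks.pop())
        block, offset = table[length // BLOCK_SIZE], length % BLOCK_SIZE
        self.k_pool[block, offset] = k
        self.v_pool[block, offset] = v
        self.seq_lens[seq_id] = length + 1

    def gather(self, seq_id: int):
        """Return the contiguous K/V history for one sequence (for attention)."""
        length = self.seq_lens[seq_id]
        blocks = self.block_tables[seq_id]
        k = self.k_pool[blocks].reshape(-1, HEAD_DIM)[:length]
        v = self.v_pool[blocks].reshape(-1, HEAD_DIM)[:length]
        return k, v


def decode_step(cache: PagedKVCache, queries: dict) -> dict:
    """One decode step: each sequence's new query attends over its own variable-length cache."""
    outputs = {}
    for seq_id, q in queries.items():         # a real kernel would fuse this loop on the GPU
        k, v = cache.gather(seq_id)
        scores = k @ q / np.sqrt(HEAD_DIM)     # (seq_len,)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        outputs[seq_id] = weights @ v           # (HEAD_DIM,) attention output
    return outputs


if __name__ == "__main__":
    cache = PagedKVCache(num_blocks=8)
    rng = np.random.default_rng(0)
    # Two sequences of different lengths share one block pool without padding to a common length.
    for seq_id, n_tokens in [(0, 5), (1, 20)]:
        for _ in range(n_tokens):
            cache.append(seq_id, rng.standard_normal(HEAD_DIM), rng.standard_normal(HEAD_DIM))
    out = decode_step(cache, {0: rng.standard_normal(HEAD_DIM), 1: rng.standard_normal(HEAD_DIM)})
    print({s: o.shape for s, o in out.items()})   # both outputs are (HEAD_DIM,) vectors
```

The design point the sketch tries to convey is that memory is claimed in small blocks as each sequence grows, so short and long sequences in the same batch do not force padding to a common maximum length; how Flash MLA actually schedules and fuses this on Hopper hardware is beyond what the article describes.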
New DeepSeek Model Revealed?
Xinhuanet Finance (新华网财经) · 2026-01-22 05:00