Seek . - filings, earnings calls, financial reports, news

Seek .(SKLTY)


Shang Hai Zheng Quan Bao· 2026-01-21 21:31

Core Insights
- DeepSeek has updated its FlashMLA code on GitHub, revealing a previously undisclosed "MODEL1" identifier, which may indicate a new model distinct from the existing "V32" [3][4]
- The company held an "open source week" in February 2025, releasing five codebases in stages, with FlashMLA as the first project [4]
- FlashMLA optimizes memory access and computation on Hopper GPUs, significantly improving the efficiency of variable-length sequence processing, particularly for large language model inference [4]

Company Developments
- DeepSeek's upcoming AI model, DeepSeek V4, is expected to be released around the Lunar New Year in February 2026, although the timeline may change [4]
- V4 is an iteration of the V3 model released in December 2024 and reportedly has programming capabilities that surpass current leading models such as Anthropic's Claude and OpenAI's GPT series [5]
- Since January 2026, DeepSeek has published two technical papers, one introducing a new training method called "optimized residual connections (mHC)" and the other a biologically inspired "AI memory module (Engram)" [5]

Industry Context
- The Engram module aims to improve knowledge retrieval and general reasoning, addressing inefficiencies in the Transformer architecture [5]
- Support from Liang Wenfeng's private equity firm, which achieved a 56.55% average return in 2025, has bolstered DeepSeek's research and development efforts [5]

DeepSeek's new model "MODEL1" revealed

Di Yi Cai Jing Zi Xun· 2026-01-21 09:05

Core Insights
- A new DeepSeek model named "MODEL1" has surfaced, coinciding with the one-year anniversary of the DeepSeek-R1 release and pointing to possible advances in model architecture [2][6]

Group 1: Model Development
- "MODEL1" is referenced in the updated FlashMLA code on GitHub, suggesting it may be a new model distinct from the existing "V32" architecture [2][3]
- Industry opinion is divided on whether "MODEL1" is a version 4 model or an advanced inference model, with some developers speculating it could be the final version of the V3 series [2][5]
- Key technical differences between "MODEL1" and "V32" include the key-value (KV) cache layout, sparsity handling, and support for decoding in the FP8 data format, indicating a design targeted at memory optimization and computational efficiency [5]

Group 2: Anticipated Release and Features
- The structure of the model files suggests that "MODEL1" is nearing completion or inference deployment and is awaiting final weight freezing and test validation, which implies a launch is approaching [5]
- DeepSeek is expected to release its next flagship model, DeepSeek V4, in February, with preliminary tests indicating it may surpass other top models in programming capabilities [6]
- Recent technical papers from DeepSeek introduce a new training method and an AI memory module, hinting that these innovations may be integrated into the upcoming model [6]

Group 3: Industry Impact
- DeepSeek-R1 has become the most-liked model on Hugging Face, significantly lowering barriers in inference technology and production deployment and influencing the open-source strategy of major Chinese companies [9]
- Over the past year, downloads of Chinese AI models on Hugging Face have risen and surpassed those of U.S. models, indicating a shift toward reliance on Chinese-developed open-source models within the global supply chain [9]