One Year After R1: DeepSeek's Model1 Quietly Surfaces
机器之心 · 2026-01-21 00:32
Core Insights
- DeepSeek officially released the DeepSeek-R1 model on January 20, 2025, opening a new era for open-source LLMs; DeepSeek-R1 remains the most-liked model on the Hugging Face platform [2]
- A new model named Model1 has surfaced in DeepSeek's FlashMLA code repository, drawing significant attention from the online community [5]
- Analysis suggests that Model1 is most likely the internal code name, or first engineering build, of DeepSeek's next flagship model, DeepSeek-V4 [9]

Technical Details
- Model1's core architecture returns to a standard 512-dimensional configuration, likely an optimization to align with NVIDIA's next-generation Blackwell (SM100) architecture [9] (see the first sketch below)
- Relative to the V3 series, Model1 introduces token-level sparse MLA as a major operator-level evolution, alongside new mechanisms such as Value Vector Position Awareness (VVPA) and Engram [11][12] (second sketch below)
- Benchmarks show the as-yet-unoptimized sparse MLA operator reaching 350 TFLOPS on the B200, while the dense MLA operator reaches 660 TFLOPS on the H800 (SM90a) [10]

Architectural Changes
- The shift from the previous DeepSeek-V3.2, which used an asymmetric MLA design, to a standardized 512-dimensional configuration in Model1 suggests a strategic change in DeepSeek's architectural approach [9]
- The codebase includes optimizations targeting the Blackwell GPU architecture specifically, indicating a focus on computational efficiency [9]
- The sparse operators adopt FP8 KV-cache mixed precision to reduce memory pressure and improve speed in long-context scenarios [12] (third sketch below)
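To make the "512-dimensional standard" concrete, here is a minimal illustrative sketch contrasting the asymmetric head dimensions of V3-series MLA decode kernels (a 512-dim shared KV latent plus a 64-dim RoPE slice on the key side, i.e. 576/512) with a symmetric 512/512 layout of the kind the article attributes to Model1. The class name and the Model1 numbers are assumptions for illustration, not confirmed kernel parameters.

```python
from dataclasses import dataclass

@dataclass
class MLAShape:
    """Head dimensions as seen by an MLA decode kernel (absorbed form)."""
    head_dim_k: int  # compressed KV latent (+ RoPE slice on the key side)
    head_dim_v: int  # value dimension read back out of the latent

# Asymmetric layout of V3-series kernels: keys carry an extra 64-dim
# RoPE slice on top of the 512-dim shared latent.
v3_style = MLAShape(head_dim_k=512 + 64, head_dim_v=512)

# A "standard 512" layout, as the article describes for Model1
# (hypothetical numbers): symmetric 512/512 tiles.
model1_style = MLAShape(head_dim_k=512, head_dim_v=512)

print(v3_style, model1_style, sep="\n")
```

A symmetric tile maps more cleanly onto fixed-size tensor-core fragments, which is consistent with the Blackwell-alignment reading above.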
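The token-level sparse MLA idea can be sketched in a few lines of PyTorch: each query attends only to a pre-selected subset of cached tokens rather than the full KV cache, cutting both memory traffic and FLOPs. This is a toy reference under stated assumptions (the name sparse_mla_attention and the gather-then-softmax structure are hypothetical, and the RoPE slice is omitted), not DeepSeek's fused kernel.

```python
import torch
import torch.nn.functional as F

def sparse_mla_attention(q, kv_cache, topk_indices, scale):
    """Toy token-level sparse MLA attention.

    q:            [num_q, d]    query vectors in the compressed latent space
    kv_cache:     [seq_len, d]  compressed KV latents (MLA shares K and V)
    topk_indices: [num_q, k]    indices of the tokens each query attends to
    """
    # Gather only the selected latents for each query: [num_q, k, d]
    selected = kv_cache[topk_indices]
    # Attention scores restricted to the selected tokens: [num_q, k]
    scores = torch.einsum("qd,qkd->qk", q, selected) * scale
    probs = F.softmax(scores, dim=-1)
    # Weighted sum over the sparse token set: [num_q, d]
    return torch.einsum("qk,qkd->qd", probs, selected)

# Toy usage: 4 queries, 1024 cached tokens, keep 64 tokens per query.
d, seq_len, num_q, k = 512, 1024, 4, 64
q = torch.randn(num_q, d)
kv = torch.randn(seq_len, d)
idx = torch.randint(0, seq_len, (num_q, k))
out = sparse_mla_attention(q, kv, idx, scale=d ** -0.5)
print(out.shape)  # torch.Size([4, 512])
```

In a real kernel the index selection (e.g. by a lightweight indexer) and the gather are fused with the attention computation on-GPU; the sketch only shows the dataflow.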
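Finally, a minimal sketch of FP8 KV-cache mixed precision, assuming per-token scaling: latents are stored as float8_e4m3fn alongside a scale factor and dequantized to BF16 before use, halving cache memory relative to BF16. The function names and the per-token granularity are assumptions; a production kernel would keep the data in FP8 through the attention matmul rather than dequantizing first.

```python
import torch

FP8_MAX = 448.0  # largest representable value of float8_e4m3fn

def quantize_kv_fp8(kv):
    """Quantize KV-cache latents to FP8 with a per-token scale."""
    amax = kv.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6)
    scale = amax / FP8_MAX
    kv_fp8 = (kv / scale).to(torch.float8_e4m3fn)
    return kv_fp8, scale

def dequantize_kv_fp8(kv_fp8, scale):
    """Recover approximate BF16 latents before the attention matmul."""
    return kv_fp8.to(torch.bfloat16) * scale.to(torch.bfloat16)

kv = torch.randn(1024, 512)
kv_fp8, scale = quantize_kv_fp8(kv)
kv_approx = dequantize_kv_fp8(kv_fp8, scale)
print((kv - kv_approx.float()).abs().mean())  # small quantization error
```

Halving the cache footprint is what makes very long contexts affordable at inference time, which matches the long-context motivation cited above.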