Is DeepSeek's New Model Really Coming? "MODEL1" Revealed

Core Insights
- A new model named "MODEL1" from DeepSeek has surfaced on the one-year anniversary of the DeepSeek-R1 release, hinting that a new flagship model may be close [1][4].

Group 1: Model Development
- "MODEL1" is referenced in the updated FlashMLA code on GitHub, where it appears alongside the existing "V32" architecture as a separate model, suggesting it is a new and distinct one [1][2].
- Industry opinion is divided on whether "MODEL1" is a V4 model or an advanced version of the V3 series [2][3].
- The new model is thought to be close to completion, pending final weight freezing and test validation, which points to a launch in the near term [3].

Group 2: Technical Innovations
- FlashMLA is a software tool developed by DeepSeek, with its code published on GitHub, and optimized for NVIDIA Hopper architecture GPUs. It is central to DeepSeek's low-cost, high-performance model implementations [3].
- "MODEL1" differs from "V32" in its key-value (KV) cache layout, its sparse processing methods, and its support for decoding the FP8 data format, which suggests a design targeted at memory optimization and computational efficiency [3].

Group 3: Market Impact and Expectations
- Anticipation for DeepSeek's next flagship model is high, and it is expected to integrate the company's recent research findings, including a new training method and an AI memory module [4].
- The release of DeepSeek-R1 has significantly influenced the open-source community: major Chinese companies have increased their open-source contributions, and global reliance has shifted towards Chinese-developed open-source models [5][7].