DeepSeek MODEL1
Is DeepSeek's new model really coming? "MODEL1" revealed
Di Yi Cai Jing Zi Xun· 2026-01-21 07:00
Core Insights
- The article reports that a new model named "MODEL1" from DeepSeek has surfaced, coinciding with the one-year anniversary of the release of DeepSeek-R1 [1][4].

Group 1: Model Development
- "MODEL1" is referenced in the updated FlashMLA code on GitHub, which suggests it is a new model distinct from the existing "V32" architecture [1][2].
- Industry opinion is divided on whether "MODEL1" is a V4 model or an advanced version of the V3 series [2][3].
- The new model is reportedly close to completion, pending final weight freezing and test validation, which points to a launch in the near term [3].

Group 2: Technical Innovations
- FlashMLA is a software tool developed by DeepSeek and optimized for NVIDIA Hopper architecture GPUs. It is central to the low cost and high performance of DeepSeek's model implementations [3].
- Key technical differences between "MODEL1" and "V32" include the key-value (KV) cache layout, sparse processing methods, and support for decoding with the FP8 data format, which suggests a design targeted at memory optimization and computational efficiency [3].

Group 3: Market Impact and Expectations
- Anticipation for DeepSeek's next flagship model is high, with expectations that it will integrate recent research findings, including a new training method and an AI memory module [4].
- The release of DeepSeek-R1 has significantly influenced the open-source community, with increased contributions from major Chinese companies and a shift in global reliance toward Chinese-developed open-source models [5][7].
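To make the memory argument behind the KV cache and FP8 points concrete, here is a minimal sizing sketch. It only shows why a one-byte format halves cache size relative to a two-byte format. The layer count, sequence length, and per-token width are hypothetical placeholders, not MODEL1 or V32 values, and `kv_cache_bytes` is an illustrative helper, not part of FlashMLA.

```python
# Illustrative KV cache sizing. All shapes below are hypothetical
# placeholders chosen for the arithmetic, not MODEL1 or V32 values.

BYTES_PER_ELEM = {"bf16": 2, "fp8": 1}


def kv_cache_bytes(num_layers: int, seq_len: int,
                   elems_per_token: int, dtype: str) -> int:
    """Bytes needed to cache one sequence across all layers."""
    return num_layers * seq_len * elems_per_token * BYTES_PER_ELEM[dtype]


if __name__ == "__main__":
    layers, tokens, elems = 60, 128 * 1024, 576  # assumed shapes
    for dtype in ("bf16", "fp8"):
        gib = kv_cache_bytes(layers, tokens, elems, dtype) / 2**30
        print(f"{dtype}: {gib:.2f} GiB per sequence")
```

In practice a quantized cache usually also stores scaling factors, so the real saving is likely somewhat less than a clean factor of two.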
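DeepSeek's new model revealed! The AI theme returns in force! Montage Technology (澜起科技) leads with a gain of more than 10%, and the Huitianfu STAR AI ETF (科创人工智能ETF汇添富, 589560) rises more than 3%. With AI applications the main theme of the new year, what catalysts come next?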
Sou Hu Cai Jing· 2026-01-21 06:54
Group 1: AI Model Developments
- DeepSeek's new model "MODEL1" is reported to be an efficient inference model with lower memory usage, suitable for edge devices and cost-sensitive scenarios, and optimized for long-sequence tasks [1].
- DeepSeek is expected to launch its next-generation flagship AI model, DeepSeek V4, during the Lunar New Year in mid-February. It is anticipated to significantly enhance coding capabilities and potentially surpass leading products like GPT and Claude [1][2].

Group 2: AI Industry Growth in China
- China's intelligent computing power has reached 1,590 EFLOPS, and high-quality industry data sets are emerging rapidly, positioning domestic large models to lead the global open-source ecosystem [3].
- By 2025, the number of AI companies in China is projected to exceed 6,000, with the core industry scale expected to surpass 1.2 trillion RMB [3].

Group 3: AI Sector Investment Insights
- The AI industry can be divided into three main segments: the foundational layer (hardware computing power), the technical layer (large models and algorithm frameworks), and the application layer (vertical solutions across industries) [4].
- The technical layer is expected to see significant demand and policy support, particularly in semiconductor fields, driven by capital expenditure from the foundational layer and domestic substitution strategies [5].

Group 4: Market Performance and Trends
- The AI application sector has become a primary investment focus at the start of 2026, with a year-to-date increase of 19%, leading the A-share market [7].
- CES 2026 offered insights into the future direction of AI applications, with hardware increasingly entering daily life in forms such as smart glasses and wearable devices [7].
DeepSeek's new model revealed
Cailian Press (财联社) · 2026-01-21 06:34
Core Viewpoint
- DeepSeek is advancing its AI model capabilities with MODEL1, which is designed for efficient inference and optimized for various GPU architectures, reflecting a focus on higher performance and lower memory usage in AI applications [4][5][6].

Group 1: MODEL1 and FlashMLA
- MODEL1 is a newly revealed model architecture in DeepSeek's FlashMLA, a software tool optimized for NVIDIA Hopper architecture GPUs and aimed at accelerating large-model inference [4].
- FlashMLA builds on the multi-head latent attention (MLA) mechanism to minimize memory usage and maximize GPU hardware efficiency, which is crucial to the performance of DeepSeek's models [4][5].
- MODEL1 is expected to be a low-memory model suitable for edge devices and cost-sensitive scenarios, with optimizations for long-sequence tasks such as document understanding and code analysis [5].

Group 2: DeepSeek's Model Development
- DeepSeek's existing models follow two technical routes: the V series, which focuses on comprehensive performance, and the R series, which targets complex reasoning tasks [6].
- The V3 model, launched in December 2024, marked a significant milestone with its efficient MoE architecture. Rapid iterations followed with V3.1 and V3.2, which enhance reasoning and agent capabilities [6].
- The R1 model, released in January 2025, excels at complex reasoning tasks through reinforcement learning and introduces a "deep thinking" mode [7].

Group 3: Future Developments
- DeepSeek is expected to launch its next flagship AI model, DeepSeek V4, around mid-February 2026, with enhanced coding capabilities anticipated [7].
- Recent technical papers from DeepSeek discuss new training methods and a biologically inspired AI memory module, and these innovations may be integrated into upcoming models [7].
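The memory benefit attributed to MLA can be sketched with simple arithmetic. In DeepSeek's published descriptions of MLA, the cache holds one compressed latent vector per token plus a small positional component, instead of a full key and value for every attention head. The helper names and all dimensions below are illustrative assumptions, not values taken from the FlashMLA code or from MODEL1.

```python
# Per-token, per-layer cache width under two caching schemes.
# All dimensions are assumed for illustration only.


def mha_elems_per_token(num_heads: int, head_dim: int) -> int:
    """Standard multi-head attention caches a key and a value per head."""
    return 2 * num_heads * head_dim


def latent_elems_per_token(latent_dim: int, rope_dim: int) -> int:
    """Latent-style caching stores one compressed vector plus a small
    positional (RoPE) component shared across heads."""
    return latent_dim + rope_dim


if __name__ == "__main__":
    full = mha_elems_per_token(num_heads=128, head_dim=128)      # assumed
    latent = latent_elems_per_token(latent_dim=512, rope_dim=64)  # assumed
    print(f"full cache: {full} elements per token per layer")
    print(f"latent cache: {latent} elements per token per layer")
    print(f"reduction factor: {full / latent:.1f}x")
```

The reduction factor depends entirely on the assumed dimensions. The point of the sketch is that the latent cache width does not grow with the number of heads.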
Uproar! DeepSeek MODEL1 sparks speculation across the internet: R2 or V4?
程序员的那些事· 2026-01-21 04:21
Core Viewpoint
- The sudden emergence of a new model named "MODEL1" from DeepSeek has generated significant excitement in the domestic large model sector, pointing to potential advances in AI technology and intensified market competition [1][3].

Group 1: MODEL1 Features
- MODEL1 reportedly includes an optimized KV cache layout and support for FP8 sparse decoding, which are expected to greatly improve inference efficiency and reduce memory usage [2].
- The model integrates a long-context optimization mechanism, addressing the common difficulty large models have in retaining long texts [2].

Group 2: Speculations and Expectations
- The community speculates that MODEL1 could be either the long-awaited R2 model, which has faced delays due to chip shortages, or the upcoming V4 model, following the naming convention after V3.2 [3].
- DeepSeek has not officially responded to these speculations, but there are indications that the new model may be released around the Chinese New Year [3].
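As a rough illustration of what "sparse decoding" can mean, the sketch below performs one decode step that attends only to the highest-scoring cached positions rather than the whole cache. This is generic top-k sparse attention in plain Python, offered as an assumption-laden example. The source does not describe how MODEL1 selects positions, and `sparse_decode_step` is a hypothetical function, not a FlashMLA API.

```python
import math
from typing import List, Sequence


def sparse_decode_step(query: Sequence[float],
                       keys: Sequence[Sequence[float]],
                       values: Sequence[Sequence[float]],
                       top_k: int) -> List[float]:
    """Attend only to the top_k highest-scoring cached positions."""
    if not keys or len(keys) != len(values) or top_k < 1:
        raise ValueError("need matching non-empty keys/values and top_k >= 1")

    # Scaled dot-product score of the query against every cached key.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

    # Keep the indices of the top_k scores; the rest are skipped entirely.
    keep = sorted(range(len(scores)), key=lambda i: scores[i],
                  reverse=True)[:min(top_k, len(scores))]

    # Numerically stable softmax over the kept positions only.
    peak = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - peak) for i in keep]
    total = sum(weights)

    # Weighted sum of the kept value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, keep):
        for d, v in enumerate(values[i]):
            out[d] += (w / total) * v
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(sparse_decode_step([10.0, 0.0], keys, values, top_k=1))
```

With `top_k` equal to the cache length this reduces to ordinary dense attention, so the sparsity setting trades accuracy against the number of cached entries read per step.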