DeepSeek's new model is live! It introduces the new DSA sparse attention and takes another shot at CUDA
量子位· 2025-09-29 10:44
Core Insights
- DeepSeek has launched its latest model, DeepSeek-V3.2-Exp, which introduces a new attention mechanism called DeepSeek Sparse Attention (DSA) [1][6]
- The model aims to enhance long-text processing and inference efficiency without significantly affecting output quality [7]
- A significant price reduction for the official API has been announced, starting at 50% off [3][17]

Model Updates
- DeepSeek-V3.2-Exp is built on the previous version, DeepSeek-V3.1-Terminus, which focused on stability, tool-invocation capabilities, language consistency, and error correction [9]
- In benchmark comparisons, DeepSeek-V3.2-Exp performs comparably to DeepSeek-V3.1-Terminus across various evaluation sets [10]
- The model shows reduced inference costs when handling 128K-token contexts, particularly during the decoding phase [12]

Technical Innovations
- DSA introduces a fine-grained sparse attention mechanism, yielding significant gains in processing efficiency [6][7] (a rough illustrative sketch follows this entry)
- DeepSeek has open-sourced GPU operators in both TileLang and CUDA versions, facilitating research and development [13][15]
- The company recommends the TileLang version for debugging and rapid iteration during experimental research [16]

Community Engagement
- The announcement includes a call for the community to try the new model and take advantage of the promotional pricing [18]
- Links to access the model on platforms such as HuggingFace and ModelScope have been provided [19]
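The reports above describe DSA only at a high level. As a rough illustration of the general idea behind fine-grained sparse attention (each query attending to only a small, individually selected subset of keys), the following minimal PyTorch sketch keeps the top-k highest-scoring keys per query. This is a toy under stated assumptions, not DeepSeek's published DSA design or its TileLang/CUDA kernels; the function name, the top-k scoring rule, and all tensor shapes are invented for demonstration.

```python
# Toy top-k sparse attention (illustrative only; NOT DeepSeek's DSA implementation).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep=64):
    """q, k, v: [batch, heads, seq_len, head_dim].
    Each query attends only to the k_keep keys with the highest scores."""
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale   # [B, H, Lq, Lk]
    k_keep = min(k_keep, scores.size(-1))
    # Keep only the top-k scores per query; everything else is masked out.
    topk_vals, topk_idx = scores.topk(k_keep, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topk_idx, topk_vals)
    attn = F.softmax(masked, dim=-1)                         # sparse attention weights
    return torch.matmul(attn, v)

# Usage: a toy forward pass on random tensors.
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
v = torch.randn(1, 8, 1024, 64)
out = topk_sparse_attention(q, k, v, k_keep=64)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```

Note that this naive version still materializes the full score matrix, so it only illustrates the selection rule; the efficiency gains reported for DSA come from kernels (such as the open-sourced TileLang and CUDA operators) that avoid that full computation.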
Just now: DeepSeek open-sources V3.2-Exp and unveils the new sparse attention mechanism DSA
机器之心· 2025-09-29 10:29
Core Viewpoint
- DeepSeek has released the experimental version DeepSeek-V3.2-Exp, which introduces a new sparse attention mechanism aimed at optimizing training and inference efficiency in long-context scenarios [3][5][10].

Summary by Sections

Model Release
- DeepSeek-V3.2-Exp has been open-sourced with a parameter count of 685 billion [3].
- The release includes a paper detailing the new sparse attention mechanism [5].

Sparse Attention Mechanism
- DeepSeek Sparse Attention (DSA) is the only architectural change in version 3.2, focusing on computational efficiency when processing long text sequences [5][6][10].
- DSA achieves fine-grained sparse attention while maintaining nearly the same output quality as its predecessor, DeepSeek-V3.1-Terminus [9].

Performance Comparison
- Benchmark results show that DeepSeek-V3.2-Exp performs comparably to DeepSeek-V3.1-Terminus across various tasks [11].
- Selected results (V3.1-Terminus vs. V3.2-Exp): MMLU-Pro 85.0 vs. 85.0; AIME 2025 88.4 vs. 89.3; Codeforces 2046 vs. 2121 [11].

Future Developments
- The upcoming release of Z.ai's GLM-4.6 model is noted, with GLM-4.5 being the previous flagship model [12].
DeepSeek-V3.2-Exp officially released; API prices sharply reduced
Core Insights
- DeepSeek has officially launched the DeepSeek-V3.2-Exp model, with updates available on its official app, web platform, and mini-programs [1]
- The new pricing policy reduces the cost for developers using the DeepSeek API by more than 50% [1]
DeepSeek-V3.2-Exp officially released, API prices cut substantially
财联社· 2025-09-29 10:27
Core Insights
- The DeepSeek-V3.2-Exp model has been officially released and open-sourced on the Hugging Face platform, introducing a sparse attention architecture that reduces computational resource consumption and improves inference efficiency [1]
- Huawei Cloud has completed adaptation of the DeepSeek-V3.2-Exp model, supporting a maximum context length of 160K [2]
- Official DeepSeek API prices have been cut significantly: thanks to the new model's lower serving costs, the cost for developers to call the API drops by more than 50% [3] (a hedged call example follows this entry)
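For developers evaluating the lower API prices mentioned above, the sketch below shows one plausible way to call the model through an OpenAI-compatible Python client, which DeepSeek's public documentation describes. The base URL, environment-variable name, and model identifier used here are assumptions to verify against the official docs, not details confirmed by the articles in this digest.

```python
# Hedged example: calling the DeepSeek API via an OpenAI-compatible interface.
# base_url, model name, and env-var name are assumptions; check the official docs.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed environment variable
    base_url="https://api.deepseek.com",     # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                   # assumed model identifier
    messages=[
        {"role": "user",
         "content": "Summarize DeepSeek Sparse Attention in one sentence."},
    ],
)
print(response.choices[0].message.content)
```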
DeepSeek-V3.2-Exp released: training and inference made more efficient, API costs down by more than 50%
Xin Lang Ke Ji· 2025-09-29 10:27
Core Insights
- DeepSeek has released the DeepSeek-V3.2-Exp model, an experimental version marking a step toward its next-generation architecture [1]
- The new model introduces DeepSeek Sparse Attention, focused on optimizing training and inference efficiency for long texts [1]
- The official app, web version, and mini-program have all been updated to DeepSeek-V3.2-Exp, and API costs have been cut by more than 50% [1]
- On public evaluation sets, DeepSeek-V3.2-Exp performs on par with V3.1-Terminus [1]
Is DeepSeek V3.2 on the way?
Guan Cha Zhe Wang· 2025-09-29 09:58
Core Insights
- The appearance of DeepSeek-V3.2 on the Hugging Face platform has sparked speculation among users [1]
- DeepSeek has a history of releasing new versions and updates around major holidays [2]
- The most recent update prior to the speculation was DeepSeek-V3.1-Terminus, released on September 22 with an open-source announcement [3]

Version Release History
- DeepSeek V3 was released on December 27, 2024, just before New Year's [3]
- DeepSeek-R1-0528 was launched on May 28, 2025, as a special gift for the Dragon Boat Festival [3]
- The latest version, DeepSeek-V3.1-Terminus, was made available on September 22, 2025, along with an open-source model [3]

Current Status
- The Hugging Face page related to DeepSeek is currently showing errors, and DeepSeek has not issued an official response [4]
DeepSeek and Zhipu will both release new models soon, possibly marking a major breakthrough
IPO早知道· 2025-09-29 09:45
Core Viewpoint
- The article highlights significant developments at two leading Chinese large-model companies, DeepSeek and Zhipu, with new model releases expected to strengthen their capabilities and market position [2][3].

Group 1: DeepSeek Developments
- DeepSeek uploaded its new model, DeepSeek-V3.2, to the HuggingFace community platform on September 29 [2].
- The previous version, DeepSeek-V3.1, released in August, introduced a hybrid inference architecture supporting both thinking and non-thinking modes, improved thinking efficiency, and strengthened agent capabilities through post-training optimization [3].

Group 2: Zhipu Developments
- Zhipu is set to release its new model, GLM-4.6, and some users can already access it via API [2].
- The flagship model GLM-4.5, launched in July, integrates reasoning, coding, and agent capabilities into a single model to meet complex application needs [3].
- In August, Zhipu also released GLM-4.5V, a high-performance open-source visual reasoning model with 106 billion total parameters and 12 billion active parameters [3].
DeepSeek and Zhipu to release new models
第一财经· 2025-09-29 09:05
Core Insights
- The new AI model DeepSeek-V3.2 was uploaded to the community platform HuggingFace but was subsequently removed [1]
- The new model GLM-4.6 from Zhipu is expected to be released soon and is currently accessible via API [1]
A release before National Day? DeepSeek V3.2 unexpectedly appears on HuggingFace
Hua Er Jie Jian Wen· 2025-09-29 09:03
Core Insights
- DeepSeek has uploaded the v3.2-base model to its official HuggingFace page, although the model file is currently offline [1]

Summary by Categories

Company Developments
- DeepSeek has uploaded the v3.2-base model to its official HuggingFace page [1]