DeepSeek's V3.2 Large Model Debuts! Huawei and Cambricon Chips Adapted in Step with the Open-Source Release; First Self-Developed DSA Attention Mechanism; API Prices Cut in Half
Hua Er Jie Jian Wen · 2025-09-29 13:53
Core Insights
- DeepSeek has officially released and open-sourced the DeepSeek-V3.2-Exp model on the Hugging Face platform, marking a significant step toward its next-generation architecture [1]
- The new model introduces the DeepSeek Sparse Attention (DSA) mechanism, which aims to optimize training and inference efficiency for long texts while reducing computational resource consumption [1]
- The model supports a maximum context length of 160K, and Huawei and Cambricon have completed hardware adaptations [1]

Technical Breakthroughs
- The DSA mechanism achieves fine-grained sparse attention, significantly improving training and inference efficiency in long-text scenarios without compromising output quality [1][3]
- The training settings for DeepSeek-V3.2-Exp were strictly aligned with those of the previous version, V3.1-Terminus, and the two models perform comparably across major public evaluation datasets [3]

Benchmark Performance
- Comparison of DeepSeek-V3.1-Terminus and DeepSeek-V3.2-Exp across benchmarks:
  - MMLU-Pro: 85.0 (both versions)
  - GPQA-Diamond: 80.7 (V3.1) vs 79.9 (V3.2)
  - Humanity's Last Exam: 21.7 (V3.1) vs 19.8 (V3.2)
  - BrowseComp: 38.5 (V3.1) vs 40.1 (V3.2)
  - SimpleQA: 96.8 (V3.1) vs 97.1 (V3.2)
  - Codeforces-Div1: 2046 (V3.1) vs 2121 (V3.2)
  - AIME 2025: 88.4 (V3.1) vs 89.3 (V3.2) [4]

Cost Reduction
- With the introduction of the new model, DeepSeek has cut API service prices by more than 50%, effective immediately [4]

Open Source and Community Support
- DeepSeek has fully open-sourced the DeepSeek-V3.2-Exp model on Hugging Face and ModelScope, along with the related research papers [6]
- API access to the V3.1-Terminus version will be retained for comparison purposes until October 15, 2025, with its pricing aligned to V3.2-Exp [6]
- To support community research, DeepSeek has also open-sourced the GPU operators designed for the new model, recommending the TileLang version for ease of debugging and rapid iteration [6]

Industry Collaboration
- Cambricon has announced the completed adaptation of the new model and has open-sourced the vLLM-MLU inference engine source code, allowing developers to experience the new model's features on its hardware platform [6][7]
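The article does not describe DSA's internals, but the general idea behind fine-grained sparse attention can be illustrated with a toy top-k variant: each query attends only to its k highest-scoring keys rather than all keys, shrinking the effective attention computation for long sequences. The sketch below is a minimal, hypothetical illustration in NumPy, not DeepSeek's actual DSA implementation; the function name and the top-k selection rule are assumptions for demonstration only.

```python
import numpy as np

def topk_sparse_attention(q, k, v, top_k):
    """Toy sparse attention: each query row attends to only its top_k
    highest-scoring keys (hypothetical sketch, not DeepSeek's DSA)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # full (n_q, n_k) scores
    # Indices of the top_k largest scores for each query.
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]
    # Mask everything except the selected keys with -inf.
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # Softmax over the sparse scores; masked entries get weight 0.
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v                                       # (n_q, d_v)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = topk_sparse_attention(q, k, v, top_k=4)
print(out.shape)  # (4, 8)
```

In a real implementation the key selection is what makes sparsity pay off: rather than computing all scores and discarding most of them as this toy does, a production kernel selects candidate keys cheaply so that the quadratic score matrix is never fully materialized.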