Sparse Attention Mechanism
DeepSeek-V3.2 Launches on the National Supercomputing Internet; Developers Can Download It for Free
Sohu Finance (Sou Hu Cai Jing) · 2025-09-30 11:58
Core Insights
- DeepSeek has launched the experimental version DeepSeek-V3.2-Exp, which introduces the DeepSeek Sparse Attention mechanism to enhance training and inference efficiency for long texts [1]
- The AI community now hosts over 700 high-quality open-source models, providing developers with various services including API calls and distributed training [2]

Group 1
- DeepSeek-V3.2-Exp is available for free download in the National Supercomputing Internet AI community, allowing enterprises and developers to build applications quickly [1]
- The new model is a step toward a next-generation architecture, building on the previous version, V3.1-Terminus [1]
- DeepSeek Sparse Attention achieves significant improvements in long-text training and inference efficiency with minimal impact on model output (a toy sketch of the underlying idea follows this article's summary) [1]

Group 2
- The Supercomputing Internet AI community features a collection of over 700 models, including various versions of the DeepSeek series [2]
- Developers can use the community for a range of services, including online inference dialogue and model fine-tuning [2]
- The community supports a comprehensive MaaS (Model as a Service) offering for developers [2]
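The coverage describes what DeepSeek Sparse Attention achieves but not how it selects tokens. As rough intuition only, the sketch below implements generic top-k sparse attention in PyTorch: each query attends to its `k_top` highest-scoring keys rather than the whole sequence. The function name and the `k_top` parameter are illustrative assumptions, not DeepSeek's API, and for simplicity the toy still computes the dense score matrix; production sparse-attention kernels gain their efficiency precisely by avoiding that step.

```python
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, k_top=64):
    """Toy top-k sparse attention (illustrative, not DeepSeek's DSA).

    q, k, v: tensors of shape [batch, heads, seq_len, head_dim].
    Each query keeps only its k_top highest-scoring keys; the rest
    are masked out before the softmax.
    """
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # [B, H, Lq, Lk]
    k_top = min(k_top, scores.size(-1))
    # The k-th largest score in each query row becomes the cutoff.
    kth_score = scores.topk(k_top, dim=-1).values[..., -1:]
    sparse_scores = scores.masked_fill(scores < kth_score, float("-inf"))
    attn = F.softmax(sparse_scores, dim=-1)
    return torch.matmul(attn, v)


if __name__ == "__main__":
    q = torch.randn(1, 4, 128, 64)
    k = torch.randn(1, 4, 128, 64)
    v = torch.randn(1, 4, 128, 64)
    print(topk_sparse_attention(q, k, v, k_top=16).shape)  # [1, 4, 128, 64]
```

Because this toy still materializes the full score matrix, it only mimics the sparsity pattern; the efficiency gains the articles report come from kernels that never compute the pruned entries at all.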
DeepSeek-V3.2-Exp Arrives, with Another Sharp Cut in API Prices
Phoenix New Media (Feng Huang Wang) · 2025-09-29 14:03
Core Insights
- The new pricing policy reduces the cost for developers using the DeepSeek API by over 50% (a hedged example API call follows this article's summary) [2][3]
- The release of the DeepSeek-V3.2-Exp model on September 29, 2025 introduces the DeepSeek Sparse Attention mechanism, enhancing training and inference efficiency for long texts [2]
- The V3.2-Exp model maintains performance comparable to the previous V3.1-Terminus model across various benchmarks [2][3]

Performance Comparison
- On the MMLU-Pro benchmark, DeepSeek-V3.1-Terminus scored 85.0, and V3.2-Exp maintained the same score [3]
- On the BrowseComp search benchmark, V3.2-Exp improved to 40.1 from 38.5 for V3.1-Terminus [3]
- On the Codeforces-Div1 benchmark, the rating rose from 2046 for V3.1-Terminus to 2121 for V3.2-Exp [3]

Accessibility and Development
- The V3.2-Exp model has been open-sourced on Hugging Face and ModelScope, allowing users to access it and build on it [5]
- The updated version is available on the official app, web client, and mini-programs [2][3]
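The articles note that the updated model is served behind the existing API at the new, lower prices. Since the DeepSeek API follows the OpenAI-compatible chat-completions convention, a minimal call can be sketched as below; the base URL and the "deepseek-chat" model name match the public documentation around these reports, but both (and the placeholder key) should be verified against the current docs.

```python
# pip install openai
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; the model alias
# "deepseek-chat" is assumed here -- check the official docs.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder, not a real key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user",
         "content": "In one sentence, what is sparse attention?"},
    ],
)
print(response.choices[0].message.content)
```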
A Big Move Before National Day! DeepSeek-V3.2-Exp Released and Open-Sourced, with API Costs to Drop by Over 50%
Wallstreetcn (华尔街见闻) · 2025-09-29 11:12
Core Insights
- DeepSeek has launched the DeepSeek-V3.2-Exp model on Hugging Face, introducing the DeepSeek Sparse Attention (DSA) mechanism to enhance training and inference efficiency for long texts [1][3]
- Huawei Cloud has adapted the DeepSeek-V3.2-Exp model, supporting a maximum context length of 160K [2]
- The DSA technology significantly improves training and inference efficiency in long-text scenarios with minimal impact on model output [3]
- The training settings of DeepSeek-V3.2-Exp were strictly aligned with those of the previous version, V3.1-Terminus, and the two show comparable performance across various benchmarks [5]
- The new model comes with a reduction of over 50% in API costs, with the price adjustments effective immediately [8]
- DeepSeek has made the DeepSeek-V3.2-Exp model fully open-source on Hugging Face and ModelScope, with the accompanying research paper also published (a hedged loading sketch follows this list) [9]
- The company will retain API access to the V3.1-Terminus version for comparison purposes until October 15, 2025 [9]
- Additionally, DeepSeek has open-sourced the GPU operators designed for the new model, recommending the TileLang version for research experiments [10]
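For developers who want to try the open-sourced checkpoint directly, a standard Hugging Face transformers loading pattern is sketched below. The repository id is inferred from the release name and DeepSeek's usual naming convention, so confirm it on Hugging Face before use; the full checkpoint is also far too large for a single consumer GPU, so treat this as the shape of the code rather than a laptop-ready script.

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id based on the release name -- verify on Hugging Face.
REPO = "deepseek-ai/DeepSeek-V3.2-Exp"

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

inputs = tokenizer("Summarize this long document: ...", return_tensors="pt")
inputs = inputs.to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```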