Sparse Attention Mechanism
DeepSeek-V3.2 Launches on the National Supercomputing Internet; Developers Can Download It for Free
Sohu Finance (Sou Hu Cai Jing) · 2025-09-30 11:58
Core Insights
- DeepSeek has launched the experimental version DeepSeek-V3.2-Exp, which introduces the DeepSeek Sparse Attention mechanism to enhance training and inference efficiency for long texts [1]
- The AI community now hosts over 700 high-quality open-source models, providing developers with various services including API calls and distributed training [2]

Group 1
- DeepSeek-V3.2-Exp is available for free download in the National Supercomputing Internet AI community, allowing enterprises and developers to build applications quickly [1]
- The new model is a step toward a next-generation architecture, building on the previous version, V3.1-Terminus [1]
- DeepSeek Sparse Attention achieves significant improvements in long-text training and inference efficiency with minimal impact on model output (a toy sketch of the underlying idea follows this article's summary) [1]

Group 2
- The Supercomputing Internet AI community features a collection of over 700 models, including various versions of the DeepSeek series [2]
- Developers can use the community for a range of services, including online inference dialogue and model fine-tuning [2]
- The community supports a comprehensive MaaS (Model as a Service) offering for developers [2]
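The coverage describes what DeepSeek Sparse Attention achieves but not how it selects tokens. As rough intuition only, the sketch below implements generic top-k sparse attention in PyTorch: each query attends to its `k_top` highest-scoring keys rather than the whole sequence. The function name and the `k_top` parameter are illustrative assumptions, not DeepSeek's API, and for simplicity the toy still computes the dense score matrix; production sparse-attention kernels gain their efficiency precisely by avoiding that step.

```python
import torch
import torch.nn.functional as F


def topk_sparse_attention(q, k, v, k_top=64):
    """Toy top-k sparse attention (illustrative, not DeepSeek's DSA).

    q, k, v: tensors of shape [batch, heads, seq_len, head_dim].
    Each query keeps only its k_top highest-scoring keys; the rest
    are masked out before the softmax.
    """
    scale = q.size(-1) ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale  # [B, H, Lq, Lk]
    k_top = min(k_top, scores.size(-1))
    # The k-th largest score in each query row becomes the cutoff.
    kth_score = scores.topk(k_top, dim=-1).values[..., -1:]
    sparse_scores = scores.masked_fill(scores < kth_score, float("-inf"))
    attn = F.softmax(sparse_scores, dim=-1)
    return torch.matmul(attn, v)


if __name__ == "__main__":
    q = torch.randn(1, 4, 128, 64)
    k = torch.randn(1, 4, 128, 64)
    v = torch.randn(1, 4, 128, 64)
    print(topk_sparse_attention(q, k, v, k_top=16).shape)  # [1, 4, 128, 64]
```

Because this toy still materializes the full score matrix, it only mimics the sparsity pattern; the efficiency gains the articles report come from kernels that never compute the pruned entries at all.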
DeepSeek-V3.2-Exp Arrives, with Another Sharp Cut in API Prices
Phoenix New Media (Feng Huang Wang) · 2025-09-29 14:03
Core Insights
- The new pricing policy reduces the cost for developers using the DeepSeek API by over 50% (a hedged example API call follows this article's summary) [2][3]
- The release of the DeepSeek-V3.2-Exp model on September 29, 2025 introduces the DeepSeek Sparse Attention mechanism, enhancing training and inference efficiency for long texts [2]
- The V3.2-Exp model maintains performance comparable to the previous V3.1-Terminus model across various benchmarks [2][3]

Performance Comparison
- On the MMLU-Pro benchmark, DeepSeek-V3.1-Terminus scored 85.0, and V3.2-Exp maintained the same score [3]
- On the BrowseComp search benchmark, V3.2-Exp improved to 40.1 from 38.5 for V3.1-Terminus [3]
- On the Codeforces-Div1 benchmark, the rating rose from 2046 for V3.1-Terminus to 2121 for V3.2-Exp [3]

Accessibility and Development
- The V3.2-Exp model has been open-sourced on Hugging Face and ModelScope, allowing users to access it and build on it [5]
- The updated version is available on the official app, web client, and mini-programs [2][3]
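The articles note that the updated model is served behind the existing API at the new, lower prices. Since the DeepSeek API follows the OpenAI-compatible chat-completions convention, a minimal call can be sketched as below; the base URL and the "deepseek-chat" model name match the public documentation around these reports, but both (and the placeholder key) should be verified against the current docs.

```python
# pip install openai
from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible endpoint; the model alias
# "deepseek-chat" is assumed here -- check the official docs.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # placeholder, not a real key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user",
         "content": "In one sentence, what is sparse attention?"},
    ],
)
print(response.choices[0].message.content)
```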
A Big Move Before National Day! DeepSeek-V3.2-Exp Released and Open-Sourced, with API Costs to Drop by Over 50%
Wallstreetcn (华尔街见闻) · 2025-09-29 11:12
Core Insights
- DeepSeek has launched the DeepSeek-V3.2-Exp model on Hugging Face, introducing the DeepSeek Sparse Attention (DSA) mechanism to enhance training and inference efficiency for long texts [1][3]
- Huawei Cloud has adapted the DeepSeek-V3.2-Exp model, supporting a maximum context length of 160K [2]
- The DSA technology significantly improves training and inference efficiency in long-text scenarios with minimal impact on model output [3]
- The training settings of DeepSeek-V3.2-Exp were strictly aligned with those of the previous version, V3.1-Terminus, and the two show comparable performance across various benchmarks [5]
- The new model comes with a reduction of over 50% in API costs, with the price adjustments effective immediately [8]
- DeepSeek has made the DeepSeek-V3.2-Exp model fully open-source on Hugging Face and ModelScope, with the accompanying research paper also published (a hedged loading sketch follows this list) [9]
- The company will retain API access to the V3.1-Terminus version for comparison purposes until October 15, 2025 [9]
- Additionally, DeepSeek has open-sourced the GPU operators designed for the new model, recommending the TileLang version for research experiments [10]
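For developers who want to try the open-sourced checkpoint directly, a standard Hugging Face transformers loading pattern is sketched below. The repository id is inferred from the release name and DeepSeek's usual naming convention, so confirm it on Hugging Face before use; the full checkpoint is also far too large for a single consumer GPU, so treat this as the shape of the code rather than a laptop-ready script.

```python
# pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id based on the release name -- verify on Hugging Face.
REPO = "deepseek-ai/DeepSeek-V3.2-Exp"

tokenizer = AutoTokenizer.from_pretrained(REPO, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    REPO,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)

inputs = tokenizer("Summarize this long document: ...", return_tensors="pt")
inputs = inputs.to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```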