DeepSeek (US:SKLTY) · 36Kr · 2025-09-29 23:39
Core Insights
- DeepSeek has launched its latest experimental model, DeepSeek-V3.2-Exp, featuring a new attention mechanism called DeepSeek Sparse Attention (DSA), which improves training and inference efficiency while reducing API costs by over 50% [1][19].

Model Features
- The V3.2 model builds on DeepSeek-V3.1-Terminus and introduces DSA, achieving faster and more efficient training and inference on long contexts [3][5].
- DSA is the first key technology branded under the "DeepSeek" name and is an improvement over the Native Sparse Attention (NSA) developed in an earlier collaboration with Peking University [3][5].
- The DSA mechanism allows the model to focus on a small subset of important tokens rather than all tokens, significantly reducing computational complexity from O(L²) to O(Lk), where k is much smaller than L [8][10] (see the sketch at the end of this note).

Performance Evaluation
- Evaluation results indicate that DeepSeek-V3.2-Exp maintains performance comparable to its predecessor, with no significant decline in effectiveness on either short- or long-text tasks [14][15].
- Specific benchmark results show that while some metrics decreased slightly, others improved, indicating balanced performance across various tasks [15].

Cost Efficiency
- The introduction of DSA has led to substantial reductions in operational costs, with the API price for developers lowered by over 50% [19].
- In deployment, the model has demonstrated significant end-to-end acceleration and cost savings in inference [18].

Future Implications
- Although still an experimental model, DeepSeek-V3.2-Exp presents a promising engineering pathway for overcoming long-text processing challenges without sacrificing performance [18].
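Sketch: the Model Features bullet on DSA describes top-k sparse attention at a high level. Below is a minimal, self-contained illustration of that general idea, not DeepSeek's actual DSA implementation; the function name, the NumPy-based scoring, and the toy dimensions are assumptions made for illustration only. For one query it keeps only the k highest-scoring tokens, so the softmax and value aggregation touch k positions instead of L.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k):
    """Single-query attention restricted to the k highest-scoring tokens.

    q: (d,) query; K, V: (L, d) keys/values; k << L.
    Returns the attended output of shape (d,).
    """
    d = q.shape[0]
    # Score all keys (shown here with a plain dot product for simplicity;
    # a production sparse-attention design would also make this selection
    # step cheap rather than scoring every key in full).
    scores = K @ q / np.sqrt(d)                  # (L,)
    # Keep only the k highest-scoring token positions.
    top_idx = np.argpartition(scores, -k)[-k:]   # (k,)
    # Softmax and value aggregation run over k tokens instead of L.
    w = np.exp(scores[top_idx] - scores[top_idx].max())
    w /= w.sum()
    return w @ V[top_idx]                        # (d,)

# Toy usage: a long context of L tokens, attending to only k of them.
rng = np.random.default_rng(0)
L, d, k = 4096, 64, 128
q = rng.standard_normal(d)
K = rng.standard_normal((L, d))
V = rng.standard_normal((L, d))
print(topk_sparse_attention(q, K, V, k).shape)   # (64,)
```

With L queries each attending to only k selected keys, the expensive part of attention scales as O(Lk) rather than O(L²), which is the complexity reduction the summary above refers to.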
