DSA (DeepSeek Sparse Attention)
AI Special Topic: DeepSeek's Sparse Attention Mechanism Unlocks Greater Development Potential for the AI Industry
Zhongyuan Securities · 2025-10-16 11:46
Investment Rating
- The industry investment rating is "Outperform the Market", with an expected gain of more than 10% relative to the CSI 300 index over the next six months [41].

Core Insights
- The report emphasizes that the introduction of sparse attention mechanisms, particularly by DeepSeek, significantly enhances the development potential of the AI industry [8][37].
- DeepSeek's advances in attention mechanisms, including Native Sparse Attention (NSA) and DeepSeek Sparse Attention (DSA), are pivotal to improving model performance and efficiency [18][23][37].

Summary by Sections
1. Relationship Between the Attention Mechanism and Large Model Development
- The attention mechanism, introduced to improve information-processing efficiency, has become a core component of large models, addressing the limitations of traditional recurrent neural networks [11].
- Sparse attention reduces computational complexity from O(L²) to sub-quadratic levels, overcoming memory and computational bottlenecks (see the first sketch at the end of this summary) [11].
2. DeepSeek's Technological Improvements to the Attention Mechanism
- DeepSeek has made significant contributions in three main areas: Multi-head Latent Attention (MLA), Native Sparse Attention (NSA), and DeepSeek Sparse Attention (DSA) [12][18][23].
- MLA reduces memory usage by approximately 90% while maintaining model performance, significantly lowering training costs (see the second sketch at the end of this summary) [16].
- NSA speeds up long-text processing by a factor of 11 while achieving performance comparable to traditional models [18].
- DSA improves training and inference efficiency, leading to substantial reductions in the cost of using the models [23].
3. DSA and NSA Unlock Greater Development Potential for the AI Industry
- The combination of DSA and NSA expands model context and improves computational efficiency, both of which are crucial for meeting the demands of multi-modal applications [33][37].
- The trend toward longer input and output lengths calls for innovative approaches to model training and performance enhancement [33].
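To make the complexity claim above concrete, the following minimal Python sketch contrasts dense attention, which scores every query against every key (O(L²) pairs), with a simple fixed-window sparse variant that scores each query against only the most recent `window` keys (O(L·window) pairs). This is an illustrative toy under assumed sizes, not DeepSeek's NSA or DSA implementation; the sequence length, head dimension, and window width are arbitrary assumptions.

```python
# Minimal sketch (not DeepSeek's implementation): dense attention touches all
# L x L query-key pairs, while a fixed local window of width w touches only
# L x w pairs -- the kind of sub-quadratic saving the report refers to.
import numpy as np

def dense_attention(q, k, v):
    # q, k, v: (L, d); the score matrix is (L, L) -> O(L^2) time and memory
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def local_window_attention(q, k, v, window=64):
    # Each query i attends only to keys in [i - window + 1, i] -> O(L * window)
    L, d = q.shape
    out = np.zeros_like(v)
    for i in range(L):
        lo = max(0, i - window + 1)
        scores = q[i] @ k[lo:i + 1].T / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:i + 1]
    return out

L, d = 1024, 64                                 # assumed example sizes
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((L, d)) for _ in range(3))
print(dense_attention(q, k, v).shape)           # (1024, 64), cost ~ L^2 pairs
print(local_window_attention(q, k, v).shape)    # (1024, 64), cost ~ L * 64 pairs
```

With these assumed numbers, the dense path evaluates roughly a million query-key pairs while the windowed path evaluates about 65,000, which is why sparse attention relieves the memory and compute bottlenecks mentioned in the report.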
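The roughly 90% memory saving attributed to MLA can be sanity-checked with back-of-the-envelope arithmetic: instead of caching full per-head keys and values for every token, an MLA-style design caches one compressed latent vector per token. The head count, head dimension, and latent dimension below are assumed illustrative values, not DeepSeek's published configuration.

```python
# Rough back-of-the-envelope comparison (illustrative numbers only): caching a
# single low-rank latent vector per token instead of full per-head keys and
# values shrinks the KV cache roughly by latent_dim / (2 * n_heads * head_dim).
n_heads, head_dim, latent_dim = 32, 128, 512     # assumed example sizes

standard_kv_per_token = 2 * n_heads * head_dim   # keys + values for every head
mla_cache_per_token = latent_dim                 # one shared latent vector

saving = 1 - mla_cache_per_token / standard_kv_per_token
print(f"standard KV cache: {standard_kv_per_token} values per token")
print(f"latent cache:      {mla_cache_per_token} values per token")
print(f"memory reduction:  {saving:.0%}")        # ~94% with these assumed sizes
```

The exact percentage depends on the chosen dimensions, but the arithmetic shows how a latent-compression scheme can plausibly deliver the order-of-magnitude cache reduction the report cites.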