R2 Has Not Arrived Yet, but DeepSeek's Secret Weapon Has Already Been "Spoiled"
Hu Xiu · 2025-07-31 07:58
Core Insights
- ACL, the top conference in natural language processing, gave its best paper award to a joint work by DeepSeek and Peking University titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" [4][3]
- The paper reports a significant gain in the efficiency of large language models, with inference up to 11 times faster while model performance is maintained [5][34]

Group 1: Technology and Innovation
- The paper presents a novel approach to sparse attention, taking it from theoretical reasoning to a complete training process, which is crucial for the future of large models [5][26]
- The Native Sparse Attention (NSA) method mimics human reading strategies: it compresses long texts, selects the relevant details, and maintains a sliding window over recent context [26][30]
- NSA is designed to be natively trainable, so the model learns an efficient attention distribution from the pre-training phase onward [32][51]

Group 2: Performance Metrics
- In benchmark tests, the 27B-parameter model using NSA outperformed the traditional full attention model on 7 out of 9 metrics, excelling in particular on reasoning tasks [35][37]
- NSA achieved 100% information retrieval accuracy in long-text comprehension tasks, demonstrating its effectiveness on long inputs [38][40]
- Training speed improved significantly, with forward computation accelerated by 9 times and backward propagation by 6 times, while inference speed increased by 11.6 times [44][45]

Group 3: Market Implications
- The advances in NSA position DeepSeek as a potential leader in the AI application ecosystem, promising faster, more efficient, and more cost-effective solutions for users [55][58]
- The ability to process long documents and large datasets without manual segmentation could change how users interact with AI, improving productivity and accessibility [54][59]
- The competitive edge provided by NSA is expected to solidify DeepSeek's market position, transforming it from a price-driven player into a technology innovator [58][60]
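
To make the compress, select, and sliding-window strategy described under Group 1 more concrete, here is a minimal sketch in plain Python. It is a toy illustration under stated assumptions, not DeepSeek's implementation: the function names (`attend`, `sparse_attend`), the mean-pooling used for compression, and the fixed equal gate weights are all stand-ins chosen for this example, whereas the paper describes learned components and hardware-aligned kernels. The query is assumed to sit at the last position, so every key is already in its past.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sparse_attend(q, keys, values, block=8, top_n=2, window=8,
                  gates=(1 / 3, 1 / 3, 1 / 3)):
    """Toy three-branch sparse attention (hypothetical helper).

    Assumptions: mean pooling stands in for a learned compressor,
    and `gates` are fixed constants instead of learned gate values.
    """
    starts = range(0, len(keys), block)
    k_blocks = [keys[s:s + block] for s in starts]
    v_blocks = [values[s:s + block] for s in starts]

    # Branch 1: compression. One summary key/value per block.
    k_cmp = [mean_vec(b) for b in k_blocks]
    v_cmp = [mean_vec(b) for b in v_blocks]
    out_cmp = attend(q, k_cmp, v_cmp)

    # Branch 2: selection. Score blocks via their summaries, keep the
    # top_n blocks, and attend to their original tokens.
    scores = [dot(q, k) for k in k_cmp]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_n])
    k_sel = [k for i in chosen for k in k_blocks[i]]
    v_sel = [v for i in chosen for v in v_blocks[i]]
    out_sel = attend(q, k_sel, v_sel)

    # Branch 3: sliding window over the most recent tokens.
    out_win = attend(q, keys[-window:], values[-window:])

    # Combine the three branch outputs with the gate weights.
    out = [gates[0] * a + gates[1] * b + gates[2] * c
           for a, b, c in zip(out_cmp, out_sel, out_win)]

    # Number of key/value vectors read, versus len(keys) for full attention.
    touched = len(k_cmp) + len(k_sel) + min(window, len(keys))
    return out, chosen, touched

# Usage: 64 tokens whose keys grow with their block index.
keys = [[float(i // 8), 1.0] for i in range(64)]
values = [[float(i), 0.0] for i in range(64)]
query = [1.0, 0.0]
output, chosen_blocks, touched = sparse_attend(query, keys, values)
```

In this toy setup the query reads 32 key/value vectors (8 block summaries, 16 selected tokens, 8 window tokens) instead of all 64, and the selected blocks are the two with the highest summary scores. The gap widens with sequence length, which is the intuition behind the speedups quoted under Group 2; the actual figures depend on the paper's kernels and hardware, which this sketch does not model.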