Avi Chawla·2026-04-26 08:07
- DeepSeek Sparse Attention (DSA)

DeepSeek's recently released V3.2 model introduced DeepSeek Sparse Attention (DSA), which brought attention complexity down from O(L²) to O(Lk), where k is fixed.

How it works:
- A lightweight Lightning Indexer scores which tokens actually matter for each query. It uses a small number of heads, runs in FP8, and is computationally cheap.
- A selection mechanism then retrieves only the top-k key-value entries.

The key insight is that only 2048 tokens get selected per query, regardless of context length.

So the e ...
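The indexer-then-select flow above can be sketched for a single query as follows. This is a toy NumPy illustration of the idea, not DeepSeek's actual implementation: the indexer here is just a dot-product score (the real Lightning Indexer is a small FP8 attention component), and `k=4` stands in for the 2048-token budget.

```python
import numpy as np

def sparse_attention(q, K, V, k=4):
    """Toy sketch of top-k sparse attention for one query.

    A cheap indexer scores every key, then full softmax attention
    runs only over the top-k selected keys/values, so total cost
    across all queries is O(Lk) instead of O(L^2).
    """
    L, d = K.shape
    # 1) Indexer stand-in: one cheap relevance score per key.
    #    (Hypothetical; the real indexer is a small FP8 module.)
    index_scores = K @ q                      # shape (L,)
    # 2) Select only the top-k key/value entries.
    topk = np.argsort(index_scores)[-k:]
    K_sel, V_sel = K[topk], V[topk]
    # 3) Standard scaled softmax attention over just those entries.
    logits = (K_sel @ q) / np.sqrt(d)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ V_sel                          # shape (d,)

rng = np.random.default_rng(0)
L, d = 64, 8
q = rng.standard_normal(d)
K = rng.standard_normal((L, d))
V = rng.standard_normal((L, d))
out = sparse_attention(q, K, V, k=4)
print(out.shape)  # (8,)
```

Because k stays fixed while L grows, the per-query work stops scaling with context length — which is exactly the O(L²) → O(Lk) reduction described above.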