Avi Chawla·2025-12-07 06:42
1. DeepSeek Sparse Attention (DSA)

DeepSeek's new V3.2 model introduces DeepSeek Sparse Attention (DSA), which brings attention complexity down from O(L²) to O(Lk), where k is fixed.

How it works:
- A lightweight Lightning Indexer scores which tokens actually matter for each query. It uses a small number of heads and runs in FP8, so it is computationally cheap.
- A selection mechanism then retrieves only the top-k key-value entries.

The key insight is that only 2048 tokens get selected per query, regardless of context length. So the expensive attention …
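The index-then-select pipeline can be sketched in a few lines of NumPy. This is a minimal single-query illustration, not DeepSeek's implementation: the indexer here is a plain dot product (the real Lightning Indexer uses its own small FP8 heads), and the function name and shapes are made up for the example.

```python
import numpy as np

def dsa_sketch(q, K, V, k=16):
    """Toy sketch of DSA-style top-k sparse attention for one query vector.

    1) A cheap indexer scores all L keys: O(L) per query.
    2) Only the top-k key/value entries are kept (k=2048 in V3.2).
    3) Full softmax attention runs over just those k tokens: O(k), not O(L).
    """
    L, d = K.shape
    # Indexer: cheap relevance score for every key (stand-in for the
    # Lightning Indexer, which uses a few small FP8 heads).
    index_scores = K @ q                      # shape (L,)
    # Selection: indices of the k highest-scoring tokens.
    topk = np.argsort(index_scores)[-k:]
    K_sel, V_sel = K[topk], V[topk]
    # Standard scaled-dot-product attention, restricted to selected tokens.
    logits = (K_sel @ q) / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V_sel                    # shape (d,)

rng = np.random.default_rng(0)
L, d = 64, 8
q = rng.standard_normal(d)
K = rng.standard_normal((L, d))
V = rng.standard_normal((L, d))
out = dsa_sketch(q, K, V, k=16)
print(out.shape)  # (8,)
```

Because k stays fixed while L grows, the per-query cost of the attention step no longer scales with context length; only the lightweight indexing pass touches all L tokens.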