Avi Chawla·2025-12-07 06:42
At 128K context, prefilling costs drop from ~$0.65 to ~$0.35 per million tokens. And decoding drops from ~$2.4 to ~$0.8. And the performance stays the same. On some long-context benchmarks, V3.2 actually scores higher. Sparse attention isn't new. But making it work without losing quality is hard. What are some other techniques to increase the context lengths of LLMs? ...
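To make the idea concrete, here's a minimal toy sketch of top-k sparse attention in NumPy: each query attends only to its k highest-scoring keys instead of all of them, which is the basic mechanism that cuts the quadratic cost of long contexts. This is an illustrative simplification, not DeepSeek V3.2's actual sparse-attention design (the function name and shapes here are assumptions for the demo).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, topk=4):
    """Toy top-k sparse attention: mask out all but the top-k
    highest-scoring keys for each query before the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n_q, n_k)
    # indices of the keys OUTSIDE each query's top-k -> set to -inf
    drop = np.argpartition(scores, -topk, axis=-1)[:, :-topk]
    masked = scores.copy()
    np.put_along_axis(masked, drop, -np.inf, axis=-1)
    return softmax(masked) @ v

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = rng.normal(size=(3, n, d))
out = topk_sparse_attention(q, k, v, topk=4)         # shape (16, 8)
```

With topk equal to the full sequence length, nothing is masked and the result matches dense attention exactly, which is one simple way to sanity-check that the sparsification itself, not a bug, accounts for any quality difference.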