X @Avi Chawla
Avi Chawla·2025-07-27 06:31
That said, KV cache also takes a lot of memory.

Llama3-70B has:
- total layers = 80
- hidden size = 8k
- max output size = 4k

Here:
- Every token takes up ~2.5 MB in KV cache.
- 4k tokens will take up 10.5 GB.

More users → more memory.

I'll cover KV optimization soon. https://t.co/VjnyLa6aLa ...
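
The arithmetic behind these numbers can be sketched in a few lines. This is a rough estimate, not an exact accounting. It assumes things the post does not state: 16-bit values (2 bytes each), one key and one value vector of full hidden size per layer, and "8k" / "4k" read as 8192 / 4096. The names `kv_bytes_per_token` and `kv_bytes_total` are made up for illustration.

```python
# Back-of-the-envelope KV cache size, using the figures from the post.
# Assumptions (not stated in the post): 2 bytes per value (fp16/bf16),
# K and V each span the full hidden size, 8k = 8192, 4k = 4096.

def kv_bytes_per_token(num_layers: int, hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache that one token occupies across all layers."""
    # Factor of 2: one key vector and one value vector per layer.
    return 2 * num_layers * hidden_size * bytes_per_value


def kv_bytes_total(per_token: int, tokens_per_user: int, num_users: int = 1) -> int:
    """Bytes of KV cache for num_users sequences of tokens_per_user tokens each."""
    return per_token * tokens_per_user * num_users


per_token = kv_bytes_per_token(num_layers=80, hidden_size=8192)
per_sequence = kv_bytes_total(per_token, tokens_per_user=4096)
eight_users = kv_bytes_total(per_token, tokens_per_user=4096, num_users=8)

print(f"per token:      {per_token / 2**20:.2f} MiB")
print(f"per 4k tokens:  {per_sequence / 2**30:.2f} GiB")
print(f"8 users x 4k:   {eight_users / 2**30:.2f} GiB")
```

Under these assumptions, one token comes to 2.50 MiB and a 4k-token sequence to 10.00 GiB (about 10.7 GB in decimal units), the same ballpark as the figures above. The exact value shifts with how "4k" and MB vs. MiB are rounded. Memory grows linearly with concurrent users, so 8 users at full length need about 80 GiB for the cache alone. The released Llama 3 70B config uses grouped-query attention with 8 KV heads, which would make the real footprint smaller than this full-attention estimate.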