KV cache
Feeding the Future of AI | James Coomer
DDN· 2025-12-08 18:14
Who in the audience here has come across the term KV cache? Yeah, I know you have. Oh, sorry. What were those hands? Go on, put them up again. There's like 5% maybe. Okay. We're going to talk about probably the most important thing you can do over the next five years to get more productivity out of GPU infrastructure. Does that sound good? Yes. So, first we're talking about inference, and many of you probably know that the spend on inference this coming year is going to outpace the spend on training. An ...
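For readers who haven't met the term: during autoregressive decoding, attention for each new token needs the keys and values of every earlier token, so implementations cache those per-layer K/V tensors instead of recomputing them at every step. Below is a minimal single-head sketch of that idea; the weights Wq/Wk/Wv, the toy d_model, and the way the attention output is fed back as the next hidden state are my own simplifications, not anything from the talk.

```python
# Minimal sketch of KV caching in a single-head decode loop (toy example).
import numpy as np

rng = np.random.default_rng(0)
d_model = 64  # toy hidden size (assumption)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

def attend(q, K, V):
    """Scaled dot-product attention for one query over all cached keys/values."""
    scores = K @ q / np.sqrt(d_model)       # (t,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                       # (d_model,)

# Decode a few steps, growing the cache instead of re-projecting the history.
k_cache, v_cache = [], []
hidden = rng.standard_normal(d_model)        # stand-in for the first token's hidden state
for step in range(5):
    q = Wq @ hidden
    k_cache.append(Wk @ hidden)              # only the *new* token's K/V are computed
    v_cache.append(Wv @ hidden)
    context = attend(q, np.stack(k_cache), np.stack(v_cache))
    hidden = context                         # toy stand-in for the next token's hidden state
print("cache length after 5 steps:", len(k_cache))
```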
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators

Full operator support will require 1+ additional day, plus more time for accuracy testing.

Stay tuned for more updates 🔥 ...
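As a rough illustration of why KV-cache support speeds up decoding (my own sketch, not Polyhedra's implementation): without a cache, step t has to re-project keys and values for all t tokens seen so far, so projection work grows quadratically over a generation; with a cache, each step only handles the newly generated token.

```python
# Count K/V projection calls across a full decode, with and without a cache.
def projection_ops(num_tokens: int, use_kv_cache: bool) -> int:
    ops = 0
    for step in range(1, num_tokens + 1):
        # without a cache, step t recomputes K/V for all t tokens seen so far
        ops += 1 if use_kv_cache else step
    return ops

for n in (128, 1024, 4096):
    print(n, "tokens -> no cache:", projection_ops(n, False),
          "| with cache:", projection_ops(n, True))
```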
X @Avi Chawla
Avi Chawla· 2025-07-27 06:31
That said, KV cache also takes a lot of memory.

Llama3-70B has:
- total layers = 80
- hidden size = 8k
- max output size = 4k

Here:
- Every token takes up ~2.5 MB in KV cache.
- 4k tokens will take up 10.5 GB.

More users → more memory.

I'll cover KV optimization soon. https://t.co/VjnyLa6aLa ...
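To make the arithmetic explicit, here is a back-of-the-envelope check. It assumes fp16 storage (2 bytes per element) and standard multi-head attention (no grouped-query attention, which would shrink the cache considerably); per token, each layer stores one key vector and one value vector of length hidden_size. The variable names are my own.

```python
# Rough KV-cache sizing, under the assumptions stated above.
layers      = 80      # decoder layers (from the tweet)
hidden_size = 8192    # "8k" hidden size
bytes_per   = 2       # fp16 assumption
tokens      = 4096    # "4k" max output size

per_token = 2 * layers * hidden_size * bytes_per   # factor 2 = one K + one V per layer
total     = per_token * tokens

print(f"per token : {per_token / 2**20:.2f} MiB")   # ~2.50 MiB, matching "~2.5 MB"
print(f"{tokens} tokens: {total / 2**30:.2f} GiB")  # ~10 GiB, in the ballpark of the quoted 10.5 GB
```

Multiply that total by the number of concurrent users and the "more users → more memory" point follows directly.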