KV cache

X @Polyhedra
Polyhedra · 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators

Full operator support will require 1+ additional day, plus more time for accuracy testing. Stay tuned for more updates 🔥 ...
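Not from the thread, but a minimal sketch of what KV caching buys during autoregressive decoding: without a cache, every new token re-projects keys and values for the entire prefix; with one, each step projects only the newest token and appends it. This is hypothetical toy code (single attention head, random weights, NumPy), not Gemma3 or Polyhedra's implementation.

```python
# Toy single-head attention decode step illustrating KV caching.
# Hypothetical demo code (random weights), not Gemma3 internals.
import numpy as np

d = 8                                            # toy hidden size
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

def decode_step(x_new, k_cache, v_cache):
    """Project only the newest token, append its K/V to the cache,
    then attend over the whole cached sequence."""
    q = x_new @ Wq                               # query for the new token, (d,)
    k_cache = np.vstack([k_cache, x_new @ Wk])   # cache grows by one row -> (t, d)
    v_cache = np.vstack([v_cache, x_new @ Wv])
    scores = k_cache @ q / np.sqrt(d)            # (t,) attention logits
    w = np.exp(scores - scores.max())
    w /= w.sum()                                 # softmax over cached positions
    return w @ v_cache, k_cache, v_cache         # output, updated caches

# Decode 5 tokens: each step projects one token instead of
# re-projecting the whole prefix from scratch.
k_cache, v_cache = np.empty((0, d)), np.empty((0, d))
for _ in range(5):
    x = rng.standard_normal(d)
    out, k_cache, v_cache = decode_step(x, k_cache, v_cache)
print(k_cache.shape)                             # (5, 8): one cached K row per token
```

The trade is the one the next post quantifies: the per-step compute drops, but the cached K/V rows have to live in memory for the whole generation.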
X @Avi Chawla
Avi Chawla · 2025-07-27 06:31
That said, KV cache also takes a lot of memory. Llama3-70B has:
- total layers = 80
- hidden size = 8k
- max output size = 4k

Here:
- Every token takes up ~2.5 MB in KV cache.
- 4k tokens will take up 10.5 GB.

More users → more memory. I'll cover KV optimization soon. https://t.co/VjnyLa6aLa ...
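A quick check of that arithmetic, under the assumptions the quoted numbers imply: an FP16 cache (2 bytes per element) and one full K plus one full V vector per layer. (The real Llama3-70B uses grouped-query attention with 8 KV heads, which shrinks the cache roughly 8× relative to this estimate.)

```python
# Reproducing the tweet's back-of-envelope numbers under the assumptions
# above (FP16, full K+V per layer, no grouped-query attention).
layers = 80                  # total layers
hidden = 8192                # hidden size ("8k")
bytes_per_elem = 2           # FP16
kv_factor = 2                # one K and one V vector per layer

per_token = kv_factor * layers * hidden * bytes_per_elem
print(per_token / 2**20)     # 2.5 -> ~2.5 MiB per token, matching the tweet

tokens = 4096                # max output size ("4k")
print(per_token * tokens / 2**30)   # 10.0 -> ~10 GiB (~10.7 GB), near the quoted 10.5 GB
```

The "more users → more memory" point falls out directly: each concurrent 4k-token sequence pins its own ~10 GiB of cache, so serving capacity is bounded by KV memory, not just compute.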