KV cache
Feeding the Future of AI | James Coomer
DDN· 2025-12-08 18:14
Inference Market & KV Cache Importance
- Inference spending is projected to surpass training spending, highlighting its growing significance in the AI landscape [2]
- The KV cache is crucial for capturing context during the prefill stage and for extending it with newly generated tokens during the decode stage of inference [3][4]
- Using DDN as a KV cache can potentially save hundreds of millions of dollars by retrieving previously computed contexts instead of recomputing them [5]

Disaggregated Inference & Performance
- Disaggregated inference, which runs prefill and decode on different GPUs, improves efficiency but requires a global KV cache to share that context between them [6]
- DDN's fast storage delivers KV caches at extremely high speeds, leading to large efficiency gains [9]
- DDN's throughput is reportedly 15 times faster than competitors, resulting in 20 times faster token output [10]

Productivity & Cost Efficiency
- Implementing a fast shared KV cache like DDN can lead to a 60% increase in output from existing GPU infrastructure [12]
- DDN aims to deliver a 60% increase in tokens output per watt, per data center, per GPU, and per capital dollar spent [13]
- DDN positions accelerating inference in this way as the strongest available improvement in GPU productivity over the next five years [12]
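The core idea behind "retrieving previously computed contexts instead of recomputing them" can be sketched without any vendor specifics. The toy `PrefixKVCache` and `run_prefill` names below are illustrative assumptions, not DDN's API; a real deployment would back the store with fast shared storage rather than an in-process dict.

```python
# Minimal sketch of prefix-based KV cache reuse: before running prefill for a
# prompt, look up the longest prefix whose K/V tensors were already computed,
# and only run prefill on the remaining suffix.
import hashlib


class PrefixKVCache:
    """Toy in-memory prefix cache (a shared deployment would use external storage)."""

    def __init__(self):
        self._store = {}  # prefix hash -> (num_cached_tokens, kv_blob)

    @staticmethod
    def _key(token_ids):
        return hashlib.sha256(str(list(token_ids)).encode("utf-8")).hexdigest()

    def put(self, token_ids, kv_blob):
        self._store[self._key(token_ids)] = (len(token_ids), kv_blob)

    def longest_prefix(self, token_ids):
        """Return (num_cached_tokens, kv_blob) for the longest cached prefix, or (0, None)."""
        # Linear scan over prefixes keeps the sketch simple; real systems use block/radix indexes.
        for end in range(len(token_ids), 0, -1):
            hit = self._store.get(self._key(token_ids[:end]))
            if hit is not None:
                return hit
        return 0, None


def prefill_with_cache(token_ids, cache, run_prefill):
    """run_prefill(new_tokens, past_kv) is a stand-in for the model's prefill step
    and is assumed to return K/V covering the full sequence (past + new)."""
    cached_len, past_kv = cache.longest_prefix(token_ids)
    full_kv = run_prefill(token_ids[cached_len:], past_kv)  # only the suffix is recomputed
    cache.put(token_ids, full_kv)
    return full_kv
```

In a disaggregated setup, the prefill workers would write into such a shared cache and the decode workers would read from it, which is why a fast global store matters.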
X @Polyhedra
Polyhedra· 2025-09-25 12:00
6/ Currently working on Gemma3 quantization, focusing on:
- Learning the new model architecture
- Adding KV cache support (which accelerates inference)
- Implementing quantization support for some new operators

Full operator support will require 1+ additional day, plus more time for accuracy testing. Stay tuned for more updates 🔥
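Why KV cache support accelerates decoding can be shown with a tiny single-head attention loop. This is an illustrative sketch, not the Polyhedra/Gemma3 implementation: without the cache, each decode step would re-project K and V for the whole sequence; with it, each step only projects the newest token and appends.

```python
# Toy single-head attention decode loop with a growing KV cache.
import numpy as np

d = 64                         # head dimension (arbitrary for the sketch)
rng = np.random.default_rng(0)
W_q = rng.normal(scale=0.02, size=(d, d))
W_k = rng.normal(scale=0.02, size=(d, d))
W_v = rng.normal(scale=0.02, size=(d, d))

k_cache, v_cache = [], []      # grows by one row per generated token


def decode_step(x_t):
    """x_t: hidden state of the newest token, shape (d,)."""
    q = x_t @ W_q
    k_cache.append(x_t @ W_k)  # cache K/V instead of recomputing the history
    v_cache.append(x_t @ W_v)
    K = np.stack(k_cache)      # (t, d)
    V = np.stack(v_cache)      # (t, d)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V         # attention output for the new token


for _ in range(5):             # toy decode loop
    out = decode_step(rng.normal(size=d))
```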
X @Avi Chawla
Avi Chawla· 2025-07-27 06:31
That said, KV cache also takes a lot of memory.

Llama3-70B has:
- total layers = 80
- hidden size = 8k
- max output size = 4k

Here:
- Every token takes up ~2.5 MB in KV cache.
- 4k tokens will take up 10.5 GB.

More users → more memory.

I'll cover KV optimization soon. https://t.co/VjnyLa6aLa
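A quick check of those figures, assuming fp16/bf16 (2 bytes) keys and values and one K and one V vector of the full hidden size per layer, as the post's math implies. Note this is an upper bound: Llama3-70B actually uses grouped-query attention, which shrinks the per-token KV footprint.

```python
# Back-of-the-envelope KV cache size for the numbers quoted in the post.
layers = 80           # total transformer layers
hidden_size = 8192    # "8k"
bytes_per_value = 2   # fp16 / bf16

kv_per_token = 2 * layers * hidden_size * bytes_per_value  # K and V per layer
print(kv_per_token / 2**20)            # ~2.5 MiB per token
print(4096 * kv_per_token / 2**30)     # ~10 GiB for a 4k-token context
print(4096 * kv_per_token / 1e9)       # ~10.7 GB, in line with the ~10.5 GB quoted
```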