Avi Chawla·2025-12-10 06:42

KV caching speeds up inference: the model computes the prompt's key/value (KV) cache once during prefill, then reuses it for every generated token instead of recomputing attention over the full context. This is exactly why ChatGPT takes longer to generate the first token than the rest. That delay is known as time-to-first-token (TTFT). Improving TTFT is a topic for another day! https://t.co/wYaYa5paNj ...
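
Here's a minimal sketch of the idea in plain NumPy (all names are illustrative; real implementations cache per layer and per attention head). Prefill computes K/V for the whole prompt once, which is where the TTFT cost lives; each decode step then projects only the newest token and appends to the cache.

```python
import numpy as np

# Toy single-head attention with a KV cache. Weight matrices are random
# stand-ins for trained parameters; d_model is an illustrative size.
d_model = 8
rng = np.random.default_rng(0)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d_model)      # one score per cached position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                     # weighted mix of cached values

# --- Prefill: compute K/V for every prompt token once. This is the TTFT cost.
prompt = rng.normal(size=(5, d_model))     # embeddings of 5 "prompt tokens"
K_cache = prompt @ W_k
V_cache = prompt @ W_v

# --- Decode: each step projects only the latest token, then reuses the cache.
x = rng.normal(size=(d_model,))            # embedding of the newest token
for _ in range(3):
    q = x @ W_q
    K_cache = np.vstack([K_cache, x @ W_k])  # append; never recompute old K/V
    V_cache = np.vstack([V_cache, x @ W_v])
    x = attend(q, K_cache, V_cache)          # stand-in for the next embedding
```

Note how the per-token decode work stays constant in the projections and only the attention over the cache grows with sequence length, while prefill pays for the entire prompt up front.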