KV caching
X @Avi Chawla
Avi Chawla· 2025-12-10 19:56
Performance Improvement - The challenge is to accelerate a GPT model's token generation from 100 tokens in 42 seconds, targeting a 5x improvement [1]
Interview Scenario - The scenario is an AI Engineer interview at OpenAI, highlighting the importance of understanding optimization techniques beyond simply allocating more GPUs [1]
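For a sense of scale, a quick back-of-the-envelope calculation of what the stated 5x target means in throughput (the token and time figures come from the tweet; the script itself is only an illustration):

```python
# Back-of-the-envelope throughput math for the 5x speedup target.
baseline_tokens = 100       # tokens generated in the baseline run (from the tweet)
baseline_seconds = 42.0     # wall-clock time for those tokens (from the tweet)
speedup_target = 5          # stated improvement goal

baseline_tps = baseline_tokens / baseline_seconds       # ~2.4 tokens/s
target_tps = baseline_tps * speedup_target               # ~11.9 tokens/s
target_seconds = baseline_seconds / speedup_target       # ~8.4 s per 100 tokens

print(f"baseline: {baseline_tps:.2f} tok/s -> target: {target_tps:.2f} tok/s "
      f"(~{target_seconds:.1f} s per 100 tokens)")
```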
X @Avi Chawla
Avi Chawla· 2025-12-10 06:42
KV caching speeds up inference by computing the prompt's KV cache before generating tokens. This is exactly why ChatGPT takes longer to generate the first token than the rest. This delay is known as time-to-first-token (TTFT). Improving TTFT is a topic for another day! https://t.co/wYaYa5paNj ...
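A toy sketch of why TTFT exceeds the per-token latency of later tokens: the prefill pass must process every prompt token before the first output token appears, while each decode step processes only one new token. A single matrix multiply stands in for the model here, and all dimensions are made-up assumptions:

```python
import time
import numpy as np

# Toy prefill-vs-decode timing. The matmul is a stand-in for a transformer
# forward pass; prompt length and model width are arbitrary for the sketch.
d_model, prompt_len, new_tokens = 1024, 512, 20
W = np.random.randn(d_model, d_model).astype(np.float32)

t0 = time.perf_counter()
prompt = np.random.randn(prompt_len, d_model).astype(np.float32)
_ = prompt @ W                      # prefill: all prompt tokens processed at once
ttft = time.perf_counter() - t0     # time-to-first-token is dominated by this pass

t0 = time.perf_counter()
for _ in range(new_tokens):
    x = np.random.randn(1, d_model).astype(np.float32)
    _ = x @ W                       # decode: one token per step, prior KV reused
per_token = (time.perf_counter() - t0) / new_tokens

print(f"TTFT (prefill): ~{ttft*1e3:.1f} ms vs ~{per_token*1e3:.1f} ms per later token")
```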
X @Avi Chawla
Avi Chawla· 2025-12-10 06:42
This is called KV caching! To reiterate, instead of redundantly computing the KV vectors of all context tokens, cache them. To generate a token:
- Generate the QKV vectors for the token generated one step before.
- Get all other KV vectors from the cache.
- Compute attention.
Check this👇 https://t.co/TvwvdoXJ6m ...
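A minimal NumPy sketch of that decode step, not taken from the thread: a single attention head where the QKV vectors are computed only for the newest token and every earlier token's K/V comes from the cache (projection matrices and dimensions are made-up placeholders):

```python
import numpy as np

# Single-head attention decode step with a KV cache (illustrative only).
d_model = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))

K_cache, V_cache = [], []  # K/V vectors of every token seen so far

def decode_step(x):
    """x: embedding of the token generated one step before, shape (d_model,)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv             # QKV computed only for the newest token
    K_cache.append(k)                            # cache its K/V for future steps
    V_cache.append(v)
    K, V = np.stack(K_cache), np.stack(V_cache)  # all other K/V come straight from the cache
    scores = K @ q / np.sqrt(d_model)            # new token attends over all cached tokens
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over cached positions
    return weights @ V                           # attention output for the new token

for _ in range(8):                               # a few simulated decode steps
    out = decode_step(rng.standard_normal(d_model))
```

Without the cache, every step would recompute K and V for the whole context; with it, each step does only one projection per matrix plus a cache lookup.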
X @Avi Chawla
Avi Chawla· 2025-10-07 19:17
Technology & Performance
- LLM (Large Language Model) inference speed is affected by the use of KV caching [1]
- The tweet shares a resource comparing LLM inference speed with and without KV caching [1]
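One way to reproduce that comparison locally is the use_cache flag of generate() in Hugging Face transformers; the model name and token count below are arbitrary choices for the sketch, and the gap widens as more tokens are generated:

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Rough timing of generation with and without the KV cache.
# "gpt2" is just a small stand-in model; any causal LM works.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()
inputs = tok("KV caching speeds up inference because", return_tensors="pt")

for use_cache in (True, False):
    t0 = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=64, do_sample=False,
                       use_cache=use_cache, pad_token_id=tok.eos_token_id)
    print(f"use_cache={use_cache}: {time.perf_counter() - t0:.2f} s")
```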
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
The visual explains the underlying details of KV caching. I also wrote a detailed explainer thread on KV caching a few months back, if you want to learn more. Check below👇 https://t.co/e4KILO0cEe
Avi Chawla (@_avichawla): KV caching in LLMs, clearly explained (with visuals): ...
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
Inference Optimization - A comparison of LLM inference speed with and without KV caching [1]
X @Avi Chawla
Avi Chawla· 2025-08-06 06:31
Core Technique - KV caching is a technique used to speed up LLM inference [1]
Explanation Resource - Avi Chawla provides a clear explanation of KV caching in LLMs with visuals [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 19:23
LLM Technical Breakdown - KV caching in LLMs: the KV caching mechanism in LLMs is clearly explained, with accompanying visuals [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 06:31
Key Takeaways
- The author encourages readers to reshare the content if they found it insightful [1]
- The author shares daily tutorials and insights on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAG (Retrieval-Augmented Generation) [1]
Focus Area - The content clearly explains KV caching in LLMs with visuals [1]
Author Information - Avi Chawla's Twitter handle is @_avichawla [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 06:30
Technology Overview
- KV caching is used in Large Language Models (LLMs) to improve inference performance [1]
- The document provides a clear explanation of KV caching in LLMs with visuals [1]