KV caching
X @Avi Chawla
Avi Chawla· 2025-12-10 19:56
Performance Improvement
- The challenge is to speed up a GPT model's token generation from a baseline of 100 tokens in 42 seconds, targeting a 5x improvement [1]

Interview Scenario
- The scenario is an AI Engineer interview at OpenAI, highlighting the importance of understanding optimization techniques beyond simply allocating more GPUs [1]
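For context, the arithmetic implied by those numbers (taken directly from the summary above):

```python
# Back-of-the-envelope arithmetic for the stated target: 100 tokens in 42 s, 5x faster.
baseline_tokens, baseline_seconds = 100, 42
baseline_tps = baseline_tokens / baseline_seconds   # ≈ 2.4 tokens/sec
target_tps = 5 * baseline_tps                        # ≈ 11.9 tokens/sec
target_seconds = baseline_tokens / target_tps        # ≈ 8.4 s for the same 100 tokens
print(f"{baseline_tps:.1f} -> {target_tps:.1f} tok/s, i.e. 100 tokens in {target_seconds:.1f}s")
```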
X @Avi Chawla
Avi Chawla· 2025-12-10 06:42
Key Concepts
- KV caching accelerates inference by pre-computing the prompt's KV cache before token generation [1]
- This pre-computation explains the longer time-to-first-token (TTFT) observed in models like ChatGPT [1]

Performance Bottleneck
- Time-to-first-token (TTFT) is a significant performance metric in inference [1]
- Improving TTFT is an area for further research and development [1]
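A self-contained sketch of the two phases this summary describes, with assumed shapes and weight names (NumPy only, not the thread's code): prefill builds K/V for the whole prompt before the first token is emitted, which is where the long TTFT comes from, while each decode step only appends one row to the cache.

```python
import numpy as np

d = 64
rng = np.random.default_rng(0)
W_k, W_v = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def prefill(prompt_embeddings):
    """Phase 1 (explains the long time-to-first-token): compute K and V for every
    prompt token in one pass and store them as the initial KV cache."""
    K = prompt_embeddings @ W_k   # (prompt_len, d)
    V = prompt_embeddings @ W_v
    return K, V

def decode_step(x_new, K, V):
    """Phase 2: each later step only adds one row to the cache, so per-token cost
    is far smaller than the prefill."""
    K = np.vstack([K, x_new @ W_k])
    V = np.vstack([V, x_new @ W_v])
    return K, V

prompt = rng.standard_normal((512, d))   # a 512-token prompt (illustrative size)
K, V = prefill(prompt)                   # all of this work happens before the first token
K, V = decode_step(rng.standard_normal(d), K, V)
print(K.shape)                           # (513, 64)
```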
X @Avi Chawla
Avi Chawla· 2025-12-10 06:42
This is called KV caching! To reiterate, instead of redundantly computing the KV vectors of all context tokens, cache them.
To generate a token:
- Generate the QKV vectors for the token generated one step before.
- Get all other KV vectors from the cache.
- Compute attention.
Check this👇 https://t.co/TvwvdoXJ6m ...
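As a concrete illustration of those three steps, here is a minimal single-head decode step in NumPy; the dimension size, weight matrices, and cache layout are all illustrative choices, not taken from the thread.

```python
import numpy as np

d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))

# Cache holds the K and V vectors of every token processed so far.
kv_cache = {"K": np.empty((0, d_model)), "V": np.empty((0, d_model))}

def decode_step(x_new, cache):
    """x_new: embedding of the token generated one step before, shape (d_model,)."""
    # 1) Compute Q, K, V only for the newest token.
    q, k, v = x_new @ W_q, x_new @ W_k, x_new @ W_v
    # 2) Append the new K/V to the cache; earlier tokens' K/V are reused, not recomputed.
    cache["K"] = np.vstack([cache["K"], k])
    cache["V"] = np.vstack([cache["V"], v])
    # 3) Attention of the new query over all cached keys/values.
    scores = cache["K"] @ q / np.sqrt(d_model)   # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache["V"]                  # context vector for the new token

# Usage: feed token embeddings one at a time; the cache grows by one entry per step.
for _ in range(5):
    out = decode_step(rng.standard_normal(d_model), kv_cache)
print(out.shape, kv_cache["K"].shape)            # (64,) (5, 64)
```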
X @Avi Chawla
Avi Chawla· 2025-10-07 19:17
Technology & Performance
- LLM (Large Language Model) inference speed improves substantially when KV caching is used [1]
- The tweet shares a resource comparing LLM inference speed with and without KV caching [1]
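The linked resource is not reproduced here, but one way to run such a comparison yourself is sketched below, assuming the Hugging Face transformers library, a small "gpt2" checkpoint, and 128 generated tokens (all choices mine, not from the tweet); the use_cache flag toggles KV caching during generation.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("KV caching speeds up decoding because", return_tensors="pt")

def timed_generate(use_cache):
    # Greedy generation, timed end to end; only the caching behavior differs.
    start = time.perf_counter()
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=128, use_cache=use_cache,
                       do_sample=False, pad_token_id=tok.eos_token_id)
    return time.perf_counter() - start

print(f"with cache:    {timed_generate(True):.2f}s")
print(f"without cache: {timed_generate(False):.2f}s")
```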
X @Avi Chawla
Avi Chawla· 2025-10-07 06:31
The visual explains the underlying details of KV caching. I also wrote a detailed explainer thread on KV caching a few months back, if you want to learn more. Check below👇 https://t.co/e4KILO0cEe
Avi Chawla (@_avichawla): KV caching in LLMs, clearly explained (with visuals): ...
X @Avi Chawla
Avi Chawla· 2025-08-06 06:31
Core Technique
- KV caching is a technique used to speed up LLM inference [1]

Explanation Resource
- Avi Chawla provides a clear explanation of KV caching in LLMs with visuals [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 19:23
LLM Technique Breakdown
- KV caching in LLMs: the KV caching mechanism in LLMs is clearly explained, with accompanying visual diagrams [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 06:31
Key Takeaways
- The author encourages readers to reshare the content if they found it insightful [1]
- The author shares tutorials and insights on DS (Data Science), ML (Machine Learning), LLMs (Large Language Models), and RAGs (Retrieval-Augmented Generation) daily [1]

Focus Area
- The content clearly explains KV caching in LLMs with visuals [1]

Author Information
- Avi Chawla's Twitter handle is @_avichawla [1]
X @Avi Chawla
Avi Chawla· 2025-07-27 06:30
Technology Overview
- KV caching is utilized in Large Language Models (LLMs) to enhance inference performance [1]
- The document provides a clear explanation of KV caching in LLMs with visuals [1]