X @Avi Chawla
Avi Chawla·2025-12-10 06:42
This is called KV caching!

To reiterate, instead of redundantly computing KV vectors of all context tokens, cache them.

To generate a token:
- Generate QKV vector for the token generated one step before.
- Get all other KV vectors from cache.
- Compute attention.

Check this👇 https://t.co/TvwvdoXJ6m ...
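
The three steps above can be sketched in a few lines of plain Python. This is a toy, single-head, single-layer illustration, not how any particular library implements it. The names `KVCache`, `step_with_cache`, `step_without_cache`, the dimension `D`, and the random weights and token vectors are all made up for the example.

```python
import math
import random

D = 4  # hypothetical head dimension, chosen only for this sketch


def make_matrix(rows, cols, rng):
    # Random stand-in for a learned projection matrix.
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(x, w):
    # x: length-n vector, w: n x m matrix -> length-m vector.
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def attend(q, keys, values):
    # Scaled dot-product attention for one query over all keys/values.
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]


class KVCache:
    # Holds the K and V vectors of every token processed so far.
    def __init__(self):
        self.keys = []
        self.values = []


def step_with_cache(x, cache, wq, wk, wv):
    # 1) Compute Q, K, V only for the newest token.
    q = matvec(x, wq)
    cache.keys.append(matvec(x, wk))
    cache.values.append(matvec(x, wv))
    # 2) All older K/V vectors come from the cache. 3) Compute attention.
    return attend(q, cache.keys, cache.values)


def step_without_cache(xs, wq, wk, wv):
    # Baseline: recompute K and V for every context token at every step.
    keys = [matvec(x, wk) for x in xs]
    values = [matvec(x, wv) for x in xs]
    return attend(matvec(xs[-1], wq), keys, values)


rng = random.Random(0)
wq, wk, wv = (make_matrix(D, D, rng) for _ in range(3))
tokens = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(5)]

cache = KVCache()
cached_outputs = [step_with_cache(x, cache, wq, wk, wv) for x in tokens]
naive_outputs = [
    step_without_cache(tokens[: t + 1], wq, wk, wv) for t in range(len(tokens))
]
```

Both paths should give the same attention output at every step. The cached path does one K/V projection per step, while the baseline redoes one per context token, so its projection work grows with the sequence length.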