X @Avi Chawla - Reportify

This is called KV caching!To reiterate, instead of redundantly computing KV vectors of all context tokens, cache them.To generate a token:- Generate QKV vector for the token generated one step before.- Get all other KV vectors from cache.- Compute attention.Check this👇 https://t.co/TvwvdoXJ6m ...