X @Avi Chawla
Avi Chawla· 2026-05-10 06:58
Researchers found a way to make LLMs 8.5x faster! (without compromising accuracy)

Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference.

A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass.

If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding.

But current drafters in speculative decoding ...
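The draft-then-verify loop described above can be sketched with toy stand-ins for the two models. Everything here is a hypothetical illustration: `draft_model_next` and `target_model_next` are deterministic toy functions, not real model calls, and in a real system the verification step would be a single batched forward pass of the large model rather than a Python loop.

```python
def draft_model_next(prefix):
    # Small, fast drafter (toy stand-in: deterministic function of the prefix).
    return (sum(prefix) * 31 + len(prefix)) % 7

def target_model_next(prefix):
    # Large, accurate verifier (toy stand-in that occasionally disagrees
    # with the drafter, so we can see the accept/correct logic fire).
    t = (sum(prefix) * 31 + len(prefix)) % 7
    return t if len(prefix) % 5 != 0 else (t + 1) % 7

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then verify them against the target.

    Everything before the first mismatch is kept; the target's own token
    is substituted at the mismatch, so the output always matches plain
    target-model decoding -- it just needs fewer target-model passes.
    """
    # 1) Drafter proposes k tokens autoregressively.
    drafted, p = [], list(prefix)
    for _ in range(k):
        tok = draft_model_next(p)
        drafted.append(tok)
        p.append(tok)

    # 2) Target verifies all k positions (batched in practice).
    accepted, p = [], list(prefix)
    for tok in drafted:
        correct = target_model_next(p)
        if tok == correct:
            accepted.append(tok)
            p.append(tok)
        else:
            # Keep everything before the mismatch, take the target's
            # token instead, and restart decoding from this point.
            accepted.append(correct)
            break
    return accepted

# Decode 12 tokens speculatively and compare to plain target decoding.
spec, plain = [0], [0]
while len(spec) < 12:
    spec.extend(speculative_step(spec))
while len(plain) < 12:
    plain.append(target_model_next(plain))
print(spec[:12] == plain[:12])  # True: identical output, fewer target steps
```

The "never does worse" guarantee is visible in the structure: every accepted token is, by construction, exactly the token the target model would have produced for that prefix.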
Avi Chawla· 2026-05-05 21:10
RT Avi Chawla (@_avichawla)

The most comprehensive RL overview I've ever seen.

Kevin Murphy from Google DeepMind, who has over 128k citations, wrote this.

What makes this different from other RL resources:

→ It bridges classical RL with the modern LLM era:

There's an entire chapter dedicated to "LLMs and RL" covering:
- RLHF, RLAIF, and reward modeling
- PPO, GRPO, DPO, RLOO, REINFORCE++
- Training reasoning models
- Multi-turn RL for agents
- Test-time compute scaling

→ The fundamentals are crystal clear

Every major a ...