Avi Chawla
X @Avi Chawla
Avi Chawla· 2026-05-11 08:38
Anthropic's in trouble, again.

The entire Claude experience is now available at 1/6th the price. Kimi now does everything Claude does, powered by K2.6, a 1-trillion-parameter MoE model that activates only 32B parameters per token.

It covers all three features Claude has (Chat, Code, and Cowork):

1) Kimi Chat runs in four modes:
- Instant for fast responses
- Thinking for deep reasoning
- Agent for multi-step execution
- Agent Swarm for parallel workloads

There's a 262K context window across all of them.

2) Kimi C ...
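The "activates only 32B of 1T parameters per token" behavior comes from sparse Mixture-of-Experts routing: a learned gate scores all experts, picks the top-k for each token, and only those experts' weights actually run. A minimal sketch with toy dimensions (this illustrates top-k MoE routing in general, not K2.6's actual architecture or expert counts):

```python
import numpy as np

def topk_moe_layer(x, gate_w, experts, k=2):
    """Sparse MoE forward pass for one token.

    x:       (d,) token hidden state
    gate_w:  (d, n_experts) router weights
    experts: list of (w_in, w_out) MLP weight pairs; only k of them run
    """
    logits = x @ gate_w                         # router score per expert
    topk = np.argsort(logits)[-k:]              # indices of the k best experts
    probs = np.exp(logits[topk] - logits[topk].max())
    probs /= probs.sum()                        # softmax over selected experts only
    out = np.zeros_like(x)
    for p, i in zip(probs, topk):
        w_in, w_out = experts[i]
        out += p * (np.maximum(x @ w_in, 0) @ w_out)  # weighted ReLU expert MLP
    return out, topk

rng = np.random.default_rng(0)
d, n_experts = 16, 8
x = rng.normal(size=d)
gate_w = rng.normal(size=(d, n_experts))
experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
           for _ in range(n_experts)]
y, used = topk_moe_layer(x, gate_w, experts)
# only k=2 of the 8 experts contributed to this token's output
```

The total parameter count grows with `n_experts`, but per-token compute only grows with `k`, which is how a 1T-parameter model can run 32B parameters per token.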
Avi Chawla· 2026-05-10 06:58
Researchers found a way to make LLMs 8.5x faster (without compromising accuracy)!

Speculative decoding is quite an effective way to address the single-token bottleneck in traditional LLM inference. A small "draft" model first generates the next several tokens, then the large model verifies all of them at once in a single forward pass.

If a token at any position is wrong, you keep everything before it and restart from there. This never does worse than normal decoding.

But current drafters in speculative decoding ...
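The accept/reject loop above can be sketched with a toy stand-in for the large model. Here a deterministic function over integer tokens plays the role of the target model's greedy choice; in a real system the verification of all draft tokens happens in one batched forward pass, which is where the speedup comes from:

```python
def target_next(prefix):
    # Stand-in for the large model's greedy next token:
    # a simple deterministic rule over integer token ids.
    return (sum(prefix) + len(prefix)) % 100

def speculative_step(prefix, draft_tokens):
    """Verify a draft model's proposed tokens against the target model.

    Accept the longest prefix of draft_tokens that matches what the
    target model would have produced; on the first mismatch, keep
    everything before it and substitute the target's own token.
    With greedy verification the output is identical to plain decoding,
    so this never does worse.
    """
    accepted = []
    for tok in draft_tokens:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # first wrong position: keep the good prefix, restart from here
            return accepted + [expected]
        accepted.append(tok)
    return accepted  # every draft token matched in "one" verification pass

# A perfect drafter for demonstration: it proposes exactly what the
# target would say, so all 4 tokens are accepted at once.
prefix = [1, 2, 3]
good_draft, p = [], list(prefix)
for _ in range(4):
    t = target_next(p)
    good_draft.append(t)
    p.append(t)
assert speculative_step(prefix, good_draft) == good_draft
```

When the drafter is wrong at some position, the step still makes progress: it returns the accepted prefix plus one corrected token, exactly one token more than plain decoding would have produced.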
Avi Chawla· 2026-05-09 20:22
The full-stack AI engineering roadmap covering:
> Prompt engineering
> RAG systems
> Context engineering
> Fine-tuning
> Agents
> LLM deployment
> LLM optimization
> Safety, evals & observability

Free and open-source resources in the article below. (Don't forget to bookmark.) https://t.co/7tk3aI5bXv

Avi Chawla (@_avichawla): https://t.co/658QADQmDC ...
Avi Chawla· 2026-05-07 20:17
Karpathy said something you'll regret ignoring:

"Remove yourself as the bottleneck. Maximize your leverage. Put in very few tokens, and a huge amount of stuff happens on your behalf."

The reason most people can't do this today is that their AI has little to no memory of their work. You sit in meetings, read threads, make decisions, and your brain quietly drops half of it by next week. Then you spend time re-reading, re-asking, re-explaining context to your own AI.

You can't remove yourself from the loop when ...
Avi Chawla· 2026-05-06 21:04
Layers of observability in AI systems, explained visually:

If you're deploying LLM-powered apps to real users, you need to know what's happening inside your pipeline at every step. Here's the mental model (see the attached diagram):

Think of your AI pipeline as a series of steps. For simplicity, consider RAG. A user asks a question, it flows through multiple components, and eventually, a response comes out.

Each of those steps takes time, each step can fail, and each step has its own c ...
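The "each step takes time, each step can fail" framing maps naturally onto per-step spans around every pipeline component. A minimal sketch of step-level tracing for a RAG-style pipeline (the `retrieve` and `generate` functions are illustrative stand-ins, not any specific library's API):

```python
import time
from contextlib import contextmanager

spans = []  # collected per-step timing/outcome records

@contextmanager
def span(name):
    """Record how long a pipeline step took and whether it failed."""
    start = time.perf_counter()
    record = {"step": name, "ok": True}
    try:
        yield
    except Exception as e:
        record["ok"] = False
        record["error"] = repr(e)
        raise
    finally:
        record["ms"] = (time.perf_counter() - start) * 1000
        spans.append(record)

def retrieve(query):
    # stand-in for a vector-store lookup
    return [f"doc about {query}"]

def generate(query, docs):
    # stand-in for an LLM call
    return f"answer to {query!r} using {len(docs)} docs"

def rag_pipeline(query):
    with span("retrieve"):
        docs = retrieve(query)
    with span("generate"):
        answer = generate(query, docs)
    return answer

rag_pipeline("observability")
# spans now holds one timed record per step, in execution order
```

Because a failure is recorded before the exception propagates, the trace shows exactly which step broke and how long every step before it took, which is the per-step visibility the post is describing.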