Avi Chawla
X @Avi Chawla
Avi Chawla · 2025-12-10 12:17
Model Performance
- The model currently generates 100 tokens in 42 seconds [1]
- The goal is to achieve a 5x speed improvement in token generation [1]
Optimization Strategies
- Simply allocating more GPUs is an insufficient solution for optimizing model speed [1]
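To make the target above concrete, here is a quick back-of-the-envelope calculation in plain Python. The figures (100 tokens, 42 seconds, 5x) come straight from the post; the variable names are just for illustration:

```python
# Back-of-the-envelope math for the figures quoted above:
# 100 tokens in 42 s, with a 5x speedup target.
baseline_tokens = 100
baseline_seconds = 42.0
speedup_target = 5

baseline_tps = baseline_tokens / baseline_seconds   # ~2.38 tokens/s
target_tps = baseline_tps * speedup_target          # ~11.9 tokens/s
target_seconds = baseline_seconds / speedup_target  # 8.4 s per 100 tokens

print(f"baseline: {baseline_tps:.2f} tok/s -> target: {target_tps:.2f} tok/s "
      f"({target_seconds:.1f} s per 100 tokens)")
```

In other words, hitting 5x means going from roughly 2.4 to roughly 12 tokens per second, which is why the later posts argue that simply adding GPUs is not enough.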
X @Avi Chawla
Avi Chawla · 2025-12-10 06:42
KV caching speeds up inference by computing the prompt's KV cache before generating tokens. This is exactly why ChatGPT takes longer to generate the first token than the rest. This delay is known as time-to-first-token (TTFT). Improving TTFT is a topic for another day! https://t.co/wYaYa5paNj ...
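A toy timing sketch of the TTFT effect described above: the first token pays for prefill (building the prompt's KV cache), while later tokens reuse that cache. The `prefill` and `decode_step` helpers and their sleep durations are made-up stand-ins for illustration, not real model numbers:

```python
import time

def prefill(prompt_tokens: int) -> None:
    time.sleep(0.001 * prompt_tokens)  # prefill cost grows with prompt length

def decode_step() -> None:
    time.sleep(0.02)                   # one cache-assisted decode step

prompt_tokens = 500

start = time.perf_counter()
prefill(prompt_tokens)                 # happens once, before any output
decode_step()                          # first generated token
ttft = time.perf_counter() - start

start = time.perf_counter()
decode_step()                          # any subsequent token
per_token = time.perf_counter() - start

print(f"TTFT: {ttft:.2f}s | later tokens: {per_token:.2f}s each")
```

With these stand-in numbers, the first token takes ~0.52 s while every later token takes ~0.02 s, mirroring the delay the post attributes to prefill.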
X @Avi Chawla
Avi Chawla · 2025-12-10 06:42
This is called KV caching! To reiterate, instead of redundantly computing the KV vectors of all context tokens, cache them. To generate a token:
- Generate the QKV vectors for the token generated one step before.
- Get all other KV vectors from the cache.
- Compute attention.
Check this 👇 https://t.co/TvwvdoXJ6m ...
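The three steps above can be sketched in a few lines of NumPy. This is a minimal single-head attention decode loop with a KV cache, an illustration of the idea rather than any particular framework's API; the weight matrices and dimensions are arbitrary:

```python
import numpy as np

d = 64
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

K_cache, V_cache = [], []  # KV vectors of all context tokens seen so far

def generate_step(x):
    """x: embedding of the token generated one step before, shape (d,)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv  # QKV computed only for the newest token
    K_cache.append(k)                 # cache KV instead of recomputing them
    V_cache.append(v)
    K, V = np.stack(K_cache), np.stack(V_cache)
    scores = K @ q / np.sqrt(d)       # attend over all cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax
    return weights @ V                # attention output for this step

for _ in range(5):                    # decode a few steps
    out = generate_step(rng.standard_normal(d))
print(out.shape)                      # (64,)
```

The point of the cache is visible in `generate_step`: each step projects only one new token through the QKV matrices, so per-token cost stops growing with context length for that projection.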
X @Avi Chawla
Avi Chawla · 2025-12-10 06:41
Performance Bottleneck
- The initial response suggests a simple solution of allocating more GPUs, but it misses deeper optimization opportunities [1]
- The model generates 100 tokens in 42 seconds, implying a need for significant speed improvement [1]
Missed Optimization Opportunities
- The response lacks exploration of algorithmic optimizations or model architecture improvements [1]
- The response doesn't consider potential software or hardware bottlenecks beyond GPU allocation [1]
X @Avi Chawla
Avi Chawla · 2025-12-09 19:31
RT Avi Chawla (@_avichawla): AWS did it again! They have introduced a novel way for developers to build Agents. Today, when you build an Agent, you start with a simple goal, then end up juggling prompts, routing logic, error handling, tool orchestration, and fallback flows. One unexpected user input and the whole thing collapses. Strands Agents framework by AWS approaches Agent building differently. It takes a model-driven approach that lets the LLM decide how to plan, choose tools, and adapt to edge cases on its ...
X @Avi Chawla
Avi Chawla · 2025-12-09 13:00
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/np057bqlC3
Avi Chawla (@_avichawla): AWS did it again! They have introduced a novel way for developers to build Agents. Today, when you build an Agent, you start with a simple goal, then end up juggling prompts, routing logic, error handling, tool orchestration, and fallback flows. One unexpected user input and https://t.co/KPS3aKAer9 ...
X @Avi Chawla
Avi Chawla · 2025-12-09 06:31
GitHub repo: https://t.co/twd4zJVi0U ...
X @Avi Chawla
Avi Chawla · 2025-12-09 06:31
AWS did it again! They have introduced a novel way for developers to build Agents. Today, when you build an Agent, you start with a simple goal, then end up juggling prompts, routing logic, error handling, tool orchestration, and fallback flows. One unexpected user input and the whole thing collapses. Strands Agents framework by AWS approaches Agent building differently. It takes a model-driven approach that lets the LLM decide how to plan, choose tools, and adapt to edge cases on its own. You provide the capabil ...
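The tweet is truncated, but the model-driven approach it describes can be sketched from the Strands Agents quickstart. The snippet below is a hedged illustration: `word_count` is a made-up tool, and the exact `Agent`/`tool` import paths and signatures are assumptions based on the project's README, not verified against a current release:

```python
# Minimal sketch in the spirit of the Strands Agents quickstart
# (pip install strands-agents). Imports and signatures are assumed
# from the project's docs and may differ in the current release.
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

# You provide the tools and a prompt; the model-driven loop decides
# when to plan, which tools to call, and how to handle edge cases.
agent = Agent(
    tools=[word_count],
    system_prompt="You are a concise assistant.",
)
agent("How many words are in 'model-driven agents adapt on their own'?")
```

The contrast with hand-built agents is that there is no explicit routing logic or fallback flow here; the LLM itself decides whether and when to invoke `word_count`.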
X @Avi Chawla
Avi Chawla · 2025-12-08 19:06
RT Avi Chawla (@_avichawla): If you need a video guide to Karpathy's nanochat, check out Stanford's CS336! It covers:
- Tokenization
- Resource Accounting
- Pretraining
- Finetuning (SFT/RLHF)
- Overview of Key Architectures
- Working with GPUs
- Kernels and Triton
- Parallelism
- Scaling Laws
- Inference
- Evaluation
- Alignment
Everything you need to prepare for a job at Frontier AI Labs. I have shared the playlist in the replies! ...
X @Avi Chawla
Avi Chawla · 2025-12-08 12:08
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. https://t.co/X8z3SMqptR
Avi Chawla (@_avichawla): If you need a video guide to Karpathy's nanochat, check out Stanford's CS336! It covers: Tokenization, Resource Accounting, Pretraining, Finetuning (SFT/RLHF), Overview of Key Architectures, Working with GPUs, Kernels and Triton, Parallelism, Scaling Laws, Inference https://t.co/7oCl2Od1fO ...