Avi Chawla
Avi Chawla · 2025-08-25 06:30
Of course, there is a trade-off between accuracy and size. As we reduce a model's size, its accuracy drops (check the video). But in most cases, accuracy is not the only metric we optimize for. Operational metrics like inference efficiency and memory footprint are often the key factors. https://t.co/zoivU2E627 ...
Avi Chawla · 2025-08-25 06:30
Neural Network Performance
- Removing 74% of neurons from a neural network decreased accuracy by only 0.50% [1]
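Results like this typically come from magnitude-based pruning. Below is a minimal sketch using PyTorch's built-in torch.nn.utils.prune module; the architecture and the 74% ratio are illustrative stand-ins mirroring the figure above, not a recipe guaranteed to reproduce it.

```python
# Minimal magnitude-based pruning sketch (PyTorch). The model and the
# pruning ratios are illustrative; they mirror the post's 74% figure.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Unstructured: zero out 74% of the first layer's weights by L1 magnitude.
prune.l1_unstructured(model[0], name="weight", amount=0.74)

# Structured variant: drop entire neurons (rows of the weight matrix).
prune.ln_structured(model[2], name="weight", amount=0.5, n=2, dim=0)

# Make the pruning permanent (folds the mask into the weights).
prune.remove(model[0], "weight")
```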
Avi Chawla · 2025-08-24 19:30
Core Concepts
- LLMs like GPT and DeepSeek serve as the foundational engine powering Agentic AI [1]
- AI Agents wrap around LLMs, granting them autonomous action capabilities and making them useful in real-world workflows [2]
- Agentic systems emerge from combining multiple agents, enabling collaboration and coordination [3]

Agentic Infrastructure
- Agentic infrastructure encompasses tokenization & inference parameters, prompt engineering, and LLM APIs [2]
- Tool usage & function calling, agent reasoning (e.g., ReAct), task planning & decomposition, and memory management are crucial components (see the sketch after this list) [3]
- Inter-agent communication, routing & scheduling, state coordination, and Multi-Agent RAG facilitate collaboration [4]
- Agent roles & specialization and orchestration frameworks (e.g., CrewAI) enhance workflow construction [4]

Trust, Safety, and Scalability
- Observability & logging (e.g., using DeepEval), error handling & retries, and security & access control are essential for trust and safety [6]
- Rate limiting & cost management, workflow automation, and human-in-the-loop controls ensure scalability and governance [6]
- Agentic AI features a stacked architecture, with outer layers adding reliability, coordination, and governance [5]
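To make the tool-usage layer concrete, here is a minimal sketch of a function-calling loop. The call_llm helper and the tool schema are hypothetical stand-ins for whatever LLM API sits underneath; the point is the dispatch pattern, not any vendor's interface.

```python
# Minimal function-calling loop (sketch). call_llm is a hypothetical
# stand-in: real LLM APIs return tool calls in their own schema.
def get_weather(city: str) -> str:
    return f"22°C and sunny in {city}"  # stub tool

TOOLS = {"get_weather": get_weather}

def call_llm(messages):
    # Hypothetical: request a tool once, then answer after seeing its result.
    if any(m["role"] == "tool" for m in messages):
        return {"content": f"Weather report: {messages[-1]['content']}"}
    return {"tool": "get_weather", "args": {"city": "Paris"}}

def agent_loop(user_msg: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "tool" not in reply:                         # final answer
            return reply["content"]
        result = TOOLS[reply["tool"]](**reply["args"])  # dispatch the tool
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(agent_loop("What's the weather in Paris?"))
```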
Avi Chawla · 2025-08-24 06:33
Core Concepts
- LLMs like GPT and DeepSeek power Agentic AI [1]
- AI Agents wrap around LLMs, enabling autonomous action [2]
- Agentic systems combine multiple agents for collaboration [2]

Agentic Infrastructure
- Observability & logging track performance using frameworks like DeepEval [2]
- Tokenization & inference parameters define text processing [3]
- Prompt engineering improves output quality [3]
- Tool usage & function calling connect LLMs to external APIs [4]
- Agent reasoning methods include ReAct and Chain-of-Thought [4]
- Task planning & decomposition break down large tasks [4]
- Memory management tracks history and context [4]

Multi-Agent Systems
- Inter-agent communication uses protocols like ACP and A2A [5]
- Routing & scheduling determine agent task allocation [5]
- State coordination ensures consistency in collaboration [5]
- Multi-Agent RAG applies retrieval-augmented generation across agents [5]
- Orchestration frameworks like CrewAI build workflows [5]

Enterprise Considerations
- Error handling & retries provide resilience (see the sketch after this list) [7]
- Security & access control prevent overreach [7]
- Rate limiting & cost management control resource usage [7]
- Human-in-the-loop controls allow oversight [7]
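As a concrete instance of the error-handling layer, here is a generic retry-with-exponential-backoff wrapper; the exception type, attempt count, and delays are illustrative rather than any framework's defaults.

```python
# Generic retry with exponential backoff (sketch); the exception type,
# attempt count, and base delay are illustrative, not framework defaults.
import time
import random

def with_retries(fn, max_attempts=4, base_delay=0.5):
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Exponential backoff with jitter to avoid thundering herds.
            time.sleep(base_delay * 2 ** attempt + random.random() * 0.1)

# Usage: with_retries(lambda: call_some_flaky_api())
```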
Avi Chawla · 2025-08-23 19:32
LLM Context Length Growth
- GPT-3.5-turbo has a context length of 4k tokens [1]
- OpenAI GPT4 has a context length of 8k tokens [1]
- Claude 2 has a context length of 100k tokens [1]
- Llama 3 has a context length of 128k tokens [1]
- Gemini reaches a context length of 1M tokens [1]
Avi Chawla · 2025-08-23 06:30
Flash attention involves hardware-level optimizations: it keeps intermediate attention results in fast on-chip SRAM rather than repeatedly moving them to and from GPU memory. This cuts redundant memory movement and offers a speedup of up to 7.6x over standard attention methods. Check this 👇 https://t.co/R8Nfu1ZFBc ...
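As a reference point, PyTorch exposes a fused attention entry point that can dispatch to a FlashAttention-style kernel on supported GPUs; a minimal sketch with illustrative shapes:

```python
# Minimal sketch: PyTorch's fused attention entry point, which can dispatch
# to a FlashAttention-style kernel on supported GPUs. Shapes are illustrative.
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# (batch, heads, seq_len, head_dim)
q = torch.randn(1, 8, 1024, 64, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# One fused call instead of materializing softmax(q @ k.T / sqrt(d)) @ v in HBM.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```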
Avi Chawla · 2025-08-23 06:30
LLM Context Length Growth
- The industry has witnessed a significant expansion in LLM context length over time [1]
- GPT-3.5-turbo initially supported 4k tokens [1]
- OpenAI GPT4 extended the limit to 8k tokens [1]
- Claude 2 further increased the context length to 100k tokens [1]
- Llama 3 achieved a context length of 128k tokens [1]
- Gemini reached an impressive 1M tokens [1]
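A practical corollary of these limits: check that a prompt fits the window before sending it. A minimal sketch using the tiktoken tokenizer, with a 4k budget mirroring the GPT-3.5-turbo figure above:

```python
# Sketch: check whether a prompt fits a model's context window using
# tiktoken (pip install tiktoken). The 4k budget mirrors GPT-3.5-turbo above.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def fits_context(prompt: str, budget: int = 4096, reserve_for_reply: int = 512) -> bool:
    n_tokens = len(enc.encode(prompt))
    return n_tokens + reserve_for_reply <= budget

print(fits_context("Summarize flash attention in two sentences."))  # True
```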
Avi Chawla · 2025-08-22 19:19
RT Avi Chawla (@_avichawla): You are in an ML interview. Your interviewer asks: "Why is the Kernel Trick called a trick?" Here's how to answer (with simple maths): ...
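The standard one-line answer: a kernel evaluates the inner product of an explicit high-dimensional feature map without ever constructing that map. The degree-2 polynomial kernel below is a textbook illustration, not code from the thread itself:

```python
# The "trick": k(x, z) = (x . z)^2 equals phi(x) . phi(z) for the explicit
# degree-2 feature map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2), so we get the
# high-dimensional inner product without ever building phi.
import numpy as np

def phi(x):  # explicit feature map for the degree-2 polynomial kernel in R^2
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
kernel = (x @ z) ** 2             # computed in the original 2-D space
explicit = phi(x) @ phi(z)        # computed in the mapped 3-D space
print(np.isclose(kernel, explicit))  # True: both equal 121.0
```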
Avi Chawla · 2025-08-22 06:43
Machine Learning Insights
- The document shares tutorials and insights on Data Science (DS), Machine Learning (ML), Large Language Models (LLMs), and Retrieval-Augmented Generation (RAG) [1]
- The document presents an explanation of the "Kernel Trick" in the context of an ML interview question [1]

Engagement & Networking
- The author encourages readers to reshare the content [1]
- The author shares their social media handle for further engagement [1]