Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-08-26 06:30
Core Offering - Beam provides an open-source alternative for deploying serverless AI workloads, similar to Modal [1] - AI applications can be deployed simply by adding a Python decorator [1] - Just add `beam-client`, attach the decorator, and any workflow becomes a serverless endpoint [1]
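The decorator-based deployment the post describes can be illustrated with a plain-Python sketch of the pattern. The `endpoint` decorator and `spec` attribute below are hypothetical stand-ins, not Beam's actual API — the point is that a decorator can attach deployment metadata without changing the function itself:

```python
# Illustrative decorator pattern: wrap a plain function with deployment
# metadata that a deploy runtime/CLI could read. Names are hypothetical.

def endpoint(cpu=1, memory="1Gi"):
    """Attach deployment metadata to a function without changing it."""
    def wrap(fn):
        fn.spec = {"name": fn.__name__, "cpu": cpu, "memory": memory}
        return fn
    return wrap

@endpoint(cpu=2, memory="4Gi")
def predict(text: str) -> dict:
    # The existing workflow body stays untouched.
    return {"length": len(text)}

print(predict("hello"))   # the function still runs locally as before
print(predict.spec)       # metadata a deploy tool could act on
```

Because the decorator returns the original function, local behavior is unchanged; only the attached metadata makes it deployable.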
X @Avi Chawla
Avi Chawla· 2025-08-25 19:15
RT Avi Chawla (@_avichawla): I removed 74% of neurons from a neural network. It dropped the accuracy by just 0.50%. Here's a breakdown (with code): ...
X @Avi Chawla
Avi Chawla· 2025-08-25 06:31
That's a wrap! If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. Avi Chawla (@_avichawla): I removed 74% of neurons from a neural network. It dropped the accuracy by just 0.50%. Here's a breakdown (with code): ...
X @Avi Chawla
Avi Chawla· 2025-08-25 06:30
Of course, there is a trade-off between accuracy and size. As we reduce the size, accuracy drops (check the video). But in most cases, accuracy is not the only metric we optimize. Instead, several operational metrics like efficiency, memory, etc., are the key factors. https://t.co/zoivU2E627 ...
X @Avi Chawla
Avi Chawla· 2025-08-25 06:30
Neural Network Performance - Removing 74% of neurons from a neural network only decreased accuracy by 0.50% [1]
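The idea behind the thread can be sketched with magnitude-based neuron pruning: score each neuron by the L2 norm of its weight row and drop the weakest ones. The 74% threshold mirrors the post, but the layer sizes and code are an illustrative reconstruction, not the thread's original:

```python
import numpy as np

# Magnitude-based neuron pruning, minimal sketch.
# In a dense layer, each "neuron" is one row of the weight matrix;
# pruning removes the rows with the smallest L2 norms.

rng = np.random.default_rng(0)
W = rng.normal(size=(100, 64))            # 100 neurons, 64 inputs each

norms = np.linalg.norm(W, axis=1)         # importance score per neuron
keep = norms >= np.quantile(norms, 0.74)  # drop the weakest 74%

W_pruned = W[keep]                        # smaller layer, ~26 neurons remain
print(W.shape, "->", W_pruned.shape)
```

In practice one would prune, then fine-tune briefly to recover the small accuracy drop — which is why the final loss can be as low as ~0.5%.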
X @Avi Chawla
Avi Chawla· 2025-08-24 19:30
Core Concepts - LLMs like GPT and DeepSeek serve as the foundational engine powering Agentic AI [1] - AI Agents wrap around LLMs, granting them autonomous action capabilities and making them useful in real-world workflows [2] - Agentic systems emerge from combining multiple agents, enabling collaboration and coordination [3] Agentic Infrastructure - Agentic Infrastructure encompasses tokenization & inference parameters, prompt engineering, and LLM APIs [2] - Tool usage & function calling, agent reasoning (e.g., ReAct), task planning & decomposition, and memory management are crucial components [3] - Inter-Agent communication, routing & scheduling, state coordination, and Multi-Agent RAG facilitate collaboration [4] - Agent roles & specialization and orchestration frameworks (e.g., CrewAI) enhance workflow construction [4] Trust, Safety, and Scalability - Observability & logging (e.g., using DeepEval), error handling & retries, and security & access control are essential for trust and safety [6] - Rate limiting & cost management, workflow automation, and human-in-the-loop controls ensure scalability and governance [6] - Agentic AI features a stacked architecture, with outer layers adding reliability, coordination, and governance [5]
X @Avi Chawla
Avi Chawla· 2025-08-24 06:33
Core Concepts - LLMs like GPT and DeepSeek power Agentic AI [1] - AI Agents wrap around LLMs, enabling autonomous action [2] - Agentic systems combine multiple agents for collaboration [2] Agentic Infrastructure - Observability & logging track performance using frameworks like DeepEval [2] - Tokenization & inference parameters define text processing [3] - Prompt engineering improves output quality [3] - Tool usage & function calling connect LLMs to external APIs [4] - Agent reasoning methods include ReAct and Chain-of-Thought [4] - Task planning & decomposition break down large tasks [4] - Memory management tracks history and context [4] Multi-Agent Systems - Inter-Agent communication uses protocols like ACP, A2A [5] - Routing & scheduling determines agent task allocation [5] - State coordination ensures consistency in collaboration [5] - Multi-Agent RAG uses retrieval-augmented generation [5] - Orchestration frameworks like CrewAI build workflows [5] Enterprise Considerations - Error handling & retries provide resilience [7] - Security & access control prevent overreach [7] - Rate limiting & cost management control resource usage [7] - Human-in-the-loop controls allow oversight [7]
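The tool usage & function calling layer of this stack can be sketched in a few lines. Here the LLM is stubbed out with a rule-based `plan` function (a hypothetical stand-in); in a real agent, the tool name and arguments would come from the model's structured output:

```python
# Minimal reason-then-act loop: a planner picks a tool from a registry,
# then the agent calls it. The planner below is a hard-coded stub for
# the LLM, and the tools are toy examples.

TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}

def plan(query: str) -> tuple[str, tuple]:
    """Stand-in for the LLM: decide which tool to call, with what args."""
    if query.startswith("sum"):
        _, a, b = query.split()
        return "add", (int(a), int(b))
    return "upper", (query,)

def agent(query: str):
    tool, args = plan(query)       # "reason": pick a tool
    return TOOLS[tool](*args)      # "act": invoke it

print(agent("sum 2 3"))   # 5
print(agent("hello"))     # HELLO
```

Frameworks like CrewAI layer roles, memory, and orchestration on top of exactly this kind of loop.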
X @Avi Chawla
Avi Chawla· 2025-08-23 19:32
LLM Context Length Growth - GPT-3.5-turbo context length is 4k tokens [1] - OpenAI GPT-4 context length is 8k tokens [1] - Claude 2 context length is 100k tokens [1] - Llama 3 context length is 128k tokens [1] - Gemini context length reaches 1M tokens [1]
X @Avi Chawla
Avi Chawla· 2025-08-23 06:30
LLM Context Length Growth - GPT-3.5-turbo context length is 4k tokens [1] - OpenAI GPT-4 context length is 8k tokens [1] - Claude 2 context length is 100k tokens [1] - Llama 3 context length is 128k tokens [1] - Gemini context length is 1M tokens [1]
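One practical use of these numbers is checking whether a document fits a model's window before sending it. The token counts below are the figures from the post; the 4-characters-per-token estimate is a rough rule of thumb, not a real tokenizer:

```python
# Crude context-window check. Token limits are from the post above;
# len(text) / 4 is only a heuristic approximation of the token count.

CONTEXT = {
    "gpt-3.5-turbo": 4_000,
    "gpt-4": 8_000,
    "claude-2": 100_000,
    "llama-3": 128_000,
    "gemini": 1_000_000,
}

def fits(model: str, text: str) -> bool:
    approx_tokens = len(text) / 4   # rule-of-thumb estimate
    return approx_tokens <= CONTEXT[model]

doc = "x" * 40_000                  # ~10k estimated tokens
print([m for m in CONTEXT if fits(m, doc)])
```

A ~10k-token document already rules out the early 4k and 8k windows, which is why long-context models changed what RAG pipelines need to chunk.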
X @Avi Chawla
Avi Chawla· 2025-08-23 06:30
Flash attention involves hardware-level optimizations: it utilizes SRAM to cache intermediate results. This way, it reduces redundant memory movement, offering a speed-up of up to 7.6x over standard attention methods. Check this 👇 https://t.co/R8Nfu1ZFBc ...
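The core algorithmic idea can be sketched without any GPU: process K/V in tiles with an online (running) softmax, so the full N×N score matrix never has to sit in slow memory. This NumPy sketch shows the math only — sizes are illustrative, and the real speed-up comes from keeping each tile in SRAM:

```python
import numpy as np

# Tiled attention with an online softmax (the math behind FlashAttention).
# Single head, pure NumPy; demonstrates correctness, not speed.

def standard_attention(Q, K, V):
    S = Q @ K.T / np.sqrt(Q.shape[1])
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=16):
    n, d = Q.shape
    out = np.zeros_like(Q)
    m = np.full(n, -np.inf)            # running row-wise max
    l = np.zeros(n)                    # running softmax denominator
    for j in range(0, n, block):       # stream over K/V tiles
        S = Q @ K[j:j+block].T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)      # rescale earlier partial results
        P = np.exp(S - m_new[:, None])
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ V[j:j+block]
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(64, 32)) for _ in range(3))
print(np.allclose(tiled_attention(Q, K, V), standard_attention(Q, K, V)))  # True
```

Each tile updates a running max and denominator, so previously accumulated outputs can be rescaled exactly — the tiled result matches standard attention to floating-point precision.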