Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-10-26 06:31
9 real-world MCP projects for AI engineers covering:
- RAG
- Memory
- MCP client
- Voice Agent
- Agentic RAG
- and much more!

Find them in the GitHub repo below. https://t.co/oXp4PmxvYB ...
X @Avi Chawla
Avi Chawla· 2025-10-25 06:31
Model Calibration Importance
- Modern neural networks can be misleading because they tend to be overconfident in their predictions [1][2]
- Calibration ensures that predicted probabilities align with actual outcomes, which is crucial for reliable decision-making [2][3]
- Overly confident but inaccurate models can lead to suboptimal decisions, for example unnecessary medical tests [3]

Calibration Assessment
- Reliability diagrams give a visual check of calibration by plotting expected accuracy as a function of confidence [4]
- Expected Calibration Error (ECE) quantifies miscalibration; it is approximated by averaging the gap between accuracy and confidence across confidence bins [6]

Calibration Techniques
- Calibration matters most when the predicted probabilities themselves drive decisions, and when choosing between models that are otherwise operationally similar [7]
- Binary classification models can be calibrated using histogram binning, isotonic regression, or Platt scaling [7]
- Multiclass classification models can be calibrated using binning methods or matrix and vector scaling [7]

Experimental Results
- The LeNet model achieved an accuracy of approximately 55% with an average confidence of approximately 54%, so it is close to calibrated [5]
- The ResNet model achieved an accuracy of approximately 70% but with a higher average confidence of approximately 90%, indicating overconfidence [5]
- In other words, the ResNet model is about 90% confident in its predictions on average, yet it turns out to be only about 70% accurate [2]
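The ECE approximation described above can be sketched in a few lines. This is a minimal illustration, not code from the post: the function name `expected_calibration_error`, the equal-width binning, and the choice of 10 bins are all assumptions made for the example, and the sample data is synthetic, chosen only to mirror the "90% confident, 70% accurate" ResNet case.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Approximate ECE using equal-width confidence bins (an assumed binning scheme).

    confidences: predicted probability of the chosen class, each in [0, 1]
    correct:     1 if the prediction was right, 0 otherwise
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for conf, hit in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        b = min(int(conf * n_bins), n_bins - 1)
        conf_sum[b] += conf
        hit_sum[b] += hit
        count[b] += 1

    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hit_sum[b] / count[b]
        # Weight each bin's gap by the fraction of samples it holds.
        ece += (count[b] / n) * abs(accuracy - avg_conf)
    return ece


# Synthetic example: ten predictions at 90% confidence, seven of them correct.
ece = expected_calibration_error([0.9] * 10, [1] * 7 + [0] * 3)
print(f"ECE = {ece:.2f}")  # prints "ECE = 0.20"
```

A perfectly calibrated model would score 0. The value of 0.20 here is simply the gap between 90% confidence and 70% accuracy, since every sample falls into one bin.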
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The content encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- The content introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
Process Overview
- The document provides an overview of the GRPO process [1]

Resources
- Code and resources are available on the @LightningAI⚡️Studio [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
X @Avi Chawla
Avi Chawla· 2025-10-23 20:02
Core Concept of Memento
- Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP, learning from experiences using memory instead of updating LLM weights [2]
- Memento aims to improve AI agent performance from experience without fine-tuning LLM weights [1]

Key Components
- Case-Based Reasoning (CBR) decomposes complex tasks into sub-tasks and retrieves relevant past experiences [2]
- Executor executes each subtask using MCP tools and records outcomes in memory for future reference [3]

MCP Tools
- MCP tools enable the executor to accomplish most real-world tasks [3]
- MCP tools include web research, document handling, safe Python execution, data analysis, and media processing [3]
X @Avi Chawla
Avi Chawla· 2025-10-23 06:30
General Information
- The document refers to a GitHub repository [1]

Resource Type
- The resource is likely code or data related to a project on GitHub [1]
X @Avi Chawla
Avi Chawla· 2025-10-23 06:30
Core Concept
- Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP, learning from experiences using memory instead of updating LLM weights [1]
- The system uses Case-Based Reasoning (CBR) to decompose complex tasks into sub-tasks and retrieves relevant past experiences without needing gradients [1]

System Components
- The Executor executes each subtask using MCP tools and records outcomes in memory for future reference [1]
- MCP tools enable the executor to accomplish most real-world tasks, including web research, document handling, safe Python execution, data analysis, and media processing [1]

Potential Impact
- The industry views this as a promising path toward building human-like agents [1]
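The retrieve, execute, and record loop described above can be illustrated with a toy case memory. This is a hedged sketch and not Memento's implementation: `Case`, `CaseMemory`, `run_subtask`, and the word-overlap similarity are hypothetical names and choices made for the example, and the executor is a stub standing in for real MCP tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Case:
    task: str        # description of a past subtask
    plan: str        # what was done to solve it
    succeeded: bool  # recorded outcome


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (an assumed stand-in for a real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class CaseMemory:
    cases: List[Case] = field(default_factory=list)

    def record(self, task: str, plan: str, succeeded: bool) -> None:
        # Learning happens here: memory grows, no model weights change.
        self.cases.append(Case(task, plan, succeeded))

    def retrieve(self, task: str, k: int = 2) -> List[Case]:
        # Return the k stored cases most similar to the new task.
        ranked = sorted(self.cases, key=lambda c: similarity(task, c.task), reverse=True)
        return ranked[:k]


def run_subtask(
    memory: CaseMemory,
    subtask: str,
    execute: Callable[[str, List[Case]], Tuple[str, bool]],
) -> Tuple[str, bool]:
    """Retrieve similar past cases, run the executor, then record the outcome."""
    hints = memory.retrieve(subtask)
    plan, ok = execute(subtask, hints)
    memory.record(subtask, plan, ok)
    return plan, ok


# Demo with a stub executor that reuses the plan of the closest past case.
demo_memory = CaseMemory()
demo_memory.record("summarize pdf report", "use document tool", True)
demo_memory.record("plot sales data", "use python tool", True)

def stub_executor(subtask: str, hints: List[Case]) -> Tuple[str, bool]:
    return (hints[0].plan if hints else "no prior case"), True

print(run_subtask(demo_memory, "summarize quarterly pdf", stub_executor))
```

The design point the sketch tries to capture is that improvement comes from appending to and reading from memory, so no gradient step is involved.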
X @Avi Chawla
Avi Chawla· 2025-10-22 19:14
Pytest for LLM Apps is finally here!
DeepEval turns LLM evals into a two-line test suite to help you identify the best models, prompts, and architecture for AI workflows (including MCPs).
Learn the limitations of G-Eval and an alternative to it in the explainer below: https://t.co/2d0KUIsILp

Avi Chawla (@_avichawla):
Most LLM-powered evals are BROKEN!
These evals can easily mislead you to believe that one model is better than the other, primarily due to the way they are set up.
G-Eval is one popular example.
Here' ...
X @Avi Chawla
Avi Chawla· 2025-10-22 06:31
If you found it insightful, reshare it with your network.
Find me → @_avichawla
Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs.

Avi Chawla (@_avichawla):
Most LLM-powered evals are BROKEN!
These evals can easily mislead you to believe that one model is better than the other, primarily due to the way they are set up.
G-Eval is one popular example.
Here's the core problem with LLM eval techniques and a better alternative to them: https://t.co/izhjUEEipI ...