Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-10-25 06:31
Model Calibration Importance
- Modern neural networks can be misleading due to overconfidence in their predictions [1][2]
- Calibration ensures predicted probabilities align with actual outcomes, which is crucial for reliable decision-making [2][3]
- An overconfident but inaccurate model can drive suboptimal decisions, e.g., ordering unnecessary medical tests [3]

Calibration Assessment
- Reliability diagrams visually inspect calibration by plotting expected accuracy against confidence [4]
- Expected Calibration Error (ECE) quantifies miscalibration; it is approximated by binning predictions and averaging the accuracy/confidence gap across bins, weighted by bin size [6]

Calibration Techniques
- Calibration matters whenever the predicted probabilities themselves (not just the top label) feed downstream decisions [7]
- Binary classifiers can be calibrated using histogram binning, isotonic regression, or Platt scaling [7]
- Multiclass classifiers can be calibrated using binning methods or matrix and vector scaling [7]

Experimental Results
- A LeNet model achieved ~55% accuracy with an average confidence of ~54%, i.e., well calibrated [5]
- A ResNet model achieved ~70% accuracy but an average confidence of ~90%, indicating overconfidence [5]
- In other words, the ResNet reports 90% confidence while turning out to be only 70% accurate [2]
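The binned ECE described above can be sketched in a few lines. This is a minimal illustration, assuming `confidences` are the model's max softmax probabilities and `correct` flags whether each prediction matched the label; the function name and the 10-bin default are illustrative, not from the original post.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins of (bin_size / N) * |accuracy - avg confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Assign each prediction to a confidence bin; conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        ece += (len(bucket) / n) * abs(accuracy - avg_conf)
    return ece
```

Applied to the ResNet example above, predictions made at ~90% confidence that are right only ~70% of the time produce an ECE of about 0.2, whereas the well-calibrated LeNet would score near zero.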
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
General Information
- The post encourages sharing insights on DS, ML, LLMs, and RAGs [1]
- It introduces building a reasoning LLM using GRPO from scratch (100% local) [1]

Author Information
- Avi Chawla (@_avichawla) shares tutorials and insights daily [1]
X @Avi Chawla
Avi Chawla· 2025-10-24 06:32
Process Overview
- The document provides an overview of the GRPO process [1]

Resources
- Code and resources are available on the @LightningAI ⚡️ Studio [1]
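The summary above doesn't spell out GRPO's core trick: instead of training a value network, GRPO samples a group of completions per prompt, scores each with a reward, and normalizes rewards within the group to get advantages. A minimal sketch of that normalization step (function and variable names are illustrative, not from the post):

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: (r_i - mean(group)) / (std(group) + eps).

    Each element of `rewards` scores one completion sampled for the
    same prompt; the group's own statistics replace a learned critic.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]
```

For example, a group scored `[1.0, 0.0, 0.0, 1.0]` yields advantages of roughly +1 for the rewarded completions and −1 for the rest, so the policy update pushes toward the above-average samples in each group.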
X @Avi Chawla
Avi Chawla· 2025-10-24 06:31
Let's build a reasoning LLM using GRPO, from scratch (100% local): ...
X @Avi Chawla
Avi Chawla· 2025-10-23 20:02
Core Concept of Memento
- Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP, learning from experience via memory instead of updating LLM weights [2]
- The goal is to improve an AI agent's performance from experience without fine-tuning LLM weights [1]

Key Components
- Case-Based Reasoning (CBR): decomposes complex tasks into sub-tasks and retrieves relevant past experiences [2]
- Executor: executes each sub-task using MCP tools and records outcomes in memory for future reference [3]

MCP Tools
- MCP tools enable the executor to accomplish most real-world tasks [3]
- They include web research, document handling, safe Python execution, data analysis, and media processing [3]
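The record-then-retrieve loop described above can be illustrated with a toy case memory. This is only a sketch of the idea: the real Memento system does learned retrieval over a memory-augmented MDP, whereas this class and its string-similarity scoring are illustrative stand-ins.

```python
from difflib import SequenceMatcher

class CaseMemory:
    """Toy Memento-style memory: store (task, outcome) cases, retrieve by similarity."""

    def __init__(self):
        self.cases = []  # list of (task_description, outcome) pairs

    def record(self, task, outcome):
        # The executor appends each sub-task's outcome after running it.
        self.cases.append((task, outcome))

    def retrieve(self, task, k=2):
        # Rank stored cases by textual similarity to the new task,
        # so past experiences inform how to handle it.
        scored = sorted(
            self.cases,
            key=lambda case: SequenceMatcher(None, task, case[0]).ratio(),
            reverse=True,
        )
        return scored[:k]
```

The point of the design is the one the post makes: the agent improves over time because relevant past episodes are surfaced at planning time, while the LLM weights never change.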
X @Avi Chawla
Avi Chawla· 2025-10-23 06:30
General Information
- The post links to a GitHub repository [1]

Resource Type
- The resource is likely code or data related to a project on GitHub [1]
X @Avi Chawla
Avi Chawla· 2025-10-23 06:30
Fine-tuning LLM Agents without Fine-tuning LLMs! Imagine improving your AI agent's performance from experience without ever touching the model weights. It's just like how humans remember past episodes and learn from them. That's precisely what Memento does. The core concept: instead of updating LLM weights, Memento learns from experiences using memory. It reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. Think of it as giving your agent a notebook to remember wh ...
X @Avi Chawla
Avi Chawla· 2025-10-22 19:14
Pytest for LLM Apps is finally here! DeepEval turns LLM evals into a two-line test suite to help you identify the best models, prompts, and architecture for AI workflows (including MCPs). Learn the limitations of G-Eval and an alternative to it in the explainer below: https://t.co/2d0KUIsILp Avi Chawla (@_avichawla): Most LLM-powered evals are BROKEN! These evals can easily mislead you to believe that one model is better than the other, primarily due to the way they are set up. G-Eval is one popular example. Here' ...
X @Avi Chawla
Avi Chawla· 2025-10-22 06:31
If you found it insightful, reshare it with your network. Find me → @_avichawla. Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. Avi Chawla (@_avichawla): Most LLM-powered evals are BROKEN! These evals can easily mislead you to believe that one model is better than the other, primarily due to the way they are set up. G-Eval is one popular example. Here's the core problem with LLM eval techniques and a better alternative to them: https://t.co/izhjUEEipI ...
X @Avi Chawla
Avi Chawla· 2025-10-22 06:31
GitHub repo: https://t.co/LfM6AdsO74 (don't forget to star it ⭐) ...