Avi Chawla
X @Avi Chawla
Avi Chawla· 2025-10-22 06:31
Most LLM-powered evals are BROKEN!

These evals can easily mislead you into believing that one model is better than the other, primarily due to the way they are set up. G-Eval is one popular example.

Here's the core problem with LLM eval techniques and a better alternative to them:

Typical evals like G-Eval assume you're scoring one output at a time in isolation, without understanding the alternative. So when prompt A scores 0.72 and prompt B scores 0.74, you still don't know which one's actually better.

This is unlik ...
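To make the contrast concrete, here is a minimal sketch of pointwise (G-Eval-style) versus pairwise judging; `call_llm` is a hypothetical placeholder, not any specific provider's API:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder; wire up your actual LLM client here."""
    raise NotImplementedError

def pointwise_score(output: str, criteria: str) -> float:
    # G-Eval-style judging: the judge sees ONE output in isolation,
    # so scores across separate runs are not directly comparable.
    reply = call_llm(
        f"Rate the following answer from 0 to 1 on: {criteria}\n\n"
        f"Answer:\n{output}\n\nReturn only the number."
    )
    return float(reply.strip())

def pairwise_winner(output_a: str, output_b: str, criteria: str) -> str:
    # Comparative judging: the judge sees both candidates side by side,
    # so a 0.72 vs. 0.74 gap becomes an explicit A-or-B decision.
    reply = call_llm(
        f"Compare the two answers below on: {criteria}\n\n"
        f"Answer A:\n{output_a}\n\nAnswer B:\n{output_b}\n\n"
        "Reply with exactly 'A' or 'B'."
    )
    return reply.strip()
```

The pairwise judge is forced to make the comparison the pointwise scores only imply, which is the gap the post is describing.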
X @Avi Chawla
Avi Chawla· 2025-10-21 19:56
Core Problem & Solution
- Current LLM techniques struggle to maintain focus on crucial rules and context in long conversations, leading to hallucinations and inconsistent behavior [1][2][5]
- Attentive Reasoning Queries (ARQs) solve this by guiding LLMs with explicit, domain-specific questions encoded as targeted queries inside a JSON schema (see the sketch after this list) [3][4]
- ARQs reinstate critical instructions and facilitate auditable, verifiable intermediate reasoning steps [4][6]
- ARQs outperform Chain-of-Thought (CoT) reasoning and direct response generation, achieving a 90.2% success rate across 87 test scenarios [6][8]

Implementation & Application
- ARQs are implemented in Parlant, an open-source framework [6]
- ARQs are integrated into modules like the guideline proposer, tool caller, and message generator [8]
- Making reasoning explicit, measurable, and domain-aware helps LLMs reason with intention, especially in high-stakes or multi-turn scenarios [7]
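For illustration, here is a minimal sketch of what an ARQ-style JSON schema could look like; the field names and the refund-policy domain are hypothetical, not Parlant's actual API:

```python
import json

# ARQ-style schema: before drafting a reply, the model must answer
# explicit, domain-specific queries that reinstate critical instructions.
# Field names and the refund domain are hypothetical, for illustration only.
arq_schema = {
    "type": "object",
    "properties": {
        "which_guidelines_apply": {"type": "string"},
        "did_customer_request_refund": {"type": "boolean"},
        "is_refund_allowed_by_policy": {"type": "boolean"},
        "final_response": {"type": "string"},
    },
    "required": [
        "which_guidelines_apply",
        "did_customer_request_refund",
        "is_refund_allowed_by_policy",
        "final_response",
    ],
}

# Passed as a structured-output / JSON-schema constraint, each key acts as
# a targeted query whose answer is an auditable intermediate reasoning step.
print(json.dumps(arq_schema, indent=2))
```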
X @Avi Chawla
Avi Chawla· 2025-10-21 06:30
The open-source RAG stack (2025): https://t.co/jhDZXzOPzC ...
X @Avi Chawla
Avi Chawla· 2025-10-20 19:45
Core Problem & Solution
- The open-source Parlant framework introduces a new reasoning approach to prevent hallucinations in LLMs [1]
- This approach achieves a SOTA success rate of 90.2% [2]
- It outperforms popular techniques like Chain-of-Thought [2]

Key Features of Parlant
- Parlant enables building agents that follow instructions and do not hallucinate [1]
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
If you found it insightful, reshare it with your network.

Find me → @_avichawla

Every day, I share tutorials and insights on DS, ML, LLMs, and RAG.

Avi Chawla (@_avichawla): Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs. It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%. Here's the core problem with current techniques that this new approach solves: https://t.co/eJnOXv0abS ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
GitHub repo: https://t.co/kjVj5Rp7Xm (don't forget to star it 🌟) ...
X @Avi Chawla
Avi Chawla· 2025-10-20 06:31
Finally, researchers have open-sourced a new reasoning approach that actually prevents hallucinations in LLMs.

It beats popular techniques like Chain-of-Thought and has a SOTA success rate of 90.2%.

Here's the core problem with current techniques that this new approach solves:

We have enough research to conclude that LLMs often struggle to assess what truly matters in a particular stage of a long, multi-turn conversation.

For instance, when you give Agents a 2,000-word system prompt filled with policies, tone r ...
X @Avi Chawla
Avi Chawla· 2025-10-19 06:31
Here's a neural net optimization trick that leads to ~4x faster CPU to GPU transfers.

Imagine an image classification task:
- We define the network, load the data, and transform it.
- In the training loop, we transfer the data to the GPU and train.

Here's the problem with this. If you look at the profiler:
- Most of the time/resources will be allocated to the kernel (the actual training code).
- However, a significant amount of time will also be dedicated to data transfer from CPU to GPU (this appears under cudaMem ...
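The post is truncated before it names the fix, but the standard remedy for this bottleneck is pinned (page-locked) host memory combined with asynchronous transfers. A minimal PyTorch sketch, assuming that is the trick being described:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy image-classification data standing in for a real dataset.
dataset = TensorDataset(
    torch.randn(1024, 3, 224, 224),   # images
    torch.randint(0, 10, (1024,)),    # labels
)

# pin_memory=True keeps each batch in page-locked host memory, so the
# CUDA driver can copy it asynchronously instead of staging it first.
loader = DataLoader(dataset, batch_size=64, shuffle=True, pin_memory=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for images, labels in loader:
    # non_blocking=True lets the host-to-device copy overlap with compute
    # (it only helps when the source tensor lives in pinned memory).
    images = images.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
    # ... forward pass, loss, backward pass would go here ...
```

With pinned memory, the copies that showed up as blocking `cudaMemcpy` calls in the profiler can run asynchronously alongside GPU compute, which is where the claimed transfer speedup comes from.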