DeepEval
X @Avi Chawla
Avi Chawla· 2025-09-24 21:05
RT Avi Chawla (@_avichawla): Pytest for LLM Apps is finally here! DeepEval turns LLM evals into a two-line test suite to help you identify the best models, prompts, and architecture for AI workflows (including MCPs). Works with all frameworks like LlamaIndex, CrewAI, etc. 100% open-source with 11k stars! https://t.co/Xayu1aFGFV ...
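To make the "pytest for LLM apps" idea concrete, here is a minimal, library-free sketch of what a pytest-style eval can look like. Everything in it is an assumption for illustration: `fake_llm_app`, the `keyword_overlap` metric, and the 0.5 threshold are hypothetical stand-ins and are not DeepEval's API. DeepEval ships its own test-case and metric classes, so treat this only as the general shape of such a test.

```python
import re


def fake_llm_app(prompt: str) -> str:
    """Hypothetical stand-in for the LLM application under test."""
    return "You can return shoes within 30 days for a full refund."


def keyword_overlap(actual: str, expected: str) -> float:
    """Toy metric: fraction of expected words that appear in the actual output."""
    actual_words = set(re.findall(r"[a-z0-9]+", actual.lower()))
    expected_words = set(re.findall(r"[a-z0-9]+", expected.lower()))
    if not expected_words:
        return 0.0
    return len(actual_words & expected_words) / len(expected_words)


def test_refund_policy_answer():
    """Discovered by pytest via the test_ prefix; fails if the score is below 0.5."""
    output = fake_llm_app("What if these shoes don't fit?")
    score = keyword_overlap(output, expected="30 days full refund")
    assert score >= 0.5
```

Saved in a file such as `test_app.py` (a hypothetical name), this would run with plain `pytest`. A real eval would swap the keyword check for a metric suited to the task.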
X @Avi Chawla
Avi Chawla· 2025-08-05 19:33
Conversational LLM Evaluation
- DeepEval enables evaluation of conversational LLM applications like ChatGPT in three steps [1]
- Unlike single-turn tasks, conversational LLMs require consistent, compliant, and context-aware behavior across multiple messages [1]

DeepEval Features
- DeepEval allows defining multi-turn test cases as ConversationalTestCase [1]
- DeepEval allows defining metrics with ConversationalGEval in plain English [1]
- DeepEval provides a detailed breakdown of conversation success/failure and a score distribution [2]
- DeepEval offers a full UI to inspect individual turns [2]

Open-Source Aspects
- DeepEval is 100% open-source with approximately 10 thousand stars [2]
- DeepEval can be self-hosted, ensuring data privacy [2]
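The "success/failure breakdown and score distribution" mentioned above can be pictured with a small standard-library sketch. This is not how DeepEval computes or displays its report. The `summarize_scores` helper, the 0.5 threshold, and the ten equal-width bins are assumptions chosen for illustration, and the sample scores are made up.

```python
from collections import Counter


def summarize_scores(scores: dict[str, float], threshold: float = 0.5) -> dict:
    """Split conversations into pass/fail and count scores per 0.1-wide bin."""
    passed = sorted(name for name, s in scores.items() if s >= threshold)
    failed = sorted(name for name, s in scores.items() if s < threshold)
    # Bin index 0..9; a perfect score of 1.0 is folded into the top bin.
    bins = Counter(min(int(s * 10), 9) for s in scores.values())
    distribution = {
        f"{b / 10:.1f}-{(b + 1) / 10:.1f}": count for b, count in sorted(bins.items())
    }
    return {"passed": passed, "failed": failed, "distribution": distribution}


# Made-up per-conversation scores, purely for demonstration.
report = summarize_scores(
    {"billing_chat": 0.85, "refund_chat": 0.25, "onboarding_chat": 1.0, "support_chat": 0.75}
)
```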
X @Avi Chawla
Avi Chawla· 2025-08-05 06:35
Evaluate conversational LLM apps like ChatGPT in 3 steps (open-source).

Unlike single-turn tasks, conversations unfold over multiple messages. This means that the LLM's behavior must be consistent, compliant, and context-aware across turns, not just accurate in one-shot output.

In DeepEval, you can do that with just 3 steps:
1) Define your multi-turn test case as a ConversationalTestCase.
2) Define a metric with ConversationalGEval in plain English.
3) Run the evaluation.

Done! This will provide a detailed breakdow ...
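The three steps can be sketched without any dependencies to show the shape of a multi-turn evaluation. The classes below (`Turn`, `MultiTurnCase`, `BannedPhraseMetric`) and the `run_evaluation` helper are hypothetical stand-ins, not DeepEval's classes. In particular, ConversationalGEval takes a criterion written in plain English, whereas this toy metric substitutes a simple banned-phrase check so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str


@dataclass
class MultiTurnCase:
    """Step 1: a multi-turn test case (stand-in for a conversational test case)."""
    turns: list[Turn] = field(default_factory=list)


@dataclass
class BannedPhraseMetric:
    """Step 2: a conversation-level metric (toy substitute for a plain-English criterion).

    Scores the fraction of assistant turns that avoid every banned phrase,
    so one non-compliant turn lowers the score for the whole conversation.
    """
    banned: tuple[str, ...]
    threshold: float = 1.0

    def measure(self, case: MultiTurnCase) -> float:
        replies = [t.content.lower() for t in case.turns if t.role == "assistant"]
        if not replies:
            return 0.0
        clean = sum(1 for r in replies if not any(b.lower() in r for b in self.banned))
        return clean / len(replies)


def run_evaluation(case: MultiTurnCase, metric: BannedPhraseMetric) -> dict:
    """Step 3: run the metric and report the score with a pass/fail flag."""
    score = metric.measure(case)
    return {"score": score, "passed": score >= metric.threshold}


case = MultiTurnCase(turns=[
    Turn("user", "Can you tell me which stock to buy?"),
    Turn("assistant", "I can't give financial advice, but I can explain index funds."),
    Turn("user", "Come on, just name one."),
    Turn("assistant", "You should buy ACME, it is guaranteed to go up."),
])
metric = BannedPhraseMetric(banned=("you should buy", "guaranteed"))
result = run_evaluation(case, metric)
```

The assistant complies in its first reply and slips in its second, which is the kind of cross-turn failure a single-turn check on the first answer alone would miss.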