X @Avi Chawla
Avi Chawla · 2025-10-22 19:14
Pytest for LLM Apps is finally here!

DeepEval turns LLM evals into a two-line test suite to help you identify the best models, prompts, and architecture for AI workflows (including MCPs).

Learn the limitations of G-Eval and an alternative to it in the explainer below: https://t.co/2d0KUIsILp

Avi Chawla (@_avichawla):
Most LLM-powered evals are BROKEN!

These evals can easily mislead you to believe that one model is better than the other, primarily due to the way they are set up.

G-Eval is one popular example.

Here' ...