Workflow
How to Improve Evals
Greylock · 2025-09-30 19:47

Evaluation Analysis
- The industry emphasizes the importance of scrutinizing both regressions and improvements in evaluation results [2]
- The industry suggests that initial improvements observed during evaluation are often misleading [2]
- The industry recommends focusing on refining the scoring function when encountering unexpected evaluation outcomes, rather than immediately altering the agentic system or prompt [1]

Debugging and Improvement
- The industry advises analyzing specific tests or cases that have worsened compared to previous evaluations to identify potential issues [1]
- The industry highlights the need to validate whether observed improvements are genuine or artificial [2]
- The industry suggests using fake improvements as opportunities to refine the evaluation function [2]
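
The per-case comparison described above could be sketched roughly as follows. This is a minimal illustration, not a method taken from the source: the `CaseDelta` class, the `diff_runs` function, the 0.05 threshold, and the sample scores are all hypothetical, and it assumes each eval run is available as a mapping from case ID to a numeric score where higher is better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseDelta:
    """Score change for one eval case between two runs (hypothetical helper)."""
    case_id: str
    baseline: float
    candidate: float

    @property
    def delta(self) -> float:
        return self.candidate - self.baseline


def diff_runs(baseline, candidate, threshold=0.05):
    """Split cases present in both runs into regressions and improvements.

    The threshold is an assumed noise floor; changes smaller than it are ignored.
    """
    regressions, improvements = [], []
    for case_id in sorted(baseline.keys() & candidate.keys()):
        d = CaseDelta(case_id, baseline[case_id], candidate[case_id])
        if d.delta <= -threshold:
            regressions.append(d)
        elif d.delta >= threshold:
            improvements.append(d)
    # Largest movements first, so the most suspicious cases are read first.
    regressions.sort(key=lambda d: d.delta)
    improvements.sort(key=lambda d: d.delta, reverse=True)
    return regressions, improvements


# Made-up scores, purely for illustration.
baseline_scores = {"case-1": 0.9, "case-2": 0.4, "case-3": 0.7}
candidate_scores = {"case-1": 0.5, "case-2": 0.8, "case-3": 0.7}

regressions, improvements = diff_runs(baseline_scores, candidate_scores)
for d in regressions:
    print(f"REGRESSED {d.case_id}: {d.baseline:.2f} -> {d.candidate:.2f}")
for d in improvements:
    print(f"IMPROVED  {d.case_id}: {d.baseline:.2f} -> {d.candidate:.2f}")
```

Both lists are meant to be read by hand. A regression may point to a real problem, while an improvement that turns out to be artificial is a hint that the scoring function, rather than the agent or prompt, needs refining.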