Scoring function

How to Improve Evals
Greylock · 2025-09-30 19:47
You know, how do you know if the problem is the underlying prototype, or the application, or the eval itself? >> So, every time I run an eval, I look at two things. One is: what are the specific tests or cases that got worse versus my previous eval? And when I look at those, I think, okay, great, are these things actually worse? And if so, let me play with them. That's great: my eval found something that's bad, and now I can improve it. Or ...
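The first check described above, finding the cases that got worse versus the previous eval, can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual tooling: it assumes each eval run is stored as a mapping from a case ID to a numeric score where higher is better, and the names `find_regressions`, `previous_run`, `current_run`, and `min_drop` are hypothetical.

```python
from typing import Dict, List, Tuple


def find_regressions(
    previous: Dict[str, float],
    current: Dict[str, float],
    min_drop: float = 0.0,
) -> List[Tuple[str, float, float]]:
    """Return (case_id, old_score, new_score) for cases whose score dropped.

    Assumes higher scores are better. Cases missing from the current run
    are skipped, since there is nothing to compare against.
    """
    regressions = []
    for case_id, old_score in previous.items():
        new_score = current.get(case_id)
        if new_score is None:
            continue
        if old_score - new_score > min_drop:
            regressions.append((case_id, old_score, new_score))
    # Largest drops first, so the worst regressions are reviewed first.
    regressions.sort(key=lambda r: r[1] - r[2], reverse=True)
    return regressions


# Made-up scores, purely for illustration.
previous_run = {"case-1": 1.0, "case-2": 0.8, "case-3": 0.5}
current_run = {"case-1": 1.0, "case-2": 0.4, "case-3": 0.6}

for case_id, old, new in find_regressions(previous_run, current_run):
    print(f"{case_id}: {old:.2f} -> {new:.2f}")
```

The list this produces is only a starting point. As the speaker says, each flagged case still has to be read by hand to decide whether the output is actually worse.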