Databricks CEO on evaluating AI agents
CNBC Television·2025-06-12 14:45
What is a bottleneck, perhaps, that CEOs, CIOs, and executives aren't talking enough about?
Yeah, I would say this thing that we call evaluations, or benchmarks. It doesn't matter if the agent can crush it at programming contests or, you know, do Math Olympiad really well and be smarter than us in math. We want it to do a specific job at the company. How do we know how it's doing? That's called evaluations, or benchmarks. So that's what we focused on when we launched Agent Bricks, and it's a way to do agent learni ...