benchmarks
The Becoming Benchmark | Chimezie Nwabueze | TEDxBAU Cyprus
TEDx Talks · 2025-06-25 15:56
How many times have you reached an accomplishment, only to realize that you still felt empty or unfulfilled? It could be a job position, a degree, a certain number on social media, or maybe a material acquisition or possession. And granted, for a few days or weeks, the excitement might still be high, but after that, you ask yourself: is that it? Was that all I was striving for? I mean, you expected that it was going to be the game changer for your life. ...
Databricks CEO on evaluating AI agents
CNBC Television · 2025-06-12 14:45
What is a bottleneck, perhaps, that CEOs, CIOs, and executives aren't talking about enough? Yeah, I would say this thing that we call evaluations, or benchmarks. It doesn't matter if the agent can crush it at programming contests, or do the Math Olympiad really well and be smarter than us in math. We want it to do a specific job at the company. How do we know how it's doing? That's called evaluations, or benchmarks. So that's what we focused on when we launched Agent Bricks, and it's a way to do agent learni ...
Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis
AI Engineer · 2025-06-11 15:40
Great. Thank you for the introduction, and thanks to the International Advanced Natural Language Processing Conference for organizing this, and thanks as well for allowing this talk to kick off the conference. I appreciate it. You guys have done a great job. In terms of the topic, I do have to make sure that we understand the contextual background behind this topic today and the recent events over the last few weeks and months. So I'm going to take a few minute ...