evaluations
Morgan Stanley: China Musings - The Trio That Could Change Everything, If It's Allowed To
Morgan· 2025-07-03 02:41
July 2, 2025 09:30 AM GMT | China Musings | Asia Pacific | M Idea | The Trio That Could Change Everything, If It's Allowed To. China doesn't just need new stimulus; it needs a new growth algorithm too. The "trio" of reforms could eventually form the basis for that. But old habits and incentives die hard. The 15th Five-Year Plan will be the real litmus test: is Beijing ready to stop rewarding what it wants to reduce? At the Central Commission for Financial and Economic Affairs meeting hosted this Tuesday (July 1 ...
Getting Started with LangSmith (3/6): Datasets & Evaluations
LangChain· 2025-06-25 01:05
- Code: https://github.com/xuro-langchain/eli5
- Learn more about LangSmith: https://www.langchain.com/langsmith/?utm_medium=social&utm_source=youtube&utm_campaign=q2-2025_onboarding-videos_co
- Get started with LangSmith for free: https://smith.langchain.com/
- Docs: https://docs.smith.langchain.com/ ...
No Code LangSmith Evaluations
LangChain· 2025-06-18 15:10
Hey, this is Lance from LangChain. Evaluations are one of the most important ways to build effective agents. And we wanted to lower the barrier to entry so that anyone, not just developers, can very easily run evaluations on agents that they're building. So, we've recently added the ability to run evaluations on LangGraph agents directly from LangGraph Studio. This is an agent, Open Deep Research, that we've developed over the past few months. It's a very popular repo and many people use it. Some of the people tha ...
Databricks CEO on evaluating AI agents
CNBC Television· 2025-06-12 14:45
What is a bottleneck, perhaps, that CEOs, CIOs, executives aren't talking enough about? Yeah, I would say this thing that we call evaluations or benchmarks. Like, it doesn't matter if the agent can crush it at programming contests or, you know, do Math Olympiad really well, and it's smarter than us in math. We want it to do a specific job at the company. How do we know how it's doing? That's called evaluations or benchmarks. So that's what we focused on when we launched Agent Bricks, and it's a way to do agent learni ...