Evals

X @Avi Chawla
Avi Chawla· 2025-06-30 06:33
If you found it insightful, reshare it with your network. Find me → @_avichawla Every day, I share tutorials and insights on DS, ML, LLMs, and RAGs. Avi Chawla (@_avichawla): A Python decorator is all you need to trace LLM apps (open-source). Most LLM evals treat the app like an end-to-end black box. But LLM apps need component-level evals and tracing since the issue can be anywhere inside the box, like the retriever, tool call, or the LLM itself. https://t.co/dWXyJb3DNs ...
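The idea in the post above, tracing each component rather than only the end-to-end output, can be sketched with a plain Python decorator. This is a minimal illustration and not the open-source tool the post refers to: the names `trace`, `TRACE`, `retrieve`, `generate`, and `app` are hypothetical, and the retriever and LLM are stand-in functions.

```python
import functools
import time

TRACE = []  # collected spans, one dict per traced call (hypothetical store)


def trace(component):
    """Record the component name, output, and latency of each call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output = fn(*args, **kwargs)
            TRACE.append({
                "component": component,
                "function": fn.__name__,
                "output": output,
                "seconds": time.perf_counter() - start,
            })
            return output
        return wrapper
    return decorator


@trace("retriever")
def retrieve(query):
    # Stand-in for a real vector search.
    return ["doc about " + query]


@trace("llm")
def generate(query, docs):
    # Stand-in for a real model call.
    return f"Answer to {query!r} using {len(docs)} doc(s)"


def app(query):
    # The retriever runs first, so its span is recorded before the LLM span.
    return generate(query, retrieve(query))


answer = app("evals")
components = [span["component"] for span in TRACE]
```

With one span per component, an evaluation can score the retriever output and the LLM output separately instead of judging only the final answer.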
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
As AI agents transition from experimental assistants to critical components of enterprise workflows, reliably evaluating their performance becomes essential. But how do you systematically measure an AI agent’s capabilities, contextual understanding, and accuracy across diverse scenarios? In this talk, we'll dive deep into the Azure AI Evaluation SDK, an innovative tool designed to rigorously assess agentic applications. Learn how to create powerful evaluations using structured test plans, scenarios, and adv ...
State-Of-The-Art Prompting For AI Agents
Y Combinator· 2025-05-30 14:00
Metaprompting is turning out to be a very, very powerful tool that everyone's using now. It actually feels like coding in 1995: the tools are not all the way there. We're in this new frontier. But personally it also feels like learning how to manage a person, where it's like, how do I actually communicate the things that they need to know in order to make a good decision? [Music] Welcome back to another episode of the Light Cone. Today we're pulling back the c ...
How Is AI Changing Products, Moats, and the Rules of Startups?
Hu Xiu· 2025-04-28 05:42
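In an AI world that changes by the day, one maxim is worth remembering: "The AI model you are using today is the worst AI model you will ever use." This striking observation comes from Kevin Weil, Chief Product Officer at OpenAI, and it captures how astonishingly fast AI is advancing. Kevin Weil has led product at technology companies including Twitter, Instagram, Facebook, and Planet, and co-created Facebook's Libra cryptocurrency. He also serves on the boards of Planet, Strava, the Black Product Managers Network, and The Nature Conservancy. But he says those career experiences are entirely different from working at OpenAI. Standing at the frontier of AI, AGI, and perhaps one day superintelligence, Kevin Weil used a 90-minute in-depth interview to discuss OpenAI's product thinking, AI's impact on work and products, the markets OpenAI may not enter, the most important skills in the AI era, and even what he teaches his own children to focus on. This article distills ten lessons from that interview, in the hope of giving AI founders and AI enthusiasts something to think about. 1. Thoughts on AI product development. This fusion of research and product is key to the success of OpenAI's products. Weil explained, "The best products come from product design and research teams working together to build something novel." He pointed out that ...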