Metrics

Meta's miss: Audio Rooms
20VC with Harry Stebbings · 2025-09-07 14:01
Goal Setting
- The North Star is the goal, not a metric [1]
- The company's goal should be clearly defined [1]
- A metric is used to describe the goal, but it is never a perfect representation [1]

Metric Definition
- The most important thing is to have absolute clarity on what your goal is, and then do the best you possibly can to describe that goal with a metric [1]
- A metric is always broken [1]

Example
- When joining Meta, the goal was to connect the world online [1]
The truth about North Star metrics
20VC with Harry Stebbings · 2025-09-06 14:00
The most important thing is that the North Star is not a metric. The North Star is your goal. Your goal is what you are trying to achieve. When I joined Meta, it was: connect the world online. Full stop. And then the metric describes the goal. So the single most important thing is to have absolute clarity on what your goal is, and then do the best you possibly can to describe that goal with a metric. And remember, a metric never perfectly describes a goal. A metric is always broken. ...
Practical tactics to build reliable AI apps — Dmitry Kuchin, Multinear
AI Engineer · 2025-08-03 04:34
Core Problem & Solution
- The traditional software development lifecycle is insufficient for AI applications because models are non-deterministic; a data science approach with continuous experimentation is required [3]
- The key is to reverse engineer metrics from real-world scenarios, focusing on product experience and business outcomes rather than abstract data science metrics [6]
- Build evaluations (evals) at the beginning of the process, not at the end, to surface failures and areas for improvement early [14]
- Continuously improve both the evals and the solution until a baseline benchmark is reached, then optimize against it [19]

Evaluation Methodology
- Evaluations should mimic specific user questions and criteria relevant to the solution's end goal [7]
- Use large language models (LLMs) to generate evaluations, covering different user personas and expected answers [9][11]
- Examine the details of each evaluation failure to understand the root cause, whether it lies in the test definition or in the solution's performance [15]
- Experimentation means changing models, logic, prompts, or data, and continuously re-running evaluations to catch regressions [16][18]

Industry-Specific Examples
- For customer support bots, measure the rate of escalation to human support as a key metric [5]
- For text-to-SQL or text-to-graph database applications, create a mock database with known data to validate expected results [22]
- For call center conversation classifiers, use simple matching to check whether the correct rubric was applied [23] (a minimal harness along these lines is sketched after this list)

Key Takeaways
- Evaluate AI applications the way users actually use them, avoiding abstract metrics [24]
- Frequent evaluations enable rapid progress and reduce regressions [25]
- Well-defined evaluations lead to explainable AI, providing insight into how the solution works and where its limits are [26]
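As a rough illustration of the "build evals first, run them on every change" loop described above, here is a minimal Python sketch. The `EvalCase` fields, the `run_evals` helper, and the example cases are all assumptions made for illustration; they are not the Multinear API or anything shown in the talk.

```python
# Minimal eval-harness sketch (assumed names and data, not the Multinear API).
# Each eval mimics a concrete user question with a known expected outcome,
# and the suite is meant to run on every change to catch regressions early.
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    persona: str    # which user persona this case represents
    question: str   # the question exactly as a user would ask it
    expected: str   # expected answer fragment / rubric label from known data

def run_evals(solution: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the solution and report the pass rate."""
    passed = 0
    for case in cases:
        answer = solution(case.question)
        # Simple matching works for classifier-style outputs (e.g. rubric labels);
        # free-form answers would need an LLM judge or richer criteria instead.
        ok = case.expected.lower() in answer.lower()
        passed += ok
        if not ok:
            # Inspect each failure: is the test definition wrong, or the solution?
            print(f"FAIL [{case.persona}] {case.question!r} -> {answer!r}")
    return passed / len(cases)

if __name__ == "__main__":
    cases = [
        EvalCase("new customer", "How do I reset my password?", "reset link"),
        EvalCase("billing admin", "Why was I charged twice?", "escalate to human"),
    ]
    # `my_bot` stands in for whatever pipeline is actually under test.
    my_bot = lambda q: "I will escalate to human support for billing questions."
    print(f"pass rate: {run_evals(my_bot, cases):.0%}")
```

Generating the `EvalCase` list with an LLM, one batch per user persona, is one way to follow the talk's suggestion of LLM-generated evaluations while keeping the pass/fail check deterministic.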
X @Token Terminal 📊
Token Terminal 📊 · 2025-07-07 23:07
RT Token Terminal 📊 (@tokenterminal): BIG BEAUTIFUL METRICS PAGES & WHERE TO FIND THEM: 📂 Market sectors 📂 Lending 📂 Metrics 📂 Active loans. Thank you for your attention to this matter! https://t.co/cpeKGGIESA ...
Taming Rogue AI Agents with Observability-Driven Evaluation — Jim Bennett, Galileo
AI Engineer · 2025-06-27 10:27
AI Agent Evaluation & Observability
- The industry emphasizes the necessity of observability in AI development, particularly for evaluation-driven development [1]
- AI trustworthiness is a significant concern, highlighting the need for robust evaluation methods [1]
- Detecting problems in AI is challenging because of its non-deterministic nature, which makes traditional unit testing difficult [1]

AI-Driven Evaluation
- The industry suggests using AI to evaluate AI, leveraging its ability to understand and identify issues in AI systems [1]
- LLMs can score the performance of other LLMs; the recommendation is to use a better (potentially more expensive or custom-trained) LLM for evaluation than the one used in the primary application [2] (a rough sketch of this pattern follows after this list)
- Galileo offers a custom-trained small language model (SLM) designed for effective AI evaluations [2]

Implementation & Metrics
- Evaluations should be integrated from the beginning of AI application development, including prompt engineering and model selection [2]
- Granularity is crucial: analyze each step of the AI workflow to identify failure points [2]
- Key metrics include action completion (did it complete the task?) and action advancement (did it move towards the goal?) [2]

Continuous Improvement & Human Feedback
- AI can provide insights and suggestions for improving agent performance based on evaluation data [3]
- Human feedback is essential to validate and refine AI-generated metrics, ensuring accuracy and continuous learning [4]
- Real-time prevention and alerting are necessary to catch rogue agents and prevent issues in production [8]
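The LLM-as-judge pattern mentioned above can be sketched roughly as follows. The `JUDGE_PROMPT`, `judge_step`, and `action_advancement` names and the `call_llm` hook are assumptions for illustration only; this is not Galileo's custom SLM or its API, just the general shape of scoring each agent step with a stronger evaluation model.

```python
# LLM-as-judge sketch (assumed prompt and helper names, not Galileo's product).
# A stronger "judge" model scores each agent step for action advancement,
# so failures can be pinpointed per step rather than only end to end.
import json
from typing import Callable

JUDGE_PROMPT = """You are evaluating one step of an AI agent.
Goal: {goal}
Step output: {step_output}
Answer in JSON: {{"advances_goal": true/false, "reason": "..."}}"""

def judge_step(call_llm: Callable[[str], str], goal: str, step_output: str) -> dict:
    """Ask the judge model whether this step advanced the agent toward its goal.

    `call_llm` is whatever function sends a prompt to your evaluation model,
    ideally a stronger or purpose-trained model than the agent itself uses.
    """
    raw = call_llm(JUDGE_PROMPT.format(goal=goal, step_output=step_output))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Judges are LLMs too: treat unparseable verdicts as failures for human review.
        return {"advances_goal": False, "reason": f"unparseable judge output: {raw!r}"}

def action_advancement(call_llm, goal: str, steps: list[str]) -> float:
    """Fraction of agent steps the judge considers progress toward the goal."""
    verdicts = [judge_step(call_llm, goal, s) for s in steps]
    return sum(v["advances_goal"] for v in verdicts) / len(verdicts)
```

Keeping per-step verdicts (rather than only the aggregate score) is what makes the human-feedback loop practical: reviewers can confirm or correct individual judgments and feed them back into the evaluation model.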