Agent Engineering
LangSmith: Agent observability, evaluation, and deployment
LangChain · 2026-03-02 17:30
Anyone can build an agent. But is yours actually working? LangSmith is the agent engineering platform built for observability, evaluation, and everything in between, so you can stop guessing and start shipping better agents. Whether you're building on LangGraph, OpenAI SDK, Anthropic SDK, CrewAI, or anything else, LangSmith connects via OpenTelemetry. No stack changes required.
Free to get started → https://smith.langchain.com
Learn more about our products → https://langchain.com ...
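Because LangSmith accepts traces over OpenTelemetry, pointing an existing OTLP exporter at it is mostly a matter of configuration. The sketch below only builds that configuration with the standard library. `OTEL_EXPORTER_OTLP_ENDPOINT` and `OTEL_EXPORTER_OTLP_HEADERS` are the standard variables from the OpenTelemetry specification that OTLP exporters read. The endpoint `https://api.smith.langchain.com/otel` and the `x-api-key` / `Langsmith-Project` header names are assumptions based on LangSmith's OpenTelemetry guide and should be checked against the current docs. The helper name `langsmith_otel_env` is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import quote

# Assumed LangSmith OTLP endpoint; verify against the current LangSmith docs.
LANGSMITH_OTLP_ENDPOINT = "https://api.smith.langchain.com/otel"


def langsmith_otel_env(api_key: str, project: Optional[str] = None) -> Dict[str, str]:
    """Build standard OTLP exporter environment variables aimed at LangSmith.

    Hypothetical helper; the header names are assumptions, not confirmed here.
    """
    headers = {"x-api-key": api_key}
    if project:
        headers["Langsmith-Project"] = project
    # The OTel spec encodes headers as comma-separated key=value pairs
    # with percent-encoded values (so a space becomes %20).
    header_str = ",".join(f"{k}={quote(v, safe='')}" for k, v in headers.items())
    return {
        "OTEL_EXPORTER_OTLP_ENDPOINT": LANGSMITH_OTLP_ENDPOINT,
        "OTEL_EXPORTER_OTLP_HEADERS": header_str,
    }
```

In a process that already uses the OpenTelemetry SDK, calling `os.environ.update(langsmith_otel_env(key, "my agent"))` before the exporter is created should be enough to redirect spans, without touching the agent code itself.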
Why We Built LangSmith for Improving Agent Quality
LangChain · 2025-11-04 16:04
LangSmith Platform Updates
- LangChain is launching new features for LangSmith, its platform for agent engineering, focused on tracing, evaluation, and observability to improve agent reliability [1]
- LangSmith introduces "Insights," a feature that automatically identifies trends in user interactions and agent behavior across millions of daily traces, helping users understand how their agents are being used and where they make mistakes [1]
- Insights is inspired by Anthropic's work on understanding conversation topics, but adapted to the broader range of agent payloads that LangSmith handles [5][6]

Evaluation and Testing
- LangSmith emphasizes methodical testing, including online evaluations, to move beyond simple "vibe testing" and bring rigor to agent development [1][33]
- LangSmith introduces "thread evals," which evaluate agent performance across an entire user interaction or conversation, giving a more complete view than single-turn evaluations [16][17]
- Online evals measure agent performance in real time on production data, complementing offline evals, which run on known examples [24]
- The company argues against the idea that offline evals are obsolete, pointing to their continued value for regression testing and for ensuring agents perform well on known interaction types [30][31]

Use Cases and Applications
- Insights can help product managers see which product features are used most often with an agent, informing roadmap prioritization [2][12]
- Insights can help AI engineers identify and categorize agent failure modes, such as incorrect tool usage or errors, enabling targeted improvements [3][13]
- Thread evals are particularly useful for evaluating user sentiment across an entire conversation or tracking the trajectory of tool calls within it [21]

Future Development
- LangSmith plans to add agent-level and thread-level metrics to its dashboards, providing greater visibility into agent performance and cost [26]
- The company aims to support more automation-rule flows over threads, such as spot-checking threads with negative user feedback [27]
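Insights groups large volumes of traces into recurring themes, such as categories of agent failure. The summary says only that Insights is inspired by Anthropic's work on conversation topics and does not describe how it works. The sketch below therefore swaps in a deliberately crude rule-based classifier, just to show the shape of the output: a ranked tally of failure modes. The trace fields (`error`, `tools`), the category labels, and the function names are all hypothetical.

```python
from collections import Counter


def categorize(trace: dict, allowed_tools: set) -> str:
    """Assign one coarse failure-mode label to a trace (rule-based stand-in, not Insights)."""
    if trace.get("error"):
        return "runtime error"
    # Treat any call to a tool outside the allowed set as incorrect tool usage.
    if any(tool not in allowed_tools for tool in trace.get("tools", [])):
        return "incorrect tool usage"
    return "ok"


def failure_report(traces: list, allowed_tools: set) -> list:
    """Count failure modes across traces, most frequent first, leaving out successes."""
    counts = Counter(categorize(t, allowed_tools) for t in traces)
    counts.pop("ok", None)
    return counts.most_common()
```

A real system would cluster free-form trace content rather than apply fixed rules, but the end product an engineer acts on is similar: a short list of failure modes ranked by frequency.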
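Offline evals run an agent against a fixed set of known examples, which is what makes them useful as a regression check. A minimal, framework-free sketch follows. It assumes a hypothetical `agent` callable that returns an intent label and uses a hand-written exact-match evaluator. In practice the examples might be stored as a LangSmith dataset, but the LangSmith SDK API is not shown or assumed here.

```python
from typing import Callable, Dict, List

# A small "golden" dataset of known interaction types (illustrative only).
DATASET: List[Dict[str, str]] = [
    {"input": "reset my password", "expected": "password_reset"},
    {"input": "where is my order?", "expected": "order_status"},
    {"input": "cancel my subscription", "expected": "cancellation"},
]


def exact_match(output: str, expected: str) -> bool:
    """Simplest possible evaluator: the label must match after trimming and lowercasing."""
    return output.strip().lower() == expected


def run_offline_eval(agent: Callable[[str], str],
                     dataset: List[Dict[str, str]] = DATASET,
                     threshold: float = 1.0) -> float:
    """Score the agent on every known example and fail loudly if quality regresses."""
    passed = sum(exact_match(agent(ex["input"]), ex["expected"]) for ex in dataset)
    score = passed / len(dataset)
    if score < threshold:
        raise AssertionError(f"regression: pass rate {score:.2f} < {threshold:.2f}")
    return score


def toy_agent(text: str) -> str:
    # Hypothetical stand-in for a real agent: simple keyword routing.
    if "password" in text:
        return "password_reset"
    if "order" in text:
        return "order_status"
    return "cancellation"
```

Running `run_offline_eval(toy_agent)` in CI means that a prompt or model change that breaks a known interaction type fails the build instead of surfacing in production.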
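Thread evals score a whole conversation rather than a single turn. Tool-call trajectory is a natural example because it only exists across turns. A minimal sketch of the idea, assuming each run record carries a hypothetical `thread_id`, a `start` timestamp, and a list of `tool_calls` (illustrative field names, not LangSmith's schema): runs are grouped into threads and ordered, then the thread's combined tool-call sequence is compared to an expected trajectory.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical run record; field names are illustrative only.
    thread_id: str
    start: float
    tool_calls: List[str] = field(default_factory=list)


def group_by_thread(runs: List[Run]) -> Dict[str, List[Run]]:
    """Collect runs into threads, ordered by start time within each thread."""
    threads: Dict[str, List[Run]] = defaultdict(list)
    for run in runs:
        threads[run.thread_id].append(run)
    return {tid: sorted(rs, key=lambda r: r.start) for tid, rs in threads.items()}


def trajectory(thread: List[Run]) -> List[str]:
    """Flatten the tool calls of every turn into one conversation-level sequence."""
    return [call for run in thread for call in run.tool_calls]


def trajectory_score(thread: List[Run], expected: List[str]) -> float:
    """Fraction of expected steps found in order across the whole thread.

    Each expected step is searched for after the previous match; steps that
    never appear there count as misses. This is a simple greedy heuristic.
    """
    actual = trajectory(thread)
    pos, matched = 0, 0
    for step in expected:
        try:
            pos = actual.index(step, pos) + 1
            matched += 1
        except ValueError:
            continue  # step missing after the current position
    return matched / len(expected) if expected else 1.0
```

For a support thread whose turns call `lookup_user`, then `search`, then `refund`, the score against that expected trajectory is 1.0. A thread that only ever calls `search` scores 1/3, a gap that no single-turn evaluation would reveal.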
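Online evals and automation rules operate on live traffic instead of a fixed dataset. The sketch below imitates one rule of the kind the summary anticipates: select every thread where a user left negative feedback, plus a small random sample of the rest, and queue them for review. The `Thread` fields, the feedback scale (negative scores meaning thumbs-down), the sampling rate, and the function names are all hypothetical. This is not LangSmith's rule configuration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Thread:
    # Hypothetical production thread summary; fields are illustrative.
    thread_id: str
    feedback: List[int]  # user feedback scores, e.g. -1 (thumbs down) or 1 (thumbs up)


def has_negative_feedback(thread: Thread) -> bool:
    return any(score < 0 for score in thread.feedback)


def spot_check_queue(threads: List[Thread], sample_rate: float = 0.05, seed: int = 0) -> List[str]:
    """Return thread ids to review: every negatively rated thread, plus a random sample of the rest."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    queue = []
    for t in threads:
        # Negative feedback always qualifies; otherwise include with probability sample_rate.
        if has_negative_feedback(t) or rng.random() < sample_rate:
            queue.append(t.thread_id)
    return queue
```

Keeping the random sample alongside the feedback filter means reviewers also see some threads users never complained about, which helps catch failures that users do not report.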