Why We Built LangSmith for Improving Agent Quality
LangChain· 2025-11-04 16:04
LangSmith Platform Updates
- LangChain is launching new features for LangSmith, a platform for agent engineering, focusing on tracing, evaluation, and observability to improve agent reliability [1]
- LangSmith introduces "Insights," a feature designed to automatically identify trends in user interactions and agent behavior from millions of daily traces, helping users understand how their agents are being used and where they are making mistakes [1]
- Insights is inspired by Anthropic's work on understanding conversation topics, but adapted for LangSmith's broader range of agent payloads [5][6]

Evaluation and Testing
- LangSmith emphasizes the importance of methodical testing, including online evaluations, to move beyond simple "vibe testing" and add rigor to agent development [1][33]
- LangSmith introduces "thread evals," which allow users to evaluate agent performance across entire user interactions or conversations, providing a more comprehensive view than single-turn evaluations [16][17]
- Online evals measure agent performance in real time using production data, complementing offline evals that are based on known examples [24]
- The company argues against the idea that offline evals are obsolete, highlighting their continued usefulness for regression testing and for ensuring agents perform well on known interaction types [30][31]

Use Cases and Applications
- Insights can help product managers understand which product features are most frequently used with an agent, informing product roadmap prioritization [2][12]
- Insights can assist AI engineers in identifying and categorizing agent failure modes, such as incorrect tool usage or errors, enabling targeted improvements [3][13]
- Thread evals are particularly useful for evaluating user sentiment across an entire conversation or tracking the trajectory of tool calls within a conversation [21]

Future Development
- LangSmith plans to introduce agent and thread-level metrics into its dashboards, providing greater visibility into agent performance and cost [26]
- The company aims to enable more flows with automation rules over threads, such as spot-checking threads with negative user feedback [27]
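The difference between a single-turn evaluation and a thread eval, as described under Evaluation and Testing, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Turn` structure, the keyword-based sentiment check, and the `evaluate_thread` function are hypothetical and do not use or represent the LangSmith API. A production evaluator would more likely rely on an LLM judge than on a keyword list.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One user/agent exchange in a conversation (hypothetical structure)."""
    user_message: str
    agent_reply: str
    tool_calls: list[str]


# Hypothetical keyword list standing in for a real sentiment evaluator.
NEGATIVE_WORDS = {"wrong", "useless", "frustrated"}


def turn_has_negative_sentiment(turn: Turn) -> bool:
    """Single-turn check: looks at one user message in isolation."""
    words = set(turn.user_message.lower().split())
    return bool(words & NEGATIVE_WORDS)


def evaluate_thread(thread: list[Turn]) -> dict:
    """Thread-level check: aggregates sentiment and tool calls over all turns."""
    negative_turns = sum(turn_has_negative_sentiment(t) for t in thread)
    trajectory = [call for t in thread for call in t.tool_calls]
    return {
        "turns": len(thread),
        "negative_turn_ratio": negative_turns / len(thread) if thread else 0.0,
        "ended_negative": bool(thread) and turn_has_negative_sentiment(thread[-1]),
        "tool_trajectory": trajectory,
    }


thread = [
    Turn("Find my last invoice", "Here it is.", ["search_invoices"]),
    Turn("That is the wrong one", "Sorry, retrying.", ["search_invoices", "get_invoice"]),
    Turn("Thanks, that works", "Glad to help.", []),
]
result = evaluate_thread(thread)
print(result)
```

Judged alone, the second turn looks like a failure. The thread-level result shows that the conversation recovered and records the full sequence of tool calls, which is the kind of signal a single-turn evaluation cannot provide.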
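Hong Kong Stock Movers | Fourth Paradigm (06682) Opens 4% Higher After Results; Q1 Gross Profit Up More Than 30% Year on Year; Enterprise-Grade Agents Deployed in Over 14 Industries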
智通财经网· 2025-05-16 01:37
Group 1
- The core business progress report for Q1 FY2025 shows total revenue of RMB 1.077 billion, a year-on-year increase of 30.1%, and gross profit of RMB 444 million, also up by 30.1%, with a gross margin of 41.2% [1]
- The company has launched an upgraded version of its AI platform, the "XianZhi" platform, which includes the AI Agent full-process development platform, allowing enterprise clients to easily integrate over 150 mainstream large models [1]
- The AI platform is equipped with a rich set of ready-to-use AI applications covering multiple core enterprise scenarios, including AIGC, smart office, digital employees, intelligent Q&A, AI local search, decision analysis, large model development tools, model repository, and agent platform [1]

Group 2
- The company's enterprise-level AI Agents have been implemented in over 14 industries, including finance, aviation, automotive, healthcare, energy, retail, ports, water conservancy, and education [2]
- By transforming enterprise software with AI Agents, the company is horizontally integrating into high-frequency enterprise software products and collaborating with other leading enterprise software companies [2]
- The company has launched various AI Agents targeting specific functions, such as "Collaborative Operation AI Agent" for OA processes, "Tax Empowerment Agent" for financial systems, "HR Agent" for human resources, and "Sales Manager Assistant Agent" for sales [2]