Agents
Ship it! Building Production Ready Agents — Mike Chambers, AWS
AI Engineer· 2025-06-27 10:45
Generative AI and Agent Technology
- Amazon Web Services (AWS) specializes in generative AI, evolving from machine learning [1]
- The presentation focuses on deploying generative AI agents at cloud scale, targeting both developers and leaders [1]
- The core components of an agent are a model for natural language understanding, a prompt defining the agent's role, an agentic loop for processing input and using tools, history for maintaining context, and tools for external interaction [1][2]
- Amazon Bedrock offers a suite of capabilities for building generative AI components, including models from Anthropic, Meta, and Mistral [2]
- Amazon Bedrock Agents is a fully managed service for deploying agents without infrastructure management [2]

Practical Implementation and Tools
- The demonstration uses a simple Python agent with a dice-rolling tool, initially running locally on a laptop with the Llama 3 8B model [1]
- The agent is configured with instructions (similar to a prompt) and action groups, which connect it to tools [2]
- Lambda functions host the tools, enabling them to perform various actions, including interacting with other AWS services [2]
- The AWS console provides a user interface for creating and configuring agents, including defining parameters and descriptions for tools [3]-[15]
- Amazon Q Developer is integrated into the console's code editor, offering code suggestions [17]-[21]

Deployment and Scalability
- The presentation emphasizes deploying agents to a production-ready, cloud-scale environment [1]
- Infrastructure-as-code frameworks like Terraform, Pulumi, and CloudFormation can be used for deployment [3]
- AWS offers free courses on deeplearning.ai with AWS environments for experimenting with Amazon Bedrock Agents [25]
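The core components listed above (model, prompt, agentic loop, history, tools) can be sketched as a minimal loop. This is an illustrative sketch, not the Bedrock Agents API: `call_model` is stubbed in place of a real model call (e.g. Llama 3 via Amazon Bedrock), and the dice tool stands in for a Lambda-hosted action group.

```python
import json
import random

# Tool registry: the "action group" maps tool names to Python callables.
def roll_dice(sides: int = 6) -> int:
    """A toy tool, like the dice roller in the demo."""
    return random.randint(1, sides)

TOOLS = {"roll_dice": roll_dice}

def call_model(instructions: str, history: list) -> dict:
    """Stand-in for a real model call (e.g. Llama 3 on Amazon Bedrock).
    Stubbed to request one dice roll, then answer with the result."""
    if any(m["role"] == "tool" for m in history):
        result = history[-1]["content"]
        return {"type": "answer", "content": f"You rolled a {result}."}
    return {"type": "tool_call", "name": "roll_dice", "args": {"sides": 6}}

def run_agent(user_input: str, instructions: str) -> str:
    """The agentic loop: call the model, execute any requested tool,
    append the result to history, and repeat until the model answers."""
    history = [{"role": "user", "content": user_input}]
    while True:
        decision = call_model(instructions, history)
        if decision["type"] == "answer":
            return decision["content"]
        tool = TOOLS[decision["name"]]
        result = tool(**decision["args"])
        history.append({"role": "tool", "content": json.dumps(result)})

print(run_agent("Roll a die for me", "You are a dice-rolling assistant."))
```

Swapping the stubbed `call_model` for a real model invocation (and the local tool for a Lambda call) is essentially what the managed service automates.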
GraphRAG methods to create optimized LLM context windows for Retrieval — Jonathan Larson, Microsoft
AI Engineer· 2025-06-27 09:48
Graph RAG Applications & Performance
- Graph RAG is a key enabler for building effective AI applications, especially when paired with agents [1]
- Graph RAG excels at semantic understanding and can perform global queries over a code repository [2][3]
- Graph RAG can be used for code translation from Python to Rust, outperforming direct LLM translation [4][9]
- Graph RAG can be applied to large codebases like Doom (100,000 lines of code, 231 files) for documentation and feature development [10][12][13]
- Graph RAG, when combined with the GitHub Copilot coding agent, enables complex multi-file modifications, such as adding a jump capability to Doom [18][20]

Benchmark QED & Lazy Graph RAG
- Benchmark QED is a new open-source tool for measuring and evaluating Graph RAG systems, focusing on local and global quality metrics [21][22]
- Benchmark QED includes AutoQ (query generation), AutoE (evaluation using an LLM as a judge), and AutoD (dataset summarization and sampling) [22]
- Lazy Graph RAG demonstrates dominant performance against vector RAG on data-local questions, winning 92%, 90%, and 91% of the time against 8K, 120K, and 1 million token context windows respectively [29][30]
- Lazy Graph RAG can achieve this performance at a tenth of the cost of using a 1 million token context window [32]
- Lazy Graph RAG is being incorporated into Azure AI and the Microsoft Discovery Platform [34]
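The "global queries over a code repository" idea can be illustrated with a toy sketch. In real GraphRAG an LLM extracts entities and relationships during indexing; here they are given by hand, and the corpus, file names, and entity labels are all invented for illustration. The point is that a graph query returns an entity's whole neighborhood across the repository, rather than the top-k chunks a vector search would return.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus standing in for source files in a repository (names invented).
docs = {
    "doom/p_map.c": ["player", "collision", "jump"],
    "doom/p_user.c": ["player", "input", "jump"],
    "doom/r_main.c": ["renderer", "player"],
}

# Index step: build an entity co-occurrence graph and an entity->files index.
graph = defaultdict(set)
entity_docs = defaultdict(set)
for doc, entities in docs.items():
    for e in entities:
        entity_docs[e].add(doc)
    for a, b in combinations(entities, 2):
        graph[a].add(b)
        graph[b].add(a)

def global_query(entity: str):
    """Answer a repository-wide question by walking the graph: return
    every related entity and every file that mentions the entity."""
    return sorted(graph[entity]), sorted(entity_docs[entity])

related, files = global_query("jump")
print(related)  # entities connected to "jump" anywhere in the repo
print(files)    # every file to touch when changing the feature
```

A vector index over chunks would surface the most similar passages; the graph traversal instead enumerates the complete "jump" neighborhood, which is what multi-file modifications need.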
Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
AI Engineer· 2025-06-27 09:44
Knowledge Graph Architecture & Agentic Workflows
- Knowledge graphs can enhance agentic workflows by enabling reasoning and question decomposition, moving beyond simple vector searches [4]
- Knowledge graphs let simple data models be expressed to agents, aiding accurate information retrieval and expansion with more data [5]
- Integrating knowledge graphs allows more precise question answering through a more expressive data model [22]

Data Modeling & Entity Extraction
- Data modeling should focus on defining key entities and their relationships, such as people, skills, and activities [17]
- Entity extraction from unstructured documents, like resumes, can be used to create a graph database representing these relationships [18]
- Pydantic classes and LangChain can be used in entity extraction workflows to decompose documents and extract JSON data containing skills and accomplishments [19][20]

Benefits of Graph Databases
- Graph databases enable flexible queries and high performance for complex traversals across skills, systems, domains, and accomplishments [30]
- Graph databases allow easy addition of new data and relationships, which is crucial for rapid iteration and adaptation in agentic systems [37]
- Graph databases facilitate building tools that find collaborators based on shared projects and domains [39]

Practical Application: Employee Skills Analysis
- The presentation uses an employee graph example to demonstrate skills analysis, similarity searches, and identification of skill gaps [5]
- Initial attempts to answer questions using only document embeddings are inaccurate, highlighting the need for entity extraction and metadata [9]
- By leveraging a knowledge graph, the system can accurately answer questions about the number of developers with specific skills, such as Python, and identify similar employees based on skill sets [24][25]
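The extraction-to-graph pipeline described above can be sketched minimally. The talk uses Pydantic classes with LangChain; this sketch swaps in stdlib dataclasses so it runs without extra dependencies, and stubs the LLM step with pre-extracted JSON. The schema fields and relationship names (`HAS_SKILL`, `ACHIEVED`) are illustrative assumptions, not the speaker's exact model.

```python
import json
from dataclasses import dataclass, field

# Schema for what we extract from each resume. The talk uses Pydantic
# with LangChain structured output; dataclasses are a stdlib stand-in.
@dataclass
class Person:
    name: str
    skills: list = field(default_factory=list)
    accomplishments: list = field(default_factory=list)

def extract_person(resume_json: str) -> Person:
    """Stand-in for the LLM extraction step, which would emit JSON
    matching the schema; here we simply parse pre-extracted JSON."""
    return Person(**json.loads(resume_json))

def to_triples(person: Person) -> list:
    """Turn extracted entities into graph edges, ready to load into a
    graph database: (person)-[:HAS_SKILL]->(skill), etc."""
    triples = [(person.name, "HAS_SKILL", s) for s in person.skills]
    triples += [(person.name, "ACHIEVED", a) for a in person.accomplishments]
    return triples

raw = '{"name": "Ada", "skills": ["Python", "Neo4j"], "accomplishments": ["Built ETL pipeline"]}'
print(to_triples(extract_person(raw)))
```

Once loaded as edges, questions like "how many developers know Python" become a simple traversal rather than an embedding-similarity guess.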
X @Avi Chawla
Avi Chawla· 2025-06-25 06:31
Agent Testing Methodology
- Traditional testing methods are inadequate for evaluating agents because language tasks lack fixed inputs and exact outputs [1]
- The industry uses agent-based testing, simulating Users and Judges to assess agent performance [1]
- The process tests agents with agents, evaluating their responses in a more realistic and dynamic environment [1]
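The User/Judge setup described above can be sketched as a small harness. All three agents would be LLM calls in practice; they are stubbed here to show only the shape of the loop, and the refund scenario, function names, and grading rule are invented for illustration.

```python
# Agent-based testing: a simulated User drives the target agent through
# a scenario, and a Judge scores the resulting transcript.

def target_agent(message: str) -> str:
    """The agent under test (stub for a real LLM-backed agent)."""
    if "refund" in message.lower():
        return "I can help with that refund. Could you share your order ID?"
    return "How can I help you today?"

def user_agent(turn: int) -> str:
    """Simulated user following a test scenario (stub)."""
    script = ["Hi there", "I want a refund for my order"]
    return script[turn]

def judge_agent(transcript: list) -> dict:
    """LLM-as-a-judge (stub): checks the agent asked for an order ID
    instead of promising a refund outright."""
    asked_for_id = any("order id" in reply.lower() for _, reply in transcript)
    return {"passed": asked_for_id, "score": 1.0 if asked_for_id else 0.0}

transcript = []
for turn in range(2):
    msg = user_agent(turn)
    transcript.append((msg, target_agent(msg)))

print(judge_agent(transcript))
```

Unlike a fixed input/output assertion, the Judge grades behavior over a whole conversation, which is why this survives the variability of language.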
Model Context Protocol: Origins and Requests For Startups — Theodora Chu, MCP PM, Anthropic
AI Engineer· 2025-06-18 22:55
MCP Origins and Goals
- MCP was created to address the challenge of constantly copying and pasting context into LLMs, aiming to give models the ability to interact with the outside world [4][5][6]
- The goal is to establish an open-source, standardized protocol for model agency, enabling broader participation in the ecosystem [7][8]
- Anthropic believes that enabling model agency is crucial for LLMs to reach the next level of usefulness and intelligence [8]

MCP Development and Adoption
- MCP was initially developed internally and gained traction during a company hack week [9][10]
- Early feedback questioned the need for a new protocol and its open-source nature, given existing tool-calling capabilities [12][13]
- Adoption by coding tools like Cursor marked a turning point, followed by broader adoption from Google, Microsoft, and OpenAI [14]

Protocol Principles and Updates
- The protocol prioritizes server simplicity, even at the cost of client complexity, on the belief that there will be far more servers than clients [20][21]
- Recent updates include support for streamable HTTP, enabling more bidirectional agent communication [19]
- Future development focuses on the agent experience, including elicitation, which lets servers request more information from end users [26][27]
- Plans include a registry API so models can find MCP servers independently, further supporting model agency [28]

Ecosystem Opportunities
- The industry needs more high-quality servers across verticals beyond dev tools, such as sales, finance, legal, and education [31][34]
- There is a significant opportunity in simplifying server building through tooling for hosting, testing, evaluation, and deployment [36]
- Automated MCP server generation is a potential future direction, leveraging increasing model intelligence [37]
- Tooling around AI security, observability, and auditing is crucial as applications gain more access to external data [38]
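To make the "server simplicity" principle concrete: MCP is built on JSON-RPC 2.0, and two of its core methods are `tools/list` (advertise tools) and `tools/call` (invoke one). The toy dispatcher below handles just those two, with an invented `lookup_invoice` tool; it is a sketch of the message shape, not the official MCP SDK, which also covers transports, schemas, elicitation, and more.

```python
import json

# Toy MCP-style server core: dispatch the two tool-related JSON-RPC
# methods. The tool itself ("lookup_invoice") is invented for the demo.
TOOLS = {
    "lookup_invoice": {
        "description": "Fetch an invoice by ID (toy example)",
        "handler": lambda args: {"id": args["id"], "total": 42.0},
    },
}

def handle(request_json: str) -> str:
    """Handle one JSON-RPC request and return the JSON-RPC response."""
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        result = tool["handler"](req["params"].get("arguments", {}))
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

print(handle('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
```

Because any client can discover tools via `tools/list` before calling them, a server this simple already plugs into every MCP-capable client, which is the asymmetry the protocol bets on.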
No Code LangSmith Evaluations
LangChain· 2025-06-18 15:10
LangChain Agent Evaluation
- LangChain lowers the barrier to agent evaluation, making it accessible to non-developers [1]
- LangGraph Studio adds a feature for quickly evaluating LangGraph agents [3]
- Users can select a dataset in LangGraph Studio and launch an evaluation experiment [3][4]
- Results can be viewed in LangSmith, including model outputs and evaluation scores [5]

Evaluation Importance and Accessibility
- Evaluation is critical for building effective agents [7]
- Traditional evaluation demands more of developers, requiring familiarity with the SDK, pytest, and the Evaluate API [7]
- LangChain aims to provide a no-code way for anyone to evaluate LangGraph agents [8]
- Non-technical users can evaluate choices such as models and prompts based on intuition [9]

Configuration and Customization
- Users can easily switch graph configurations in the Studio interface and launch evaluations from them [9]
- Developers can pre-configure datasets containing input topics and reference outputs [10]
- Evaluators can be bound to datasets, with customizable evaluation criteria and scoring rules [11][12][13]
- Users can modify the graph configuration (such as the model or prompts) in Studio and launch new evaluation experiments [15][16][17]
- Studio's no-code configuration enables rapid iteration [18]
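Conceptually, the no-code flow runs the agent over a dataset of inputs with reference outputs, applies an evaluator to each pair, and aggregates scores into an experiment. The sketch below shows that shape only; the dataset, agent stub, and word-overlap grading rule are invented, and none of this is the LangSmith SDK.

```python
# Dataset of inputs with reference outputs (invented examples).
dataset = [
    {"input": "Summarize photosynthesis", "reference": "plants convert light to energy"},
    {"input": "Summarize gravity", "reference": "masses attract each other"},
]

def agent(prompt: str) -> str:
    """Stub for the LangGraph agent under evaluation."""
    return {"Summarize photosynthesis": "Plants convert light to energy.",
            "Summarize gravity": "Things fall down."}[prompt]

def evaluator(output: str, reference: str) -> float:
    """Toy scoring rule: fraction of reference words present in the
    output. A real evaluator might be an LLM judge with custom criteria."""
    ref_words = reference.lower().split()
    hits = sum(w in output.lower() for w in ref_words)
    return hits / len(ref_words)

# The "experiment": score every example and aggregate.
scores = [evaluator(agent(ex["input"]), ex["reference"]) for ex in dataset]
print(scores)
print(sum(scores) / len(scores))
```

The Studio UI lets users swap the agent's configuration (model, prompt) and re-run exactly this loop without writing the code above.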
Gateway to AGI
Matthew Berman· 2025-06-17 14:07
Alpha Evolve is AI that can discover new knowledge. It really feels like we're at this inflection point of the intelligence explosion. Are we at that inflection point, given that this seems to be self-improving artificial intelligence? You're spot on about the potential of something like Alpha Evolve. It's amazing. We launched it in this low-key way. Yeah, it's one of the most groundbreaking pieces of work we are doing. The fact that, you know, you can have these agents which can go improve code, make discoveries. What an extraordin ...
Exposing Agents as MCP servers with mcp-agent: Sarmad Qadri
AI Engineer· 2025-06-11 16:57
My name is Sarmad, and today I want to talk about building effective agents with the Model Context Protocol, or MCP. A lot has changed in the last year, especially as far as agent development is concerned. I think 2025 is the year of agents, and things like MCP make agent design simpler and more robust than ever before. So I want to talk about what the agent tech stack looks like in 2025. The second thing is that a lot of MCP servers today are just, you know, one-to-one mappings of existing REST API servic ...