Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft
AI Engineer· 2025-06-27 10:05
GitHub Copilot's agentic capabilities enhance its ability to act as a peer programmer. From the IDE to the repository, Copilot can generate code, run tests, and perform tasks like creating pull requests using the Model Context Protocol (MCP). This instructor-led lab guides you through using agent capabilities on both the client and the server. Key takeaways include:
- Understanding how to bring agents into your software development workflow
- Identifying scenarios where agents can be most impactful, as well as ...
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
AI Agent Evaluation
- The Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1]
- The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1]
- Companies are leveraging the SDK to improve agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1]
Microsoft's AI Initiatives
- Microsoft is promoting AI in startups and facilitating the transition of research and startup products to market [1]
- Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1]
Industry Expertise
- Cedric Vidal has experience as an Engineering Manager in AI data labeling for the self-driving industry and as CTO of a fintech AI SaaS startup [1]
- He also has 10 years of experience as a software engineering services consultant for major fintech enterprises [1]
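The structured-test-plan idea above can be sketched as a small evaluation loop: run an agent against a plan of scenario-tagged cases and report per-scenario pass rates. This is an illustrative sketch only, not the Azure AI Evaluation SDK's actual API; `TestCase`, `run_eval`, and the toy agent are invented names.

```python
# Illustrative sketch of a structured agent-evaluation loop, in the spirit of
# the Azure AI Evaluation SDK described above. None of these names are the
# SDK's real API; the "agent" is a stub standing in for a real system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    scenario: str   # which capability this case probes
    query: str      # input sent to the agent
    expected: str   # substring we expect in a correct answer

def run_eval(agent: Callable[[str], str], plan: list[TestCase]) -> dict:
    """Run every case in the test plan and report per-scenario pass rates."""
    results: dict[str, list[bool]] = {}
    for case in plan:
        answer = agent(case.query)
        passed = case.expected.lower() in answer.lower()
        results.setdefault(case.scenario, []).append(passed)
    return {s: sum(r) / len(r) for s, r in results.items()}

# A toy "agent" standing in for a real conversational agent.
def toy_agent(query: str) -> str:
    return "Paris is the capital of France." if "capital" in query else "I don't know."

plan = [
    TestCase("factual accuracy", "What is the capital of France?", "Paris"),
    TestCase("contextual understanding", "Summarize our refund policy.", "refund"),
]
scores = run_eval(toy_agent, plan)
print(scores)  # {'factual accuracy': 1.0, 'contextual understanding': 0.0}
```

The per-scenario breakdown is what surfaces strengths and weaknesses: here the toy agent passes the factual case but fails the contextual one.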
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking
- Open-weight models are catching up to the frontier labs in capability, making many AI engineering applications possible that weren't before [1]
- Open-source inference engines like vLLM, SGLang, and TensorRT-LLM are readily available, reducing the need for custom model implementations [1]
- Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3]
Performance Analysis
- Throughput is significantly higher when processing longer input contexts (prefill) than when generating longer output sequences (decode), with up to a 4x improvement observed [15][16]
- Time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" in favoring context over reasoning tokens [19]
- Gemma 7B models show roughly the same throughput as Qwen 3 models despite being 10x smaller in model weights, indicating optimization differences [12]
Optimization & Infrastructure
- Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23]
- The benchmarking methodology sends a thousand requests to determine maximum throughput, and single requests to determine the fastest possible server response time [24][25]
- BF16 has slower tensor-core support than FP8 or FP4, suggesting potential for even greater gains with lower-precision formats on newer hardware like Blackwell [16][17]
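The two probes in the benchmarking methodology above can be reduced to two simple measurements: time-to-first-token from a single request, and aggregate tokens/second at saturation. The helper names and all numbers below are made-up stand-ins for real server timings, not Modal's benchmark code.

```python
# Sketch of the two measurements described above. The figures are invented
# stand-ins for real server timings; this is not Modal's benchmark harness.

def time_to_first_token(request_start: float, first_token_at: float) -> float:
    """Latency a single user feels before any output appears (seconds)."""
    return first_token_at - request_start

def throughput(total_tokens: int, wall_clock_seconds: float) -> float:
    """Aggregate tokens/second across a batch of concurrent requests."""
    return total_tokens / wall_clock_seconds

# Single-request probe: the fastest the server can possibly respond.
ttft = time_to_first_token(request_start=0.0, first_token_at=0.35)

# Saturation probe: 1000 concurrent requests x 500 tokens each, done in 40 s.
max_tp = throughput(total_tokens=1000 * 500, wall_clock_seconds=40.0)

print(f"TTFT: {ttft:.2f}s, max throughput: {max_tp:.0f} tok/s")
```

The near-constant TTFT under growing input lengths is what the talk calls a "free lunch": prefill is compute-bound and highly parallel, so extra context tokens barely move the first measurement while decode dominates the second.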
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is considered more reliable, faster, and cheaper than fine-tuning or long context windows for utilizing external knowledge [7]
- Retrieval accuracy has improved significantly over the past 18 months, driven by advances in embedding models [11][12]
- The industry averages approximately 80% accuracy across 100 datasets, leaving roughly 20% of headroom in retrieval tasks [12][13]
Vector Embeddings & Storage Optimization
- Techniques like Matryoshka representation learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21]
RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]
Future of RAG
- The industry predicts a shift toward more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating separate data-extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
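The "up to 100x" storage claim above follows from combining two independent reductions: Matryoshka-style dimension truncation (keep only the leading dimensions) and binary quantization (1 bit per dimension instead of a 32-bit float). The arithmetic below is purely illustrative; the index size and dimensions are invented, and this is not Voyage AI's implementation.

```python
# Back-of-the-envelope sketch of the storage savings described above, combining
# Matryoshka-style truncation with binary quantization. All figures invented.

def storage_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    return num_vectors * dims * bits_per_dim // 8

N = 10_000_000  # vectors in a hypothetical index

baseline = storage_bytes(N, dims=1024, bits_per_dim=32)  # float32, full width
truncated = storage_bytes(N, dims=256, bits_per_dim=32)  # Matryoshka: keep leading 256 dims
quantized = storage_bytes(N, dims=256, bits_per_dim=1)   # + binary quantization

print(f"baseline:  {baseline / 1e9:.1f} GB")
print(f"truncated: {truncated / 1e9:.1f} GB ({baseline // truncated}x smaller)")
print(f"quantized: {quantized / 1e9:.2f} GB ({baseline // quantized}x smaller)")
```

A 4x truncation times a 32x quantization gives 128x in this sketch, which is the ballpark behind the "up to 100x" figure; the 5-10% accuracy cost quoted in the talk is the trade being made.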
The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)
AI Engineer· 2025-06-27 09:57
Voyage AI & MongoDB Partnership
- Voyage AI was acquired by MongoDB approximately 3-4 months ago [1]
- The partnership aims to create a single data platform for embedding, re-ranking, query augmentation, and query decomposition [29][30][31]
AI-Powered Search & Retrieval
- AI-powered search finds related concepts beyond identical wording and understands user intent [7][8][9]
- Embedding quality is a core component, with 95-99% of systems using embeddings [12]
- Real-world applications include chatting with codebases, where evaluation is crucial for choosing the best embedding model and LLM for the specific application [14][15]
- Structured data beyond embeddings is often necessary for powerful search and retrieval systems, such as filtering legal documents by state or document type [16][17][18]
- Agentic retrieval involves feedback loops: the AI search system is no longer simple input-output, but can expand or decompose queries [19][20]
Future Trends
- The future of AI-powered search is multimodal, understanding images, text, and audio together [23][24][25]
- Instruction tuning will allow steering vectors based on instructions, enabling more specific document retrieval [27][28]
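The point about needing structured data alongside embeddings can be sketched as a hybrid ranker: blend a lexical score with a vector-similarity score before ranking. The toy 3-dimensional "embeddings", the legal-document snippets, and the 0.5/0.5 weighting are all illustrative assumptions, not any product's actual scoring.

```python
# Minimal hybrid-search sketch: combine lexical overlap with vector similarity.
# Toy embeddings and weights; real systems use learned embeddings + re-rankers.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def lexical_score(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)  # fraction of query terms present in the document

def hybrid_rank(query: str, q_vec: list[float], docs: list[tuple[str, list[float]]]):
    scored = [
        (0.5 * lexical_score(query, text) + 0.5 * cosine(q_vec, vec), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)]

docs = [
    ("statute of limitations in California", [0.9, 0.1, 0.0]),
    ("muffin recipes for beginners",         [0.0, 0.2, 0.9]),
]
print(hybrid_rank("California statute", [0.8, 0.2, 0.1], docs))
```

In production the lexical side would be BM25 and a re-ranker would rescore the merged candidates; the blend shown here is only the shape of the idea.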
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
AI Engineer· 2025-06-27 09:56
AI Agents and Memory
- Memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6]
- Different forms of memory are discussed, including short-term, long-term, conversational entity memory, knowledge data stores, caches, and working memory [8]
- The industry is moving toward AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1][21]
MongoDB's Role in AI Memory
- MongoDB is positioned as a memory provider for agentic systems, offering the features needed to turn data into memory and enhance agent capabilities [20][21][31]
- MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, and geospatial queries) are highlighted as key advantages for AI memory management [25]
- MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32][33]
- Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34]
Memory Management and Implementation
- Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16][17]
- Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18]
- The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21][22][30]
- Memory types explored include persona, toolbox, conversation, workflow, episodic, long-term, and entity memory [23][25][26][27][29][30]
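The memory-management lifecycle listed above (generation, storage, retrieval, updating, forgetting) can be sketched as a tiny in-process store. A real agent would back this with a database such as MongoDB; the `AgentMemory` class, its LRU forgetting policy, and the keyword retrieval are all illustrative choices, not the talk's design.

```python
# Toy sketch of the memory lifecycle above: store, retrieve, update, forget.
# Illustrative only; a production agent would use a database, embeddings for
# retrieval, and a more principled forgetting policy.
import time

class AgentMemory:
    def __init__(self, capacity: int = 100):
        self.records: dict[str, dict] = {}
        self.capacity = capacity

    def store(self, key: str, content: str) -> None:
        # Storage / updating: writing an existing key keeps the newest version.
        self.records[key] = {"content": content, "last_used": time.monotonic()}
        if len(self.records) > self.capacity:
            self.forget()

    def retrieve(self, keyword: str) -> list[str]:
        # Retrieval: naive keyword match; touching a record keeps it "fresh".
        hits = []
        for rec in self.records.values():
            if keyword.lower() in rec["content"].lower():
                rec["last_used"] = time.monotonic()
                hits.append(rec["content"])
        return hits

    def forget(self) -> None:
        # Forgetting: evict the least recently used record.
        oldest = min(self.records, key=lambda k: self.records[k]["last_used"])
        del self.records[oldest]

mem = AgentMemory(capacity=2)
mem.store("user_name", "The user's name is Ada.")
mem.store("user_lang", "The user prefers Python.")
mem.store("user_tz", "The user is in UTC+2.")  # over capacity: evicts the LRU record
print(mem.retrieve("Python"))
```

Swapping the dict for a MongoDB collection and the keyword match for vector search turns this shape into the kind of memory provider the talk describes.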
Why Your Agent’s Brain Needs a Playbook: Practical Wins from Using Ontologies - Jesús Barrasa, Neo4j
AI Engineer· 2025-06-27 09:53
Knowledge Graph & LLM Application
- Knowledge graphs combined with large language models (LLMs) can be used to build AI applications, particularly with a graph retrieval augmented generation (RAG) architecture [2]
- Graph RAG replaces vector databases with knowledge graphs built on graph databases, enhancing retrieval strategies [3]
- A knowledge graph provides richer retrieval strategies beyond vector semantic search, including contextualization and structured queries [4]
- The property graph model consists of nodes and relationships: nodes represent entities, and relationships connect them [4][5]
Ontology & Schema
- Ontologies provide an implementation-agnostic way to represent schemas, facilitating knowledge graph creation for both structured and unstructured data pipelines [14][17]
- Ontologies describe a domain with definitions of classes and relationships, which maps well onto graph models [15]
- The Financial Industry Business Ontology (FIBO) is an example of a public financial-industry ontology [15]
- Storing ontologies in the graph can drive dynamic behavior in retrievers, allowing on-the-fly adjustments by modifying the ontology [29][30]
Retrieval Strategies
- The graph captures text chunks with embeddings, creating a new search space for vector search [20]
- Vector search finds nearby vectors, which can be dereferenced back to the graph for contextualization, navigation, and enrichment [20]
- Dynamic queries driven by ontologies can be used to create dynamic retrievers, enabling data-driven behavior [26][29]
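The property-graph model and the ontology-driven retriever idea above can be sketched together: nodes are entities, edges are typed relationships, and a small "ontology" of allowed relationship types per class decides which hops a retriever may traverse. The finance-flavored data and all names are invented for illustration; a real system would use a graph database such as Neo4j and a formal ontology like FIBO.

```python
# Tiny property-graph sketch: nodes are entities, edges connect them, and an
# in-graph "ontology" controls which relationship types a retriever follows.
# All data and names are invented for illustration.
nodes = {
    "acme":  {"label": "Company", "name": "Acme Corp"},
    "bond1": {"label": "Bond", "isin": "XS0000000001"},
    "alice": {"label": "Person", "name": "Alice"},
}
edges = [
    ("acme", "ISSUED", "bond1"),
    ("alice", "HOLDS", "bond1"),
]

# Ontology: which relationship types each class may traverse. Editing this
# mapping changes retriever behavior on the fly, with no code changes.
ontology = {"Company": {"ISSUED"}, "Person": {"HOLDS"}}

def neighbors(node_id: str) -> list[str]:
    allowed = ontology.get(nodes[node_id]["label"], set())
    return [dst for src, rel, dst in edges if src == node_id and rel in allowed]

print(neighbors("acme"))     # ['bond1']
ontology["Company"] = set()  # tighten the ontology: Companies now expand to nothing
print(neighbors("acme"))     # []
```

The last two lines are the point of storing the ontology in the graph: the retriever's behavior is data, so adjusting the ontology adjusts retrieval without redeploying code.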
GraphRAG methods to create optimized LLM context windows for Retrieval — Jonathan Larson, Microsoft
AI Engineer· 2025-06-27 09:48
Graph RAG Applications & Performance
- Graph RAG is a key enabler for building effective AI applications, especially when paired with agents [1]
- Graph RAG excels at semantic understanding and can perform global queries over a code repository [2][3]
- Graph RAG can be used for code translation from Python to Rust, outperforming direct LLM translation [4][9]
- Graph RAG can be applied to large codebases like Doom (100,000 lines of code, 231 files) for documentation and feature development [10][12][13]
- Combined with the GitHub Copilot coding agent, Graph RAG enables complex multi-file modifications, such as adding a jump capability to Doom [18][20]
Benchmark QED & LazyGraphRAG
- Benchmark QED is a new open-source tool for measuring and evaluating Graph RAG systems, focusing on local and global quality metrics [21][22]
- Benchmark QED includes AutoQ (query generation), AutoE (evaluation using an LLM as a judge), and AutoD (dataset summarization and sampling) [22]
- LazyGraphRAG demonstrates dominant performance against vector RAG on data-local questions, winning 92%, 90%, and 91% of the time against 8K, 120K, and 1 million token context windows respectively [29][30]
- LazyGraphRAG can achieve this performance at a tenth of the cost of using a 1 million token context window [32]
- LazyGraphRAG is being incorporated into Azure AI and the Microsoft Discovery platform [34]
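The win rates quoted above come from pairwise comparisons: a judge sees two systems' answers to the same question and picks a winner, and the win rate is the tally. The sketch below reconstructs that arithmetic with stub verdict lists sized to match the quoted 92/90/91% figures; in Benchmark QED's AutoE the judging is done by an LLM, not precomputed lists.

```python
# Sketch of how pairwise win rates are tallied. Verdicts are stubs sized to
# match the figures quoted above; AutoE would obtain them from an LLM judge.
def win_rate(verdicts: list[str], system: str = "lazy") -> float:
    """Fraction of pairwise comparisons the given system won."""
    return sum(v == system for v in verdicts) / len(verdicts)

# Stub verdicts for 100 questions per context-window condition.
trials = {
    "8K":   ["lazy"] * 92 + ["vector"] * 8,
    "120K": ["lazy"] * 90 + ["vector"] * 10,
    "1M":   ["lazy"] * 91 + ["vector"] * 9,
}
for window, verdicts in trials.items():
    print(f"vs {window} context window: {win_rate(verdicts):.0%}")
```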
Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
AI Engineer· 2025-06-27 09:44
Knowledge Graph Architecture & Agentic Workflows
- Knowledge graphs can enhance agentic workflows by enabling reasoning and question decomposition, moving beyond simple vector searches [4]
- Knowledge graphs let simple data models be expressed to agents, aiding accurate information retrieval and expansion with more data [5]
- Integrating knowledge graphs allows more precise question answering through a more expressive data model [22]
Data Modeling & Entity Extraction
- Data modeling should focus on defining key entities and their relationships, such as people, skills, and activities [17]
- Entity extraction from unstructured documents, like resumes, can be used to create a graph database representing these relationships [18]
- Pydantic classes and LangChain can be used in entity-extraction workflows to decompose documents and extract JSON data containing skills and accomplishments [19][20]
Benefits of Graph Databases
- Graph databases enable flexible queries and high performance for complex traversals across skills, systems, domains, and accomplishments [30]
- Graph databases allow easy addition of new data and relationships, which is crucial for rapid iteration and adaptation in agentic systems [37]
- Graph databases facilitate building tools to find collaborators based on shared projects and domains [39]
Practical Application: Employee Skills Analysis
- An employee graph example demonstrates skills analysis, similarity searches, and identification of skill gaps [5]
- Initial attempts to answer questions using only document embeddings are inaccurate, highlighting the need for entity extraction and metadata [9]
- With a knowledge graph, the system can accurately answer questions such as how many developers have a specific skill (e.g. Python) and identify similar employees based on skill sets [24][25]
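The extraction step above can be sketched as: the LLM emits JSON conforming to a schema, which is parsed into a typed record and flattened into graph edges. The talk uses Pydantic classes with LangChain; plain dataclasses stand in here so the example stays dependency-free, and the resume JSON, `Person` fields, and relationship names are all invented.

```python
# Sketch of the entity-extraction step above. Dataclasses stand in for the
# Pydantic classes the talk describes; the "LLM output" is a made-up stub.
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    skills: list[str]
    accomplishments: list[str]

def to_triples(p: Person) -> list[tuple[str, str, str]]:
    """Flatten one extracted record into (subject, relationship, object) edges."""
    edges = [(p.name, "HAS_SKILL", s) for s in p.skills]
    edges += [(p.name, "ACHIEVED", a) for a in p.accomplishments]
    return edges

# Stand-in for structured JSON returned by an LLM extraction chain.
llm_output = ('{"name": "Dana", "skills": ["Python", "Neo4j"], '
              '"accomplishments": ["Shipped billing service"]}')
person = Person(**json.loads(llm_output))
print(to_triples(person))
```

Loading these triples into a graph database is what makes questions like "how many developers know Python?" a cheap traversal rather than an unreliable embedding lookup.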
Revenue Engineering: How to Price (and Reprice) Your AI Product — Kshitij Grover, Orb
AI Engineer· 2025-06-27 09:41
Pricing Principles for AI Products
- Pricing is a form of friction that can either enable or prevent product adoption, requiring careful consideration of value delivery and target audience [2]
- Traditional pricing principles emphasize simplicity, value signaling through willingness to pay, and margin protection [8]
- AI-native pricing prioritizes predictability for mature companies that need to budget, speed for early-stage products, and adaptation to variable costs [10][11][12]
Key Considerations for AI Pricing
- Audience understanding is crucial: consider the buying journey, value expectations, and decision-making processes [15][16]
- Packaging and pricing tiers shape user perception and incentives, influencing how users interact with the product [18][19]
- Margin structure should focus on axes of scaling and the flexibility to experiment, rather than fixed margins, because underlying costs change rapidly [13][14]
Strategies for Margin Management and Flexibility
- Differentiate through R&D innovation and pass technical advantages to users as pricing leverage [23][24]
- Implement rate limits or guardrails to prevent degenerate workloads and incentivize reasonable usage, rather than scaling costs linearly [25]
- Evolve pricing incrementally in response to R&D investments, aligning monetization with the value perceived by end users [30][31]
Future Trends in AI Agent Pricing
- Expect continued price wars and a move toward effectively unlimited plans with caps and guardrails [36][37]
- Outcome-based pricing will become more prevalent, requiring clear definitions of success and measurable SLAs [37][38]
- Real-time visibility, spend management, and balance alerts will become more sophisticated, offering users greater control over spend [38][39][40]
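The "unlimited plan with caps and guardrails" idea above can be sketched as a billing function: usage is billed linearly above an included allowance, and beyond a soft cap the workload is throttled rather than billed linearly. Every rate and threshold below is invented for illustration, not Orb's pricing model.

```python
# Sketch of usage-based pricing with an included allowance, a soft cap, and a
# throttling guardrail. All rates and thresholds are invented for illustration.
def monthly_bill(requests: int, price_per_1k: float = 2.0,
                 included: int = 10_000, soft_cap: int = 100_000) -> dict:
    billable = max(0, min(requests, soft_cap) - included)
    return {
        "charge": round(billable / 1000 * price_per_1k, 2),
        # Guardrail: degenerate workloads get throttled, not billed forever.
        "throttled": requests > soft_cap,
    }

print(monthly_bill(5_000))    # within the included allowance: nothing owed
print(monthly_bill(50_000))   # pays for 40k requests above the allowance
print(monthly_bill(250_000))  # charge capped; guardrail kicks in instead
```

The design choice worth noting is the cap: it preserves the predictability buyers want while the guardrail, rather than linear billing, absorbs pathological usage.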