Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft
AI Engineer· 2025-06-27 10:05
GitHub Copilot's agentic capabilities enhance its ability to act as a peer programmer. From the IDE to the repository, Copilot can generate code, run tests, and perform tasks like creating pull requests using the Model Context Protocol (MCP). This instructor-led lab guides you through using agent capabilities on both the client and the server. Key takeaways include:
- Understanding how to bring agents into your software development workflow
- Identifying scenarios where agents can be most impactful, as well as ...
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
AI Agent Evaluation
- The Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1]
- The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1]
- Companies are leveraging the SDK to improve agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1]
Microsoft's AI Initiatives
- Microsoft is promoting AI in startups and facilitating the transition of research and startup products to market [1]
- Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1]
Industry Expertise
- Cedric Vidal has experience as an Engineering Manager in AI data labeling for the self-driving industry and as CTO of a fintech AI SaaS startup [1]
- He also has 10 years of experience as a software engineering services consultant for major fintech enterprises [1]
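The structured-test-plan idea above can be sketched as a small evaluation loop: run an agent against a plan of scenario-tagged cases and report per-scenario pass rates. This is an illustrative sketch only, not the Azure AI Evaluation SDK's actual API; `TestCase`, `run_eval`, and the toy agent are invented names.

```python
# Illustrative sketch of a structured agent-evaluation loop, in the spirit of
# the Azure AI Evaluation SDK described above. None of these names are the
# SDK's real API; the "agent" is a stub standing in for a real system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    scenario: str   # which capability this case probes
    query: str      # input sent to the agent
    expected: str   # substring we expect in a correct answer

def run_eval(agent: Callable[[str], str], plan: list[TestCase]) -> dict:
    """Run every case in the test plan and report per-scenario pass rates."""
    results: dict[str, list[bool]] = {}
    for case in plan:
        answer = agent(case.query)
        passed = case.expected.lower() in answer.lower()
        results.setdefault(case.scenario, []).append(passed)
    return {s: sum(r) / len(r) for s, r in results.items()}

# A toy "agent" standing in for a real conversational agent.
def toy_agent(query: str) -> str:
    return "Paris is the capital of France." if "capital" in query else "I don't know."

plan = [
    TestCase("factual accuracy", "What is the capital of France?", "Paris"),
    TestCase("contextual understanding", "Summarize our refund policy.", "refund"),
]
scores = run_eval(toy_agent, plan)
print(scores)  # {'factual accuracy': 1.0, 'contextual understanding': 0.0}
```

The per-scenario breakdown is what surfaces strengths and weaknesses: here the toy agent passes the factual case but fails the contextual one.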
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking
- Open-weight models are catching up to the frontier labs in capability, making many AI engineering applications possible that weren't before [1]
- Open-source inference engines like vLLM, SGLang, and TensorRT-LLM are readily available, reducing the need for custom model implementations [1]
- Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3]
Performance Analysis
- Throughput is significantly higher when processing longer input contexts (prefill) than when generating longer output sequences (decode), with up to a 4x improvement observed [15][16]
- Time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" in favoring context over reasoning tokens [19]
- Gemma 7B models show roughly the same throughput as Qwen 3 models despite being 10x smaller in model weights, indicating optimization differences [12]
Optimization & Infrastructure
- Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23]
- The benchmarking methodology sends a thousand requests to determine maximum throughput, and single requests to determine the fastest possible server response time [24][25]
- BF16 has slower tensor-core support than FP8 or FP4, suggesting potential for even greater gains with lower-precision formats on newer hardware like Blackwell [16][17]
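The two probes in the benchmarking methodology above can be reduced to two simple measurements: time-to-first-token from a single request, and aggregate tokens/second at saturation. The helper names and all numbers below are made-up stand-ins for real server timings, not Modal's benchmark code.

```python
# Sketch of the two measurements described above. The figures are invented
# stand-ins for real server timings; this is not Modal's benchmark harness.

def time_to_first_token(request_start: float, first_token_at: float) -> float:
    """Latency a single user feels before any output appears (seconds)."""
    return first_token_at - request_start

def throughput(total_tokens: int, wall_clock_seconds: float) -> float:
    """Aggregate tokens/second across a batch of concurrent requests."""
    return total_tokens / wall_clock_seconds

# Single-request probe: the fastest the server can possibly respond.
ttft = time_to_first_token(request_start=0.0, first_token_at=0.35)

# Saturation probe: 1000 concurrent requests x 500 tokens each, done in 40 s.
max_tp = throughput(total_tokens=1000 * 500, wall_clock_seconds=40.0)

print(f"TTFT: {ttft:.2f}s, max throughput: {max_tp:.0f} tok/s")
```

The near-constant TTFT under growing input lengths is what the talk calls a "free lunch": prefill is compute-bound and highly parallel, so extra context tokens barely move the first measurement while decode dominates the second.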
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs)
- RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3]
- RAG is considered more reliable, faster, and cheaper than fine-tuning or long context windows for utilizing external knowledge [7]
- Retrieval accuracy has improved significantly over the past 18 months, driven by advances in embedding models [11][12]
- The industry averages approximately 80% accuracy across 100 datasets, leaving roughly 20% of headroom in retrieval tasks [12][13]
Vector Embeddings & Storage Optimization
- Techniques like Matryoshka representation learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17]
- Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21]
RAG Enhancement Techniques
- Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18]
- Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20]
Future of RAG
- The industry predicts a shift toward more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30]
- Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating separate data-extraction and embedding steps [32]
- Context-aware and auto-chunking embeddings aim to automate the chunking process and incorporate cross-chunk information, optimizing retrieval and cost [33][36]
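The "up to 100x" storage claim above follows from combining two independent reductions: Matryoshka-style dimension truncation (keep only the leading dimensions) and binary quantization (1 bit per dimension instead of a 32-bit float). The arithmetic below is purely illustrative; the index size and dimensions are invented, and this is not Voyage AI's implementation.

```python
# Back-of-the-envelope sketch of the storage savings described above, combining
# Matryoshka-style truncation with binary quantization. All figures invented.

def storage_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    return num_vectors * dims * bits_per_dim // 8

N = 10_000_000  # vectors in a hypothetical index

baseline = storage_bytes(N, dims=1024, bits_per_dim=32)  # float32, full width
truncated = storage_bytes(N, dims=256, bits_per_dim=32)  # Matryoshka: keep leading 256 dims
quantized = storage_bytes(N, dims=256, bits_per_dim=1)   # + binary quantization

print(f"baseline:  {baseline / 1e9:.1f} GB")
print(f"truncated: {truncated / 1e9:.1f} GB ({baseline // truncated}x smaller)")
print(f"quantized: {quantized / 1e9:.2f} GB ({baseline // quantized}x smaller)")
```

A 4x truncation times a 32x quantization gives 128x in this sketch, which is the ballpark behind the "up to 100x" figure; the 5-10% accuracy cost quoted in the talk is the trade being made.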
The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)
AI Engineer· 2025-06-27 09:57
Voyage AI & MongoDB Partnership
- Voyage AI was acquired by MongoDB approximately 3-4 months ago [1]
- The partnership aims to create a single data platform for embedding, re-ranking, query augmentation, and query decomposition [29][30][31]
AI-Powered Search & Retrieval
- AI-powered search finds related concepts beyond identical wording and understands user intent [7][8][9]
- Embedding quality is a core component, with 95-99% of systems using embeddings [12]
- Real-world applications include chatting with codebases, where evaluation is crucial for choosing the best embedding model and LLM for the specific application [14][15]
- Structured data beyond embeddings is often necessary for powerful search and retrieval systems, such as filtering legal documents by state or document type [16][17][18]
- Agentic retrieval involves feedback loops: the AI search system is no longer simple input-output, but can expand or decompose queries [19][20]
Future Trends
- The future of AI-powered search is multimodal, understanding images, text, and audio together [23][24][25]
- Instruction tuning will allow steering vectors based on instructions, enabling more specific document retrieval [27][28]
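The point about needing structured data alongside embeddings can be sketched as a hybrid ranker: blend a lexical score with a vector-similarity score before ranking. The toy 3-dimensional "embeddings", the legal-document snippets, and the 0.5/0.5 weighting are all illustrative assumptions, not any product's actual scoring.

```python
# Minimal hybrid-search sketch: combine lexical overlap with vector similarity.
# Toy embeddings and weights; real systems use learned embeddings + re-rankers.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def lexical_score(query: str, text: str) -> float:
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q)  # fraction of query terms present in the document

def hybrid_rank(query: str, q_vec: list[float], docs: list[tuple[str, list[float]]]):
    scored = [
        (0.5 * lexical_score(query, text) + 0.5 * cosine(q_vec, vec), text)
        for text, vec in docs
    ]
    return [text for _, text in sorted(scored, reverse=True)]

docs = [
    ("statute of limitations in California", [0.9, 0.1, 0.0]),
    ("muffin recipes for beginners",         [0.0, 0.2, 0.9]),
]
print(hybrid_rank("California statute", [0.8, 0.2, 0.1], docs))
```

In production the lexical side would be BM25 and a re-ranker would rescore the merged candidates; the blend shown here is only the shape of the idea.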
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
AI Engineer· 2025-06-27 09:56
AI Agents and Memory
- Memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6]
- Different forms of memory are discussed, including short-term, long-term, conversational entity memory, knowledge data stores, caches, and working memory [8]
- The industry is moving toward AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1][21]
MongoDB's Role in AI Memory
- MongoDB is positioned as a memory provider for agentic systems, offering the features needed to turn data into memory and enhance agent capabilities [20][21][31]
- MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, and geospatial queries) are highlighted as key advantages for AI memory management [25]
- MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32][33]
- Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34]
Memory Management and Implementation
- Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16][17]
- Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18]
- The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21][22][30]
- Memory types explored include persona, toolbox, conversation, workflow, episodic, long-term, and entity memory [23][25][26][27][29][30]
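The memory-management lifecycle listed above (generation, storage, retrieval, updating, forgetting) can be sketched as a tiny in-process store. A real agent would back this with a database such as MongoDB; the `AgentMemory` class, its LRU forgetting policy, and the keyword retrieval are all illustrative choices, not the talk's design.

```python
# Toy sketch of the memory lifecycle above: store, retrieve, update, forget.
# Illustrative only; a production agent would use a database, embeddings for
# retrieval, and a more principled forgetting policy.
import time

class AgentMemory:
    def __init__(self, capacity: int = 100):
        self.records: dict[str, dict] = {}
        self.capacity = capacity

    def store(self, key: str, content: str) -> None:
        # Storage / updating: writing an existing key keeps the newest version.
        self.records[key] = {"content": content, "last_used": time.monotonic()}
        if len(self.records) > self.capacity:
            self.forget()

    def retrieve(self, keyword: str) -> list[str]:
        # Retrieval: naive keyword match; touching a record keeps it "fresh".
        hits = []
        for rec in self.records.values():
            if keyword.lower() in rec["content"].lower():
                rec["last_used"] = time.monotonic()
                hits.append(rec["content"])
        return hits

    def forget(self) -> None:
        # Forgetting: evict the least recently used record.
        oldest = min(self.records, key=lambda k: self.records[k]["last_used"])
        del self.records[oldest]

mem = AgentMemory(capacity=2)
mem.store("user_name", "The user's name is Ada.")
mem.store("user_lang", "The user prefers Python.")
mem.store("user_tz", "The user is in UTC+2.")  # over capacity: evicts the LRU record
print(mem.retrieve("Python"))
```

Swapping the dict for a MongoDB collection and the keyword match for vector search turns this shape into the kind of memory provider the talk describes.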
Why Your Agent’s Brain Needs a Playbook: Practical Wins from Using Ontologies - Jesús Barrasa, Neo4j
AI Engineer· 2025-06-27 09:53
Knowledge Graph & LLM Application
- Knowledge graphs combined with large language models (LLMs) can be used to build AI applications, particularly with a graph retrieval augmented generation (RAG) architecture [2]
- Graph RAG replaces vector databases with knowledge graphs built on graph databases, enhancing retrieval strategies [3]
- A knowledge graph provides richer retrieval strategies beyond vector semantic search, including contextualization and structured queries [4]
- The property graph model consists of nodes and relationships: nodes represent entities, and relationships connect them [4][5]
Ontology & Schema
- Ontologies provide an implementation-agnostic way to represent schemas, facilitating knowledge graph creation for both structured and unstructured data pipelines [14][17]
- Ontologies describe a domain with definitions of classes and relationships, which maps well onto graph models [15]
- The Financial Industry Business Ontology (FIBO) is an example of a public financial-industry ontology [15]
- Storing ontologies in the graph can drive dynamic behavior in retrievers, allowing on-the-fly adjustments by modifying the ontology [29][30]
Retrieval Strategies
- The graph captures text chunks with embeddings, creating a new search space for vector search [20]
- Vector search finds nearby vectors, which can be dereferenced back to the graph for contextualization, navigation, and enrichment [20]
- Dynamic queries driven by ontologies can be used to create dynamic retrievers, enabling data-driven behavior [26][29]
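The property-graph model and the ontology-driven retriever idea above can be sketched together: nodes are entities, edges are typed relationships, and a small "ontology" of allowed relationship types per class decides which hops a retriever may traverse. The finance-flavored data and all names are invented for illustration; a real system would use a graph database such as Neo4j and a formal ontology like FIBO.

```python
# Tiny property-graph sketch: nodes are entities, edges connect them, and an
# in-graph "ontology" controls which relationship types a retriever follows.
# All data and names are invented for illustration.
nodes = {
    "acme":  {"label": "Company", "name": "Acme Corp"},
    "bond1": {"label": "Bond", "isin": "XS0000000001"},
    "alice": {"label": "Person", "name": "Alice"},
}
edges = [
    ("acme", "ISSUED", "bond1"),
    ("alice", "HOLDS", "bond1"),
]

# Ontology: which relationship types each class may traverse. Editing this
# mapping changes retriever behavior on the fly, with no code changes.
ontology = {"Company": {"ISSUED"}, "Person": {"HOLDS"}}

def neighbors(node_id: str) -> list[str]:
    allowed = ontology.get(nodes[node_id]["label"], set())
    return [dst for src, rel, dst in edges if src == node_id and rel in allowed]

print(neighbors("acme"))     # ['bond1']
ontology["Company"] = set()  # tighten the ontology: Companies now expand to nothing
print(neighbors("acme"))     # []
```

The last two lines are the point of storing the ontology in the graph: the retriever's behavior is data, so adjusting the ontology adjusts retrieval without redeploying code.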
GraphRAG methods to create optimized LLM context windows for Retrieval — Jonathan Larson, Microsoft
AI Engineer· 2025-06-27 09:48
Graph RAG Applications & Performance
- Graph RAG is a key enabler for building effective AI applications, especially when paired with agents [1]
- Graph RAG excels at semantic understanding and can perform global queries over a code repository [2][3]
- Graph RAG can be used for code translation from Python to Rust, outperforming direct LLM translation [4][9]
- Graph RAG can be applied to large codebases like Doom (100,000 lines of code, 231 files) for documentation and feature development [10][12][13]
- Combined with the GitHub Copilot coding agent, Graph RAG enables complex multi-file modifications, such as adding a jump capability to Doom [18][20]
Benchmark QED & LazyGraphRAG
- Benchmark QED is a new open-source tool for measuring and evaluating Graph RAG systems, focusing on local and global quality metrics [21][22]
- Benchmark QED includes AutoQ (query generation), AutoE (evaluation using an LLM as a judge), and AutoD (dataset summarization and sampling) [22]
- LazyGraphRAG demonstrates dominant performance against vector RAG on data-local questions, winning 92%, 90%, and 91% of the time against 8K, 120K, and 1 million token context windows respectively [29][30]
- LazyGraphRAG can achieve this performance at a tenth of the cost of using a 1 million token context window [32]
- LazyGraphRAG is being incorporated into Azure AI and the Microsoft Discovery platform [34]
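The win rates quoted above come from pairwise comparisons: a judge sees two systems' answers to the same question and picks a winner, and the win rate is the tally. The sketch below reconstructs that arithmetic with stub verdict lists sized to match the quoted 92/90/91% figures; in Benchmark QED's AutoE the judging is done by an LLM, not precomputed lists.

```python
# Sketch of how pairwise win rates are tallied. Verdicts are stubs sized to
# match the figures quoted above; AutoE would obtain them from an LLM judge.
def win_rate(verdicts: list[str], system: str = "lazy") -> float:
    """Fraction of pairwise comparisons the given system won."""
    return sum(v == system for v in verdicts) / len(verdicts)

# Stub verdicts for 100 questions per context-window condition.
trials = {
    "8K":   ["lazy"] * 92 + ["vector"] * 8,
    "120K": ["lazy"] * 90 + ["vector"] * 10,
    "1M":   ["lazy"] * 91 + ["vector"] * 9,
}
for window, verdicts in trials.items():
    print(f"vs {window} context window: {win_rate(verdicts):.0%}")
```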
Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
AI Engineer· 2025-06-27 09:44
Knowledge Graph Architecture & Agentic Workflows
- Knowledge graphs can enhance agentic workflows by enabling reasoning and question decomposition, moving beyond simple vector searches [4]
- Knowledge graphs let simple data models be expressed to agents, aiding accurate information retrieval and expansion with more data [5]
- Integrating knowledge graphs allows more precise question answering through a more expressive data model [22]
Data Modeling & Entity Extraction
- Data modeling should focus on defining key entities and their relationships, such as people, skills, and activities [17]
- Entity extraction from unstructured documents, like resumes, can be used to create a graph database representing these relationships [18]
- Pydantic classes and LangChain can be used in entity-extraction workflows to decompose documents and extract JSON data containing skills and accomplishments [19][20]
Benefits of Graph Databases
- Graph databases enable flexible queries and high performance for complex traversals across skills, systems, domains, and accomplishments [30]
- Graph databases allow easy addition of new data and relationships, which is crucial for rapid iteration and adaptation in agentic systems [37]
- Graph databases facilitate building tools to find collaborators based on shared projects and domains [39]
Practical Application: Employee Skills Analysis
- An employee graph example demonstrates skills analysis, similarity searches, and identification of skill gaps [5]
- Initial attempts to answer questions using only document embeddings are inaccurate, highlighting the need for entity extraction and metadata [9]
- With a knowledge graph, the system can accurately answer questions such as how many developers have a specific skill (e.g. Python) and identify similar employees based on skill sets [24][25]
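The extraction step above can be sketched as: the LLM emits JSON conforming to a schema, which is parsed into a typed record and flattened into graph edges. The talk uses Pydantic classes with LangChain; plain dataclasses stand in here so the example stays dependency-free, and the resume JSON, `Person` fields, and relationship names are all invented.

```python
# Sketch of the entity-extraction step above. Dataclasses stand in for the
# Pydantic classes the talk describes; the "LLM output" is a made-up stub.
import json
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    skills: list[str]
    accomplishments: list[str]

def to_triples(p: Person) -> list[tuple[str, str, str]]:
    """Flatten one extracted record into (subject, relationship, object) edges."""
    edges = [(p.name, "HAS_SKILL", s) for s in p.skills]
    edges += [(p.name, "ACHIEVED", a) for a in p.accomplishments]
    return edges

# Stand-in for structured JSON returned by an LLM extraction chain.
llm_output = ('{"name": "Dana", "skills": ["Python", "Neo4j"], '
              '"accomplishments": ["Shipped billing service"]}')
person = Person(**json.loads(llm_output))
print(to_triples(person))
```

Loading these triples into a graph database is what makes questions like "how many developers know Python?" a cheap traversal rather than an unreliable embedding lookup.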
Revenue Engineering: How to Price (and Reprice) Your AI Product — Kshitij Grover, Orb
AI Engineer· 2025-06-27 09:41
Pricing Principles for AI Products
- Pricing is a form of friction that can either enable or prevent product adoption, requiring careful consideration of value delivery and target audience [2]
- Traditional pricing principles emphasize simplicity, value signaling through willingness to pay, and margin protection [8]
- AI-native pricing prioritizes predictability for mature companies that need to budget, speed for early-stage products, and adaptation to variable costs [10][11][12]
Key Considerations for AI Pricing
- Audience understanding is crucial: consider the buying journey, value expectations, and decision-making processes [15][16]
- Packaging and pricing tiers shape user perception and incentives, influencing how users interact with the product [18][19]
- Margin structure should focus on axes of scaling and the flexibility to experiment, rather than fixed margins, because underlying costs change rapidly [13][14]
Strategies for Margin Management and Flexibility
- Differentiate through R&D innovation and pass technical advantages to users as pricing leverage [23][24]
- Implement rate limits or guardrails to prevent degenerate workloads and incentivize reasonable usage, rather than scaling costs linearly [25]
- Evolve pricing incrementally in response to R&D investments, aligning monetization with the value perceived by end users [30][31]
Future Trends in AI Agent Pricing
- Expect continued price wars and a move toward effectively unlimited plans with caps and guardrails [36][37]
- Outcome-based pricing will become more prevalent, requiring clear definitions of success and measurable SLAs [37][38]
- Real-time visibility, spend management, and balance alerts will become more sophisticated, offering users greater control over spend [38][39][40]
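The "unlimited plan with caps and guardrails" idea above can be sketched as a billing function: usage is billed linearly above an included allowance, and beyond a soft cap the workload is throttled rather than billed linearly. Every rate and threshold below is invented for illustration, not Orb's pricing model.

```python
# Sketch of usage-based pricing with an included allowance, a soft cap, and a
# throttling guardrail. All rates and thresholds are invented for illustration.
def monthly_bill(requests: int, price_per_1k: float = 2.0,
                 included: int = 10_000, soft_cap: int = 100_000) -> dict:
    billable = max(0, min(requests, soft_cap) - included)
    return {
        "charge": round(billable / 1000 * price_per_1k, 2),
        # Guardrail: degenerate workloads get throttled, not billed forever.
        "throttled": requests > soft_cap,
    }

print(monthly_bill(5_000))    # within the included allowance: nothing owed
print(monthly_bill(50_000))   # pays for 40k requests above the allowance
print(monthly_bill(250_000))  # charge capped; guardrail kicks in instead
```

The design choice worth noting is the cap: it preserves the predictability buyers want while the guardrail, rather than linear billing, absorbs pathological usage.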