AI Engineer

Search documents
AI Red Teaming Agent: Azure AI Foundry — Nagkumar Arkalgud & Keiji Kanazawa, Microsoft
AI Engineer· 2025-06-27 10:07
AI Safety and Reliability - The industry emphasizes the importance of ensuring the safety and reliability of autonomous AI agents [1] - Azure AI Evaluation SDK's Red Teaming Agent is designed to uncover vulnerabilities in AI agents proactively [1] - The tool simulates adversarial scenarios and stress-tests agentic decision-making to ensure applications are robust, ethical, and safe [1] Risk Mitigation and Trust - Adversarial testing mitigates risks and strengthens trust in AI solutions [1] - Integrating safety checks into the development lifecycle is crucial [1] Azure AI Evaluation SDK - The SDK enables red teaming for GenAI applications [1]
Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft
AI Engineer· 2025-06-27 10:05
GitHub Copilot's agentic capabilities enhance its ability to act as a peer programmer. From the IDE to the repository, Copilot can generate code, run tests, and perform tasks like creating pull requests using Model Context Protocol (MCP). This instructor-led lab will guide you through using agent capabilities on both the client and the server: Key takeaways include: Understanding how to bring agents into your software development workflow Identifying scenarios where agents can be most impactful, as well as ...
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
AI Agent Evaluation - Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1] - The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1] - Companies are leveraging the SDK to enhance agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1] Microsoft's AI Initiatives - Microsoft is promoting AI in startups and facilitating the transition of research and startup products to the market [1] - Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1] Industry Expertise - Cedric Vidal has experience as an Engineering Manager in the AI data labeling space for the self-driving industry and as CTO of a Fintech AI SAAS startup [1] - He also has 10 years of experience as a software engineering services consultant for major Fintech enterprises [1]
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking - Open-weight models are catching up to Frontier Labs in capabilities, making many AI Engineer applications possible that weren't before [1] - Open-source engines like VLM, SGLang, and Tensor TLM are readily available, reducing the need for custom model implementations [1] - Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3] Performance Analysis - Throughput is significantly higher when processing longer input contexts (prefill) compared to generating longer output sequences (decode), with up to a 4x improvement observed [15][16] - The time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" by prioritizing context over reasoning [19] - Gemma 7B models show roughly the same throughput as Qwen 3 models, despite being 10x smaller in model weights, indicating optimization differences [12] Optimization & Infrastructure - Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23] - Benchmarking methodology involves sending a thousand requests to determine maximum throughput and sending single requests to determine fastest possible server run time [24][25] - BF16 precision offers slower tensor core support compared to FP8 or FP4, suggesting potential for even greater performance gains with lower precision formats on newer hardware like Blackwell [16][17]
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs) - RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3] - RAG is considered a more reliable, faster, and cheaper approach compared to fine-tuning and long context windows for utilizing external knowledge [7] - The industry has seen significant improvements in retrieval accuracy over the past 18 months, driven by advancements in embedding models [11][12] - The industry averages approximately 80% accuracy across 100 datasets, indicating a 20% potential improvement headroom in retrieval tasks [12][13] Vector Embeddings & Storage Optimization - Techniques like matryoshka learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17] - Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21] RAG Enhancement Techniques - Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18] - Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20] Future of RAG - The industry predicts a shift towards more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30] - Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating the need for separate data extraction and embedding steps [32] - Context-aware and auto-chunking embeddings aim to automate the trunking process and incorporate cross-trunk information, optimizing retrieval and cost [33][36]
The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)
AI Engineer· 2025-06-27 09:57
[Music] Welcome everybody. Uh I want to thank you for coming to this session today. Um today I want to talk about AI powered search and retrieval.Uh and for those of you who don't know me, my name is Frank. Uh I am actually a part of the Voyage AI team and we recently joined MongoDB. I want to say probably about three to four months ago.Uh just a quick introduction to Voyage AI. You know, we build the most accurate, cost-effective embedding models and rerankers for rag and semantic search. Uh a lot of the a ...
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
AI Engineer· 2025-06-27 09:56
AI Agents and Memory - The presentation focuses on the importance of memory in AI agents, emphasizing that memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6] - The discussion highlights different forms of memory, including short-term, long-term, conversational entity memory, knowledge data store, cache, and working memory [8] - The industry is moving towards AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1, 21] MongoDB's Role in AI Memory - MongoDB is positioned as a memory provider for agentic systems, offering features needed to turn data into memory and enhance agent capabilities [20, 21, 31] - MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, geospatial query) are highlighted as key advantages for AI memory management [25] - MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32, 33] - Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34] Memory Management and Implementation - Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16, 17] - Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18] - The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21, 22, 30] - Different memory types are explored, including persona memory, toolbox memory, conversation memory, workflow memory, episodic memory, long-term memory, and entity memory [23, 25, 26, 27, 29, 30]
Why Your Agent’s Brain Needs a Playbook: Practical Wins from Using Ontologies - Jesús Barrasa, Neo4j
AI Engineer· 2025-06-27 09:53
[Music] welcome. Thank you for joining my session this morning. I was thinking that I was in some kind of niche space where people would not be interested, but I'm curious to hear, you know, what brought you to to this session.Anyway, I'm I'm Jesus Barasa. I'm the I'm the field CTO for AI with NEOJ and uh and yeah, for the next like uh 14ish minutes, I'm going to be diving a bit deeper into this uh really powerful combination which is knowledge graphs and uh and large language models to build AI application ...
GraphRAG methods to create optimized LLM context windows for Retrieval — Jonathan Larson, Microsoft
AI Engineer· 2025-06-27 09:48
[Music] Um well hello everyone um thanks for coming to the session here today. My name is Jonathan Larson and uh I run the graph uh team at Microsoft Research. Um, some of you may have seen our paper that we actually released last year uh the graphite paper or you might have seen our GitHub repo uh that was out there and uh has a lot of stars on it. Um when we released this last year to our surprise it got a lot of attention and also it inspired many other offerings that we saw that were out there as well t ...
Agentic GraphRAG: Simplifying Retrieval Across Structured & Unstructured Data — Zach Blumenfeld
AI Engineer· 2025-06-27 09:44
[Music] I'm going to go over today graph rag particularly dealing with multiple data sources. Uh so both unstructured and structured data sources and kind of why you would want to ever do that in the first place even. So I prepared uh some notebooks here.I was going to make slides, but then I thought it would just be easier to walk through some of what this looks like in practice. Um, so there's a link here, and I can share it with you at the booth later, um, if you have follow-up questions. But basically, ...