Workflow
AI Engineer
icon
Search documents
Foundry Local: Cutting-Edge AI experiences on device with ONNX Runtime/Olive — Emma Ning, Microsoft
AI Engineer· 2025-06-27 10:21
Key Benefits of Local AI - Addresses limitations of cloud AI in low-bandwidth or offline environments, exemplified by conference Wi-Fi issues [2][3] - Enhances privacy and security by processing sensitive data locally, crucial for industries handling legal documents and patient information [4] - Improves cost efficiency for applications deployed on millions of devices with high inference call volumes, such as game applications [5] - Reduces real-time latency, essential for AI applications requiring immediate responses [5] Foundry Local Overview - Microsoft introduces Foundry Local, an optimized end-to-end solution for seamless on-device AI, leveraging existing assets like Azure AI Foundry and ONNX Runtime [9] - ONNX Runtime accelerates performance across various hardware platforms, with over 10 million downloads per month [8] - Foundry Local Management Service hosts and manages models on client devices and connects to Azure AI Foundry to download open-source models on demand [10] - Foundry Local CLI and SDK enable developers to explore models and integrate Foundry Local into applications [11] - Foundry Local is available on Windows and macOS, integrated into the Windows platform for simpler AI development [12] Performance and Customer Feedback - Foundry Local accelerates performance across different silicon vendors, including NVIDIA, Intel, AMD, and Qualcomm [12] - Early adopters report ease of use and performance improvements, highlighting benefits like enhanced memory management and faster token generation [13][15][16] - Foundry Local enables hybrid solutions, allowing parts of applications to run locally, addressing data sensitivity concerns [17][18]
[Full Workshop] Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments
AI Engineer· 2025-06-27 10:19
AI Assistant Customization in Enterprise - Systematic approaches are crucial for customizing AI assistants in complex enterprise codebases [1] - Specialized techniques are needed to navigate complex architectures [1] - Evidence-based strategies are essential for dealing with undocumented legacy systems [1] - Methodologies for maintaining context across polyglot environments are important [1] - Frameworks for standardizing AI usage while preserving developer autonomy are necessary [1] Industry Applications - Case studies from finance and healthcare demonstrate practical AI implementation [1] Evaluation Framework - A comprehensive evaluation framework bridges the gap between AI's theoretical capabilities and practical enterprise implementation [1] - The goal is to enable true flow-state collaboration in complex development ecosystems [1]
Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments - Harald Kirshner,
AI Engineer· 2025-06-27 10:15
Vibe Coding Concepts - Introduces "Vibe Coding" as a fast, creative, and iterative approach to coding, particularly useful for rapid prototyping and learning [3][4][9] - Defines three types of vibe coding: YOLO (fast, instant gratification), Structured (maintainable, balanced), and Spec-Driven (scalable, reliable) [4][6][7] - YOLO vibe coding is suitable for rapid prototyping, proof of concept, and personal projects, not for production [4][8][9] - Structured vibe coding adds guard rails for maintainability and is suitable for enterprise-level projects [5][6] - Spec-driven vibe coding scales vibe coding to large codebases with reliability [7] VS Code Features for Vibe Coding - Highlights the use of VS Code Insiders for accessing the latest features, released twice daily [1][2] - Emphasizes the use of agent mode in VS Code, along with auto-approve settings, to streamline the coding process [9][10][11] - Introduces a new workspace flow in VS Code for easier vibe coding [13][16] - Mentions the built-in voice dictation feature in VS Code for interacting with AI [11][16] - Suggests using auto-save and undo/revert options in VS Code for live updates and error correction [17][18] AI and Iteration - Encourages embracing AI to build intuition and baseline its capabilities [21] - Recommends using frameworks like React and Vite for grounding and iteration [21] - Highlights the importance of iteration, starting from scratch, and working on specific items [22] - Stresses the importance of review, committing code often, and pausing the agent to inspect [32][33] Structured Vibe Coding Details - Templates with consistent tech stacks and instructions can guide the copilot flow [23] - Custom tools and MCPs (presumably, more context providers) can provide more reliable and consistent results than YOLO mode [23][31] - Workspace instructions, prompts, and MCPs can be made dynamic for specific parts of the codebase [30] - VS Code's access to problems and tasks allows it to fix code as mistakes are made [32]
AI Red Teaming Agent: Azure AI Foundry — Nagkumar Arkalgud & Keiji Kanazawa, Microsoft
AI Engineer· 2025-06-27 10:07
AI Safety and Reliability - The industry emphasizes the importance of ensuring the safety and reliability of autonomous AI agents [1] - Azure AI Evaluation SDK's Red Teaming Agent is designed to uncover vulnerabilities in AI agents proactively [1] - The tool simulates adversarial scenarios and stress-tests agentic decision-making to ensure applications are robust, ethical, and safe [1] Risk Mitigation and Trust - Adversarial testing mitigates risks and strengthens trust in AI solutions [1] - Integrating safety checks into the development lifecycle is crucial [1] Azure AI Evaluation SDK - The SDK enables red teaming for GenAI applications [1]
Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft
AI Engineer· 2025-06-27 10:05
GitHub Copilot's agentic capabilities enhance its ability to act as a peer programmer. From the IDE to the repository, Copilot can generate code, run tests, and perform tasks like creating pull requests using Model Context Protocol (MCP). This instructor-led lab will guide you through using agent capabilities on both the client and the server: Key takeaways include: Understanding how to bring agents into your software development workflow Identifying scenarios where agents can be most impactful, as well as ...
Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft
AI Engineer· 2025-06-27 10:04
AI Agent Evaluation - Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1] - The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1] - Companies are leveraging the SDK to enhance agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1] Microsoft's AI Initiatives - Microsoft is promoting AI in startups and facilitating the transition of research and startup products to the market [1] - Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1] Industry Expertise - Cedric Vidal has experience as an Engineering Manager in the AI data labeling space for the self-driving industry and as CTO of a Fintech AI SAAS startup [1] - He also has 10 years of experience as a software engineering services consultant for major Fintech enterprises [1]
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking - Open-weight models are catching up to Frontier Labs in capabilities, making many AI Engineer applications possible that weren't before [1] - Open-source engines like VLM, SGLang, and Tensor TLM are readily available, reducing the need for custom model implementations [1] - Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3] Performance Analysis - Throughput is significantly higher when processing longer input contexts (prefill) compared to generating longer output sequences (decode), with up to a 4x improvement observed [15][16] - The time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" by prioritizing context over reasoning [19] - Gemma 7B models show roughly the same throughput as Qwen 3 models, despite being 10x smaller in model weights, indicating optimization differences [12] Optimization & Infrastructure - Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23] - Benchmarking methodology involves sending a thousand requests to determine maximum throughput and sending single requests to determine fastest possible server run time [24][25] - BF16 precision offers slower tensor core support compared to FP8 or FP4, suggesting potential for even greater performance gains with lower precision formats on newer hardware like Blackwell [16][17]
RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)
AI Engineer· 2025-06-27 09:59
Retrieval Augmented Generation (RAG) & Large Language Models (LLMs) - RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3] - RAG is considered a more reliable, faster, and cheaper approach compared to fine-tuning and long context windows for utilizing external knowledge [7] - The industry has seen significant improvements in retrieval accuracy over the past 18 months, driven by advancements in embedding models [11][12] - The industry averages approximately 80% accuracy across 100 datasets, indicating a 20% potential improvement headroom in retrieval tasks [12][13] Vector Embeddings & Storage Optimization - Techniques like matryoshka learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17] - Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21] RAG Enhancement Techniques - Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18] - Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20] Future of RAG - The industry predicts a shift towards more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30] - Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating the need for separate data extraction and embedding steps [32] - Context-aware and auto-chunking embeddings aim to automate the trunking process and incorporate cross-trunk information, optimizing retrieval and cost [33][36]
The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)
AI Engineer· 2025-06-27 09:57
Voyage AI & MongoDB Partnership - Voyage AI was acquired by MongoDB approximately 3-4 months ago [1] - The partnership aims to create a single data platform for embedding, re-ranking, query augmentation, and query decomposition [29][30][31] AI-Powered Search & Retrieval - AI-powered search finds related concepts beyond identical wording and understands user intent [7][8][9] - Embedding quality is a core component, with 95-99% of systems using embeddings [12] - Real-world applications include chatting with codebases, where evaluation is crucial to determine the best embedding model and LLM for the specific application [14][15] - Structured data, beyond embeddings, is often necessary for building powerful search and retrieval systems, such as filtering by state or document type in legal documents [16][17][18] - Agentic retrieval involves feedback loops where the AI search system is no longer just input-output, but can expand or decompose queries [19][20] Future Trends - The future of AI-powered search is multimodal, involving understanding images, text, and audio together [23][24][25] - Instruction tuning will allow steering vectors based on instructions, enabling more specific document retrieval [27][28]
Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB
AI Engineer· 2025-06-27 09:56
AI Agents and Memory - The presentation focuses on the importance of memory in AI agents, emphasizing that memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6] - The discussion highlights different forms of memory, including short-term, long-term, conversational entity memory, knowledge data store, cache, and working memory [8] - The industry is moving towards AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1, 21] MongoDB's Role in AI Memory - MongoDB is positioned as a memory provider for agentic systems, offering features needed to turn data into memory and enhance agent capabilities [20, 21, 31] - MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, geospatial query) are highlighted as key advantages for AI memory management [25] - MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32, 33] - Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34] Memory Management and Implementation - Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16, 17] - Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18] - The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21, 22, 30] - Different memory types are explored, including persona memory, toolbox memory, conversation memory, workflow memory, episodic memory, long-term memory, and entity memory [23, 25, 26, 27, 29, 30]