AI Engineer - filings, earnings calls, financial reports, news

AI Engineer

Search documents

Foundry Local: Cutting-Edge AI experiences on device with ONNX Runtime/Olive — Emma Ning, Microsoft

AI Engineer· 2025-06-27 10:21

Key Benefits of Local AI - Addresses limitations of cloud AI in low-bandwidth or offline environments, exemplified by conference Wi-Fi issues [2][3] - Enhances privacy and security by processing sensitive data locally, crucial for industries handling legal documents and patient information [4] - Improves cost efficiency for applications deployed on millions of devices with high inference call volumes, such as game applications [5] - Reduces real-time latency, essential for AI applications requiring immediate responses [5] Foundry Local Overview - Microsoft introduces Foundry Local, an optimized end-to-end solution for seamless on-device AI, leveraging existing assets like Azure AI Foundry and ONNX Runtime [9] - ONNX Runtime accelerates performance across various hardware platforms, with over 10 million downloads per month [8] - Foundry Local Management Service hosts and manages models on client devices and connects to Azure AI Foundry to download open-source models on demand [10] - Foundry Local CLI and SDK enable developers to explore models and integrate Foundry Local into applications [11] - Foundry Local is available on Windows and macOS, integrated into the Windows platform for simpler AI development [12] Performance and Customer Feedback - Foundry Local accelerates performance across different silicon vendors, including NVIDIA, Intel, AMD, and Qualcomm [12] - Early adopters report ease of use and performance improvements, highlighting benefits like enhanced memory management and faster token generation [13][15][16] - Foundry Local enables hybrid solutions, allowing parts of applications to run locally, addressing data sensitivity concerns [17][18]

Microsoft(US:MSFT)

Local AI

Cross-platform applications

Cross-platform applications

Foundry Local

ONNX Runtime

Olive

[Full Workshop] Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments

AI Engineer· 2025-06-27 10:19

AI Assistant Customization in Enterprise - Systematic approaches are crucial for customizing AI assistants in complex enterprise codebases [1] - Specialized techniques are needed to navigate complex architectures [1] - Evidence-based strategies are essential for dealing with undocumented legacy systems [1] - Methodologies for maintaining context across polyglot environments are important [1] - Frameworks for standardizing AI usage while preserving developer autonomy are necessary [1] Industry Applications - Case studies from finance and healthcare demonstrate practical AI implementation [1] Evaluation Framework - A comprehensive evaluation framework bridges the gap between AI's theoretical capabilities and practical enterprise implementation [1] - The goal is to enable true flow-state collaboration in complex development ecosystems [1]

Artificial Intelligence

AI Assistants

Enterprise Environments

Artificial Intelligence

AI Assistants

Enterprise Environments

Vibe Coding at Scale: Customizing AI Assistants for Enterprise Environments - Harald Kirshner,

AI Engineer· 2025-06-27 10:15

Vibe Coding Concepts - Introduces "Vibe Coding" as a fast, creative, and iterative approach to coding, particularly useful for rapid prototyping and learning [3][4][9] - Defines three types of vibe coding: YOLO (fast, instant gratification), Structured (maintainable, balanced), and Spec-Driven (scalable, reliable) [4][6][7] - YOLO vibe coding is suitable for rapid prototyping, proof of concept, and personal projects, not for production [4][8][9] - Structured vibe coding adds guard rails for maintainability and is suitable for enterprise-level projects [5][6] - Spec-driven vibe coding scales vibe coding to large codebases with reliability [7] VS Code Features for Vibe Coding - Highlights the use of VS Code Insiders for accessing the latest features, released twice daily [1][2] - Emphasizes the use of agent mode in VS Code, along with auto-approve settings, to streamline the coding process [9][10][11] - Introduces a new workspace flow in VS Code for easier vibe coding [13][16] - Mentions the built-in voice dictation feature in VS Code for interacting with AI [11][16] - Suggests using auto-save and undo/revert options in VS Code for live updates and error correction [17][18] AI and Iteration - Encourages embracing AI to build intuition and baseline its capabilities [21] - Recommends using frameworks like React and Vite for grounding and iteration [21] - Highlights the importance of iteration, starting from scratch, and working on specific items [22] - Stresses the importance of review, committing code often, and pausing the agent to inspect [32][33] Structured Vibe Coding Details - Templates with consistent tech stacks and instructions can guide the copilot flow [23] - Custom tools and MCPs (presumably, more context providers) can provide more reliable and consistent results than YOLO mode [23][31] - Workspace instructions, prompts, and MCPs can be made dynamic for specific parts of the codebase [30] - VS Code's access to problems and tasks allows it to fix code as mistakes are made [32]

AI Red Teaming Agent: Azure AI Foundry — Nagkumar Arkalgud & Keiji Kanazawa, Microsoft

AI Engineer· 2025-06-27 10:07

AI Safety and Reliability - The industry emphasizes the importance of ensuring the safety and reliability of autonomous AI agents [1] - Azure AI Evaluation SDK's Red Teaming Agent is designed to uncover vulnerabilities in AI agents proactively [1] - The tool simulates adversarial scenarios and stress-tests agentic decision-making to ensure applications are robust, ethical, and safe [1] Risk Mitigation and Trust - Adversarial testing mitigates risks and strengthens trust in AI solutions [1] - Integrating safety checks into the development lifecycle is crucial [1] Azure AI Evaluation SDK - The SDK enables red teaming for GenAI applications [1]

Microsoft(US:MSFT)

AI Red Teaming

Autonomous AI agents

Adversarial scenarios

Azure AI Evaluation SDK

AI Red Teaming

Autonomous AI agents

Adversarial scenarios

Azure AI Evaluation SDK

Collaborating with Agents in your Software Dev Workflow - Jon Peck & Christopher Harrison, Microsoft

AI Engineer· 2025-06-27 10:05

GitHub Copilot's agentic capabilities enhance its ability to act as a peer programmer. From the IDE to the repository, Copilot can generate code, run tests, and perform tasks like creating pull requests using Model Context Protocol (MCP). This instructor-led lab will guide you through using agent capabilities on both the client and the server: Key takeaways include: Understanding how to bring agents into your software development workflow Identifying scenarios where agents can be most impactful, as well as ...

Artificial Intelligence

GitHub Copilot

Artificial Intelligence

GitHub Copilot

Agentic Excellence: Mastering AI Agent Evals w/ Azure AI Evaluation SDK — Cedric Vidal, Microsoft

AI Engineer· 2025-06-27 10:04

AI Agent Evaluation - Azure AI Evaluation SDK is designed to rigorously assess agentic applications, focusing on capabilities, contextual understanding, and accuracy [1] - The SDK enables the creation of evaluations using structured test plans, scenarios, and advanced analytics to identify strengths and weaknesses of AI agents [1] - Companies are leveraging the SDK to enhance agent trustworthiness, reliability, and performance in conversational agents, data-driven decision-makers, and autonomous workflow orchestrators [1] Microsoft's AI Initiatives - Microsoft is promoting AI in startups and facilitating the transition of research and startup products to the market [1] - Cedric Vidal, Principal AI Advocate at Microsoft, specializes in Generative AI and the startup and research ecosystems [1] Industry Expertise - Cedric Vidal has experience as an Engineering Manager in the AI data labeling space for the self-driving industry and as CTO of a Fintech AI SAAS startup [1] - He also has 10 years of experience as a software engineering services consultant for major Fintech enterprises [1]

AI agents

AI Agent Evals

Generative AI

Azure AI Evaluation SDK

AI agents

AI Agent Evals

Generative AI

Azure AI Evaluation SDK

How fast are LLM inference engines anyway? — Charles Frye, Modal

AI Engineer· 2025-06-27 10:01

Open Model Landscape & Benchmarking - Open-weight models are catching up to Frontier Labs in capabilities, making many AI Engineer applications possible that weren't before [1] - Open-source engines like VLM, SGLang, and Tensor TLM are readily available, reducing the need for custom model implementations [1] - Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3] Performance Analysis - Throughput is significantly higher when processing longer input contexts (prefill) compared to generating longer output sequences (decode), with up to a 4x improvement observed [15][16] - The time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" by prioritizing context over reasoning [19] - Gemma 7B models show roughly the same throughput as Qwen 3 models, despite being 10x smaller in model weights, indicating optimization differences [12] Optimization & Infrastructure - Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23] - Benchmarking methodology involves sending a thousand requests to determine maximum throughput and sending single requests to determine fastest possible server run time [24][25] - BF16 precision offers slower tensor core support compared to FP8 or FP4, suggesting potential for even greater performance gains with lower precision formats on newer hardware like Blackwell [16][17]

LLM inference engines

LLM inference engines

RAG in 2025: State of the Art and the Road Forward — Tengyu Ma, MongoDB (Voyage AI)

AI Engineer· 2025-06-27 09:59

Retrieval Augmented Generation (RAG) & Large Language Models (LLMs) - RAG is essential for enterprises to incorporate proprietary information into LLMs, addressing the limitations of out-of-the-box models [2][3] - RAG is considered a more reliable, faster, and cheaper approach compared to fine-tuning and long context windows for utilizing external knowledge [7] - The industry has seen significant improvements in retrieval accuracy over the past 18 months, driven by advancements in embedding models [11][12] - The industry averages approximately 80% accuracy across 100 datasets, indicating a 20% potential improvement headroom in retrieval tasks [12][13] Vector Embeddings & Storage Optimization - Techniques like matryoshka learning and quantization can reduce vector storage costs by up to 100x with minimal performance loss (5-10%) [15][16][17] - Domain-specific embeddings, such as those customized for code, offer better trade-offs between storage cost and accuracy [21] RAG Enhancement Techniques - Hybrid search, combining lexical and vector search with re-rankers, improves retrieval performance [18] - Query decomposition and document enrichment, including adding metadata and context, enhance retrieval accuracy [18][19][20] Future of RAG - The industry predicts a shift towards more sophisticated models that minimize the need for manual "tricks" to improve RAG performance [29][30] - Multimodal embeddings, which can process screenshots, PDFs, and videos, simplify workflows by eliminating the need for separate data extraction and embedding steps [32] - Context-aware and auto-chunking embeddings aim to automate the trunking process and incorporate cross-trunk information, optimizing retrieval and cost [33][36]

RAG (Retrieval-Augmented Generation)

Multimodal Embedding

Context-aware Auto Chunking

Large Language Models (LLMs)

Embedding Models

RAG (Retrieval-Augmented Generation)

Multimodal Embedding

Context-aware Auto Chunking

Large Language Models (LLMs)

Embedding Models

The State of AI Powered Search and Retrieval — Frank Liu, MongoDB (prev Voyage AI)

AI Engineer· 2025-06-27 09:57

Voyage AI & MongoDB Partnership - Voyage AI was acquired by MongoDB approximately 3-4 months ago [1] - The partnership aims to create a single data platform for embedding, re-ranking, query augmentation, and query decomposition [29][30][31] AI-Powered Search & Retrieval - AI-powered search finds related concepts beyond identical wording and understands user intent [7][8][9] - Embedding quality is a core component, with 95-99% of systems using embeddings [12] - Real-world applications include chatting with codebases, where evaluation is crucial to determine the best embedding model and LLM for the specific application [14][15] - Structured data, beyond embeddings, is often necessary for building powerful search and retrieval systems, such as filtering by state or document type in legal documents [16][17][18] - Agentic retrieval involves feedback loops where the AI search system is no longer just input-output, but can expand or decompose queries [19][20] Future Trends - The future of AI-powered search is multimodal, involving understanding images, text, and audio together [23][24][25] - Instruction tuning will allow steering vectors based on instructions, enabling more specific document retrieval [27][28]

Architecting Agent Memory: Principles, Patterns, and Best Practices — Richmond Alake, MongoDB

AI Engineer· 2025-06-27 09:56

AI Agents and Memory - The presentation focuses on the importance of memory in AI agents, emphasizing that memory is crucial for making agents reflective, interactive, proactive, reactive, and autonomous [6] - The discussion highlights different forms of memory, including short-term, long-term, conversational entity memory, knowledge data store, cache, and working memory [8] - The industry is moving towards AI agents and agentic systems, with a focus on building believable, capable, and reliable agents [1, 21] MongoDB's Role in AI Memory - MongoDB is positioned as a memory provider for agentic systems, offering features needed to turn data into memory and enhance agent capabilities [20, 21, 31] - MongoDB's flexible document data model and retrieval capabilities (graph, vector, text, geospatial query) are highlighted as key advantages for AI memory management [25] - MongoDB acquired Voyage AI to improve AI systems by reducing hallucination through better embedding models and re-rankers [32, 33] - Voyage AI's embedding models and re-rankers will be integrated into MongoDB Atlas to simplify data chunking and retrieval strategies [34] Memory Management and Implementation - Memory management involves generation, storage, retrieval, integration, updating, and forgetting mechanisms [16, 17] - Retrieval Augmented Generation (RAG) is discussed, with MongoDB providing retrieval mechanisms beyond just vector search [18] - The presentation introduces "Memoriz," an open-source library with design patterns for various memory types in AI agents [21, 22, 30] - Different memory types are explored, including persona memory, toolbox memory, conversation memory, workflow memory, episodic memory, long-term memory, and entity memory [23, 25, 26, 27, 29, 30]