"Data readiness" is a Myth: Reliable AI with an Agentic Semantic Layer — Anushrut Gupta, PromptQL
AI Engineer· 2025-06-27 09:40
Problem Statement
- Data readiness is a myth, and achieving perfect data for AI is an unattainable pipe dream [1][2][3]
- Fortune 500 companies lose an average of $250 million due to poor data quality [7]
- Traditional semantic layers and knowledge graphs are insufficient for capturing the nuances of business language and tribal knowledge [8][9][10][11][12][13][14]

Solution: Agentic Semantic Layer (PromptQL)
- PromptQL is presented as a "day zero smart analyst" AI system that learns and improves over time through course correction and steering [17][18][19][20]
- It uses a domain-specific language (DSL) for data retrieval, computation, aggregation, and semantics, decoupling LLM plan generation from execution [21][22]
- The system allows editing the AI's "brain" to correct its understanding and guide its learning [28]
- It incorporates a prompt learning layer to improve the semantic graph and create a company-specific business language [31]
- The semantic layer is version controlled, allowing fallback to previous builds [33]

Key Features and Benefits
- Correctable, explainable, and steerable AI that improves with use [19]
- Ability to handle messy data and understand business context [24][25]
- Reduces months of setup work to an immediate start, enabling faster AI deployments [37]
- Self-improving; claimed to achieve 100% accuracy on complex tasks [37]

Demonstrated Capabilities
- The system can understand what "revenue" means and perform calculations on it [23]
- It can identify and correct errors in data, such as incorrect status values [24]
- It can integrate data from multiple databases and SaaS applications [25][27]
- It can summarize support tickets and extract sentiment [26][29]
- It can learn the meaning of custom terms and relationships between tables [35][36]

Customer Validation
- A Fortune 500 food chain company and a high-growth fintech company achieved 100% accurate AI using PromptQL [38]
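The decoupling described above — the LLM emits a declarative plan, and a separate deterministic engine executes it — can be sketched in miniature. This is an illustrative toy, not the PromptQL DSL: the operation names, the stub planner, and the in-memory tables are all assumptions.

```python
# Sketch: separate plan *generation* (LLM output as data) from plan
# *execution* (a deterministic interpreter that is the only code touching data).
# All names here are hypothetical, not PromptQL's actual DSL.
from dataclasses import dataclass


@dataclass
class Step:
    op: str     # "retrieve" | "filter" | "aggregate" (invented for this sketch)
    args: dict


def fake_llm_plan(question: str) -> list:
    # Stand-in for the LLM: it emits a plan, never computes answers itself.
    return [
        Step("retrieve", {"table": "orders"}),
        Step("filter", {"column": "status", "equals": "paid"}),
        Step("aggregate", {"column": "amount", "fn": "sum"}),
    ]


def execute(plan, tables):
    # Deterministic interpreter: auditable, repeatable, no LLM involved.
    rows = []
    for step in plan:
        if step.op == "retrieve":
            rows = list(tables[step.args["table"]])
        elif step.op == "filter":
            rows = [r for r in rows if r[step.args["column"]] == step.args["equals"]]
        elif step.op == "aggregate":
            return sum(r[step.args["column"]] for r in rows)
    return 0.0


tables = {"orders": [
    {"status": "paid", "amount": 100.0},
    {"status": "refunded", "amount": 40.0},
    {"status": "paid", "amount": 60.0},
]}
revenue = execute(fake_llm_plan("What is revenue?"), tables)
```

Because the plan is plain data, it can be inspected, version controlled, and corrected before execution — which is what makes the "editing the AI's brain" workflow possible.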
Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza
AI Engineer· 2025-06-27 09:38
Heroku Managed Inference and Agents Platform Overview
- The Heroku Managed Inference and Agents platform enables developers to build agentic applications that can reason, make decisions, and trigger actions [1]
- The platform allows for provisioning and deploying LLMs, running untrusted code securely in multiple languages, and extending agents with the Model Context Protocol (MCP) [1]

Key Capabilities
- Heroku Managed Inference and Agents facilitates the deployment and management of LLMs [1]
- The platform supports secure execution of untrusted code in Python, Node.js, Go, and Ruby [1]
- The Model Context Protocol (MCP) can be used to extend agent capabilities [1]

Target Applications
- The platform is suitable for building internal tools, developer assistants, or customer-facing AI features [1]
Events are the Wrong Abstraction for Your AI Agents - Mason Egger, Temporal.io
AI Engineer· 2025-06-27 09:35
Core Argument
- The presentation argues that event-driven architecture (EDA), while seemingly loosely coupled at runtime, is tightly coupled at design time, leading to complexities and challenges in AI agent development [21][22]
- It proposes a shift in focus from events to durable execution as the core of AI agent architecture, which simplifies development and handles failures more effectively [26][27]

Problems with Event-Driven Architecture
- EDA sacrifices clear APIs, as events lack the documentation and structure of traditional APIs [15]
- Business logic becomes fragmented and scattered across multiple services, making debugging and understanding the system more difficult [16]
- Services become ad hoc state machines, leading to potential race conditions and difficult-to-debug issues [18][19]
- EDA can lead to reluctance to iterate on architecture due to fear of breaking existing functionality [25]

Durable Execution as a Solution
- Durable execution is presented as a crash-proof execution environment that automatically preserves application state, virtualizes execution, and is not limited by time or hardware [27][28][29][30][31][32][33][34]
- It allows developers to focus on business logic rather than managing events and queues [38]
- Temporal provides durable execution as an open-source, MIT-licensed product with SDKs for multiple programming languages [38][39]
- Durable execution abstracts away the complexities of events into the software layer [40][43]

Temporal's Offering
- Temporal's durable execution system offers automatic retries for failures, such as LLM downtime or rate limits [36]
- It supports polyglot programming, allowing functions written in different languages to be called seamlessly [39]
- Temporal is available for demonstration and further discussion at the company's booth and Slack channel [44][45]
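The core mechanism of durable execution — completed step results are recorded in a history, and a retried workflow replays recorded results instead of re-running side effects — can be illustrated without any framework. This toy mimics Temporal's model in spirit only; the real SDK APIs differ, and all names here are invented.

```python
# Sketch of durable execution: a "history" records each completed step,
# so re-running the workflow after a crash replays results instead of
# repeating side effects (e.g., charging a card twice).
history = {}                 # stand-in for a durable event history
calls = {"charge": 0}        # counts real side-effect executions


def durable_step(name, fn):
    if name in history:      # replay path: return the recorded result
        return history[name]
    result = fn()            # first execution: run the step and record it
    history[name] = result
    return result


def charge_card():
    calls["charge"] += 1
    return "receipt-001"     # hypothetical receipt id


def workflow():
    return durable_step("charge", charge_card)


first = workflow()
# Simulate a crash + automatic retry: the workflow function runs again,
# but the already-completed step is replayed from history.
second = workflow()
```

The side effect ran exactly once even though the workflow executed twice — this is the property that lets developers write plain business logic and leave failure handling to the runtime.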
Prompt Engineering is Dead — Nir Gazit, Traceloop
AI Engineer· 2025-06-27 09:34
Core Argument
- The presentation challenges the notion of "prompt engineering" as a true engineering discipline, suggesting that iterative prompt improvement can be automated [1][2]
- The speaker advocates an alternative approach to prompt optimization based on evaluators and automated agents [23]

Methodology & Implementation
- The company developed a chatbot for its website documentation using a Retrieval-Augmented Generation (RAG) pipeline [2]
- The RAG pipeline consists of a Chroma database, OpenAI, and prompts to answer questions about the documentation [7]
- An evaluator was built to assess the RAG pipeline's responses, using a dataset of questions and expected answers [5][7]
- The evaluator uses a ground-truth-based LLM as a judge, checking whether the generated answers contain specific facts [10][13]
- An agent was created to automatically improve prompts by researching online guides, running evaluations, and regenerating prompts based on failure reasons [5][18][19]
- The agent uses Crew AI to think, call the evaluator, and regenerate prompts based on best practices [20]

Results & Future Considerations
- The initial prompt scored 0.4 (40%); after two iterations with the agent, the score improved to 0.9 (90%) [21][22]
- The company acknowledges the risk of overfitting to the training data (20 examples) and suggests splitting the data into train/test sets for better generalization [24][25]
- Future work may involve applying the same automated optimization techniques to the evaluator and agent prompts [27]
- The demo is available in the traceloop/autoprompting demo repository [27]
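The evaluate-then-regenerate loop described above can be sketched with stubs. In the talk, the evaluator is an LLM judge and the regeneration step is a Crew AI agent; here both are stand-in functions, and the dataset, model behavior, and improvement rule are all invented for illustration.

```python
# Toy version of automated prompt optimization: score a prompt against a
# dataset, then let an "agent" (a stub here) rewrite it and re-score.
dataset = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]


def answer(prompt, question):
    # Stub model: it only answers correctly when the prompt demands brevity.
    known = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    return known[question] if "concise" in prompt else "I am not sure."


def evaluate(prompt):
    # Stand-in for the LLM-as-judge evaluator: exact-match scoring.
    hits = sum(answer(prompt, q) == a for q, a in dataset)
    return hits / len(dataset)


def improve(prompt):
    # Stand-in for the agent that rewrites the prompt from failure reasons.
    return prompt + " Be concise and answer directly."


prompt = "You are a helpful assistant."
score = evaluate(prompt)
for _ in range(2):            # two iterations, mirroring the talk
    if score < 1.0:
        prompt = improve(prompt)
        score = evaluate(prompt)
```

The overfitting caveat in the summary applies directly here: with a tiny dataset, the loop can "improve" the prompt into something that only works on those examples, hence the suggested train/test split.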
The Eyes Are The (Context) Window to The Soul: How Windsurf Gets to Know You — Sam Fertig, Windsurf
AI Engineer· 2025-06-27 09:34
Core Problem in AI Coding Space
- Generating code is not difficult, but generating code that fits into existing codebases, adheres to organizational policies and personal preferences, and is future-proof is challenging [13][14][15]
- The magic of AI coding tools like Windsurf lies in context, specifically "what context" and "how much" [16]

Windsurf's Context Philosophy
- "What context" is divided into two buckets: heuristics (user behavior) and hard evidence (environment/codebase) [17][18][19]
- Relevant output is determined by the prompt, the state of the codebase, and the user state [20]
- Windsurf prioritizes optimizing the relevance of context over simply increasing the size of the context window, in order to address latency [21][22]

Windsurf's Capabilities
- Windsurf excels at finding relevant context quickly due to its background in GPU optimization [23]
- Windsurf provides connectors for users to perform context retrieval at their level, including embedding search, memories, rules, and custom workspaces [24]

Data Privacy
- Windsurf processes information only within the user's editor and does not access the rest of the user's machine [31]
- Windsurf's servers are stateless, and the company does not store or train on user data [31][32]
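The "optimize relevance, not window size" idea can be sketched as ranking candidate context snippets against the prompt and keeping only the top-k. Real systems like Windsurf use learned embeddings; plain token-overlap cosine similarity stands in for them here, and the snippets are invented.

```python
# Sketch: pick the most *relevant* context rather than stuffing the window.
# Token-overlap cosine similarity is a stand-in for embedding search.
from collections import Counter
import math


def vec(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_context(prompt, snippets, k=2):
    # Rank every candidate snippet by similarity to the prompt, keep top-k.
    ranked = sorted(snippets, key=lambda s: cosine(vec(prompt), vec(s)), reverse=True)
    return ranked[:k]


snippets = [
    "def parse_config(path): load YAML settings",
    "CSS grid layout for the landing page",
    "config schema: settings are loaded from YAML files",
]
chosen = top_k_context("where are YAML config settings loaded", snippets)
```

The payoff is twofold: the model sees only material related to the question, and the prompt stays small, which is exactly the latency argument in the talk.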
What does Enterprise Ready MCP mean? — Tobin South, WorkOS
AI Engineer· 2025-06-27 09:31
MCP and AI Agent Development
- MCP is presented as a way of interfacing between AI and external resources, enabling functionalities like database access and complex computations [3]
- The industry is currently focused on building internal demos and connecting them to APIs, but needs to move towards robust authentication and authorization [9][10]
- The industry needs to adapt existing tooling for MCP due to its dynamic client registration, which can flood developer dashboards [12]

Enterprise Readiness and Security
- Scaling MCP servers requires addressing free-credit abuse, bot blocking, and robust access controls [12]
- Selling MCP solutions to enterprises necessitates SSO, lifecycle management, provisioning, fine-grained access controls, audit logs, and data loss prevention [12]
- Regulations like GDPR impose specific logging requirements for AI workloads, which are not widely supported [12]

Challenges and Future Development
- Passing scope and access control between different AI workloads remains a significant challenge [13]
- The MCP spec is actively developing, with features like elicitation (the AI asking humans for input) still unstable [13]
- Cloud vendors are solving cloud hosting, but authorization and access control are the hardest parts of enterprise deployment [13]
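The fine-grained access control and audit logging requirements above can be sketched as scope-checked tool dispatch: each tool declares a required scope, and every call is audit-logged and refused unless the caller's token carries that scope. This is a generic illustration, not part of the MCP spec; the tool names, scope strings, and log shape are all assumptions.

```python
# Hypothetical sketch of scope-checked dispatch for an MCP-style server:
# every tool requires a scope, every call is recorded for audit.
audit_log = []  # (tool, required_scope, allowed) tuples

TOOL_SCOPES = {          # invented tools and scope names
    "query_db": "db:read",
    "delete_row": "db:write",
}


def call_tool(tool, token_scopes):
    required = TOOL_SCOPES[tool]
    allowed = required in token_scopes
    audit_log.append((tool, required, allowed))   # compliance-style audit trail
    if not allowed:
        return "denied"
    return f"ran {tool}"


ok = call_tool("query_db", {"db:read"})
denied = call_tool("delete_row", {"db:read"})     # missing db:write
```

Note that denied attempts are logged too — for the GDPR-style requirements mentioned in the talk, the refusals are often as important to record as the successes.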
CI in the Era of AI: From Unit Tests to Stochastic Evals — Nathan Sobo, Zed
AI Engineer· 2025-06-27 09:22
Software engineers have long understood that high-quality code requires comprehensive automated testing. For decades, our industry has relied on deterministic tests with clear pass/fail outcomes to ensure reliability. That's certainly true at Zed, where we're building a next-generation native IDE in Rust. Zed runs at 120 frames per second, but it would also crash once a second if we didn't maintain and run a comprehensive suite of unit tests on every chang ...
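The shift from deterministic tests to stochastic evals suggested by the title can be sketched simply: instead of one run with a binary pass/fail, run the check many times and pass only if the success rate clears a threshold. The flaky "agent" here is a seeded random stub; the trial count and threshold are arbitrary choices for illustration.

```python
# Sketch of a stochastic eval: many trials, pass if the success *rate*
# clears a threshold, rather than a single deterministic pass/fail.
import random


def run_agent(rng):
    # Stand-in for one nondeterministic agent run (~90% success).
    return rng.random() < 0.9


def stochastic_eval(trials=200, threshold=0.8):
    rng = random.Random(42)          # seeded so CI results are reproducible
    passes = sum(run_agent(rng) for _ in range(trials))
    rate = passes / trials
    return rate, rate >= threshold


rate, passed = stochastic_eval()
```

Seeding is a pragmatic CI compromise: it keeps the suite reproducible run-to-run while still exercising the distribution of agent behaviors rather than a single trajectory.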
Building AI Agents that actually automate Knowledge Work - Jerry Liu, LlamaIndex
AI Engineer· 2025-06-24 00:16
Agents are all the rage in 2025, and every B2B SaaS startup and incumbent promises AI agents that can "automate work" in some way. But how do you actually build this? The answer is twofold: 1. really, really good tools, and 2. carefully tailored agent reasoning over those tools, spanning UXs that range from assistant to full automation. The main goal of this talk is to give a practical overview of agent architectures that can automate real-world work, with a focus on document-centric tasks. Learn the core building blocks o ...
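The two ingredients named above — good tools plus agent reasoning over them — can be sketched as a tool registry and a loop that picks the next tool until the task is done. The "reasoning" step here is a hard-coded stub standing in for an LLM planner, and the document, tool names, and extracted value are all invented.

```python
# Minimal sketch of a document-centric agent: a registry of tools and a
# loop that reasons (stubbed here) about which tool to call next.
TOOLS = {
    "parse_document": lambda doc: {"title": doc.split("\n")[0], "body": doc},
    "extract_total": lambda parsed: 42.0 if "Invoice" in parsed["title"] else None,
}


def plan_next(task, state):
    # Stub planner: in practice an LLM chooses the next tool from the state.
    if "parsed" not in state:
        return "parse_document"
    if "total" not in state:
        return "extract_total"
    return None                      # task complete


def run_agent(task, doc):
    state = {}
    while (tool := plan_next(task, state)) is not None:
        if tool == "parse_document":
            state["parsed"] = TOOLS[tool](doc)
        elif tool == "extract_total":
            state["total"] = TOOLS[tool](state["parsed"])
    return state


result = run_agent("get invoice total", "Invoice #17\nAmount due: $42")
```

The assistant-to-automation spectrum mentioned in the abstract maps onto this loop naturally: an assistant UX pauses for human confirmation between iterations, while an automation UX runs the loop to completion unattended.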
Large Scale AI on Apple Silicon (as mentioned by @AndrejKarpathy ) — Alex Cheema, EXO Labs
AI Engineer· 2025-06-20 22:52
Scientific Rigor and Progress
- Scientific progress is not always linear, and inertia within the scientific community can hinder the adoption of new ideas, even with sound methodology [8][18]
- Questioning assumptions is crucial for scientific advancement, as illustrated by historical examples in physics and rat experiments [9][16]
- Oversimplification of science is a common pitfall, and publishing both successful and unsuccessful results is important for transparency and learning [18][35]

AI Development and Hardware
- The "hardware lottery" suggests that the best research ideas in AI don't always win, due to factors including hardware limitations and existing paradigms [22]
- Large Language Models (LLMs) can create inertia by reinforcing existing practices, such as the dominance of Python, making it harder for new programming languages to gain adoption [23][24]
- GPUs addressed the von Neumann bottleneck of CPUs by changing the ratio of bytes loaded to floating-point operations executed, enabling significant performance improvements in AI [21]

Exo's Solution and Research
- Exo is developing an orchestration layer for AI that runs on different hardware targets, aiming to provide a reliable system for managing distributed devices in ad hoc mesh networks [25]
- Exo models everything as a causally consistent set of events, creating a causal graph to reason about the system and ensure data consistency across distributed systems [26][27]
- Exo's technology enables efficient utilization of diverse hardware, such as combining Nvidia Spark (high compute) and Studio (high memory bandwidth) for LLM generation [28][29]
- Exo is researching new optimizers that are more efficient per flop than Adam but require more memory, leveraging the memory-to-flops ratio of Apple silicon [31][32][33]
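The causal-graph idea above can be sketched concretely: each event names its causal parents, and any valid replay must order an event after everything it depends on. This illustrates the general technique only — Exo's actual event model is not shown here, and the event names are invented.

```python
# Sketch: model a distributed system as events with causal parents, and
# compute an ordering (topological sort) that respects causality.
events = {                               # event -> causal parents (invented)
    "load_weights": [],
    "shard_a_ready": ["load_weights"],
    "shard_b_ready": ["load_weights"],
    "start_inference": ["shard_a_ready", "shard_b_ready"],
}


def causal_order(graph):
    order, seen = [], set()

    def visit(e):
        for parent in graph[e]:          # ensure all causes come first
            if parent not in seen:
                visit(parent)
        if e not in seen:
            seen.add(e)
            order.append(e)

    for e in graph:
        visit(e)
    return order


order = causal_order(events)
```

Because every device can independently derive the same causal constraints from the graph, the system can reason about consistency across an ad hoc mesh without a central clock.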
Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)
AI Engineer· 2025-06-19 02:04
Workshop Overview
- The workshop focuses on building AI agents using Amazon's agent technologies [1]
- Participants will gain hands-on experience building sophisticated AI agents [1]
- The workshop is two hours long [1]

Technologies Highlighted
- Amazon Nova Act is used for reliable web navigation [1]
- The Model Context Protocol (MCP) connects agents to external data sources and APIs [1]
- Amazon Bedrock Agents orchestrates complex workflows [1]

Skills Acquired
- Participants will learn to build agents that can navigate the web like humans [1]
- Participants will learn to perform complex multi-step tasks [1]
- Participants will learn to leverage specialized tools through natural-language commands [1]