"Data readiness" is a Myth: Reliable AI with an Agentic Semantic Layer — Anushrut Gupta, PromptQL
AI Engineer· 2025-06-27 09:40
Problem Statement
- Data readiness is a myth, and achieving perfect data for AI is an unattainable pipe dream [1][2][3]
- Fortune 500 companies lose an average of $250 million due to poor data quality [7]
- Traditional semantic layers and knowledge graphs are insufficient for capturing the nuances of business language and tribal knowledge [8][9][10][11][12][13][14]

Solution: Agentic Semantic Layer (PromptQL)
- PromptQL is presented as a "day zero smart analyst" AI system that learns and improves over time through course correction and steering [17][18][19][20]
- It uses a domain-specific language (DSL) for data retrieval, computation, aggregation, and semantics, decoupling LLM plan generation from execution [21][22]
- The system allows editing the AI's "brain" to correct its understanding and guide its learning [28]
- It incorporates a prompt learning layer to improve the semantic graph and create a company-specific business language [31]
- The semantic layer is version controlled, allowing fallback to previous builds [33]

Key Features and Benefits
- Correctable, explainable, and steerable AI that improves with use [19]
- Ability to handle messy data and understand business context [24][25]
- Reduces months of setup work to an immediate start, enabling faster AI deployments [37]
- Self-improving; claimed to achieve 100% accuracy on complex tasks [37]

Demonstrated Capabilities
- The system can understand what "revenue" means and perform calculations on it [23]
- It can identify and correct errors in data, such as incorrect status values [24]
- It can integrate data from multiple databases and SaaS applications [25][27]
- It can summarize support tickets and extract sentiment [26][29]
- It can learn the meaning of custom terms and relationships between tables [35][36]

Customer Validation
- A Fortune 500 food chain company and a high-growth fintech company achieved 100% accurate AI using PromptQL [38]
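The decoupling described above — the LLM emits a declarative plan, and a separate deterministic engine executes it — can be sketched in miniature. This is an illustrative toy, not the PromptQL DSL: the operation names, the stub planner, and the in-memory tables are all assumptions.

```python
# Sketch: separate plan *generation* (LLM output as data) from plan
# *execution* (a deterministic interpreter that is the only code touching data).
# All names here are hypothetical, not PromptQL's actual DSL.
from dataclasses import dataclass


@dataclass
class Step:
    op: str     # "retrieve" | "filter" | "aggregate" (invented for this sketch)
    args: dict


def fake_llm_plan(question: str) -> list:
    # Stand-in for the LLM: it emits a plan, never computes answers itself.
    return [
        Step("retrieve", {"table": "orders"}),
        Step("filter", {"column": "status", "equals": "paid"}),
        Step("aggregate", {"column": "amount", "fn": "sum"}),
    ]


def execute(plan, tables):
    # Deterministic interpreter: auditable, repeatable, no LLM involved.
    rows = []
    for step in plan:
        if step.op == "retrieve":
            rows = list(tables[step.args["table"]])
        elif step.op == "filter":
            rows = [r for r in rows if r[step.args["column"]] == step.args["equals"]]
        elif step.op == "aggregate":
            return sum(r[step.args["column"]] for r in rows)
    return 0.0


tables = {"orders": [
    {"status": "paid", "amount": 100.0},
    {"status": "refunded", "amount": 40.0},
    {"status": "paid", "amount": 60.0},
]}
revenue = execute(fake_llm_plan("What is revenue?"), tables)
```

Because the plan is plain data, it can be inspected, version controlled, and corrected before execution — which is what makes the "editing the AI's brain" workflow possible.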
Building Agentic Applications w/ Heroku Managed Inference and Agents — Julián Duque & Anush Dsouza
AI Engineer· 2025-06-27 09:38
Heroku Managed Inference and Agents Platform Overview
- The Heroku Managed Inference and Agents platform enables developers to build agentic applications that can reason, make decisions, and trigger actions [1]
- The platform allows for provisioning and deploying LLMs, running untrusted code securely in multiple languages, and extending agents with the Model Context Protocol (MCP) [1]

Key Capabilities
- Heroku Managed Inference and Agents facilitates the deployment and management of LLMs [1]
- The platform supports secure execution of untrusted code in Python, Node.js, Go, and Ruby [1]
- The Model Context Protocol (MCP) can be used to extend agent capabilities [1]

Target Applications
- The platform is suitable for building internal tools, developer assistants, or customer-facing AI features [1]
Events are the Wrong Abstraction for Your AI Agents - Mason Egger, Temporal.io
AI Engineer· 2025-06-27 09:35
Core Argument
- The presentation argues that event-driven architecture (EDA), while seemingly loosely coupled at runtime, is tightly coupled at design time, leading to complexities and challenges in AI agent development [21][22]
- It proposes a shift in focus from events to durable execution as the core of AI agent architecture, which simplifies development and handles failures more effectively [26][27]

Problems with Event-Driven Architecture
- EDA sacrifices clear APIs, as events lack the documentation and structure of traditional APIs [15]
- Business logic becomes fragmented and scattered across multiple services, making debugging and understanding the system more difficult [16]
- Services become ad hoc state machines, leading to potential race conditions and difficult-to-debug issues [18][19]
- EDA can lead to reluctance to iterate on architecture due to fear of breaking existing functionality [25]

Durable Execution as a Solution
- Durable execution is presented as a crash-proof execution environment that automatically preserves application state, virtualizes execution, and is not limited by time or hardware [27][28][29][30][31][32][33][34]
- It allows developers to focus on business logic rather than managing events and queues [38]
- Temporal provides durable execution as an open-source, MIT-licensed product with SDKs for multiple programming languages [38][39]
- Durable execution abstracts away the complexities of events into the software layer [40][43]

Temporal's Offering
- Temporal's durable execution system offers automatic retries for failures, such as LLM downtime or rate limits [36]
- It supports polyglot programming, allowing functions written in different languages to be called seamlessly [39]
- Temporal is available for demonstration and further discussion at the company's booth and Slack channel [44][45]
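The core mechanism of durable execution — completed step results are recorded in a history, and a retried workflow replays recorded results instead of re-running side effects — can be illustrated without any framework. This toy mimics Temporal's model in spirit only; the real SDK APIs differ, and all names here are invented.

```python
# Sketch of durable execution: a "history" records each completed step,
# so re-running the workflow after a crash replays results instead of
# repeating side effects (e.g., charging a card twice).
history = {}                 # stand-in for a durable event history
calls = {"charge": 0}        # counts real side-effect executions


def durable_step(name, fn):
    if name in history:      # replay path: return the recorded result
        return history[name]
    result = fn()            # first execution: run the step and record it
    history[name] = result
    return result


def charge_card():
    calls["charge"] += 1
    return "receipt-001"     # hypothetical receipt id


def workflow():
    return durable_step("charge", charge_card)


first = workflow()
# Simulate a crash + automatic retry: the workflow function runs again,
# but the already-completed step is replayed from history.
second = workflow()
```

The side effect ran exactly once even though the workflow executed twice — this is the property that lets developers write plain business logic and leave failure handling to the runtime.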
Prompt Engineering is Dead — Nir Gazit, Traceloop
AI Engineer· 2025-06-27 09:34
Core Argument
- The presentation challenges the notion of "prompt engineering" as a true engineering discipline, suggesting that iterative prompt improvement can be automated [1][2]
- The speaker advocates an alternative approach to prompt optimization based on evaluators and automated agents [23]

Methodology & Implementation
- The company developed a chatbot for its website documentation using a Retrieval-Augmented Generation (RAG) pipeline [2]
- The RAG pipeline consists of a Chroma database, OpenAI, and prompts to answer questions about the documentation [7]
- An evaluator was built to assess the RAG pipeline's responses, using a dataset of questions and expected answers [5][7]
- The evaluator uses a ground-truth-based LLM as a judge, checking whether the generated answers contain specific facts [10][13]
- An agent was created to automatically improve prompts by researching online guides, running evaluations, and regenerating prompts based on failure reasons [5][18][19]
- The agent uses Crew AI to think, call the evaluator, and regenerate prompts based on best practices [20]

Results & Future Considerations
- The initial prompt scored 0.4 (40%); after two iterations with the agent, the score improved to 0.9 (90%) [21][22]
- The company acknowledges the risk of overfitting to the training data (20 examples) and suggests splitting the data into train/test sets for better generalization [24][25]
- Future work may involve applying the same automated optimization techniques to the evaluator and agent prompts [27]
- The demo is available in the traceloop/autoprompting demo repository [27]
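The evaluate-then-regenerate loop described above can be sketched with stubs. In the talk, the evaluator is an LLM judge and the regeneration step is a Crew AI agent; here both are stand-in functions, and the dataset, model behavior, and improvement rule are all invented for illustration.

```python
# Toy version of automated prompt optimization: score a prompt against a
# dataset, then let an "agent" (a stub here) rewrite it and re-score.
dataset = [("What is 2+2?", "4"), ("Capital of France?", "Paris")]


def answer(prompt, question):
    # Stub model: it only answers correctly when the prompt demands brevity.
    known = {"What is 2+2?": "4", "Capital of France?": "Paris"}
    return known[question] if "concise" in prompt else "I am not sure."


def evaluate(prompt):
    # Stand-in for the LLM-as-judge evaluator: exact-match scoring.
    hits = sum(answer(prompt, q) == a for q, a in dataset)
    return hits / len(dataset)


def improve(prompt):
    # Stand-in for the agent that rewrites the prompt from failure reasons.
    return prompt + " Be concise and answer directly."


prompt = "You are a helpful assistant."
score = evaluate(prompt)
for _ in range(2):            # two iterations, mirroring the talk
    if score < 1.0:
        prompt = improve(prompt)
        score = evaluate(prompt)
```

The overfitting caveat in the summary applies directly here: with a tiny dataset, the loop can "improve" the prompt into something that only works on those examples, hence the suggested train/test split.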
The Eyes Are The (Context) Window to The Soul: How Windsurf Gets to Know You — Sam Fertig, Windsurf
AI Engineer· 2025-06-27 09:34
Core Problem in AI Coding Space
- Generating code is not difficult, but generating code that fits into existing codebases, adheres to organizational policies and personal preferences, and is future-proof is challenging [13][14][15]
- The magic of AI coding tools like Windsurf lies in context, specifically "what context" and "how much" [16]

Windsurf's Context Philosophy
- "What context" is divided into two buckets: heuristics (user behavior) and hard evidence (environment/codebase) [17][18][19]
- Relevant output is determined by the prompt, the state of the codebase, and the user state [20]
- Windsurf prioritizes optimizing the relevance of context over simply increasing the size of the context window, in order to address latency [21][22]

Windsurf's Capabilities
- Windsurf excels at finding relevant context quickly due to its background in GPU optimization [23]
- Windsurf provides connectors for users to perform context retrieval at their level, including embedding search, memories, rules, and custom workspaces [24]

Data Privacy
- Windsurf processes information only within the user's editor and does not access the rest of the user's machine [31]
- Windsurf's servers are stateless, and the company does not store or train on user data [31][32]
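The "optimize relevance, not window size" idea can be sketched as ranking candidate context snippets against the prompt and keeping only the top-k. Real systems like Windsurf use learned embeddings; plain token-overlap cosine similarity stands in for them here, and the snippets are invented.

```python
# Sketch: pick the most *relevant* context rather than stuffing the window.
# Token-overlap cosine similarity is a stand-in for embedding search.
from collections import Counter
import math


def vec(text):
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_k_context(prompt, snippets, k=2):
    # Rank every candidate snippet by similarity to the prompt, keep top-k.
    ranked = sorted(snippets, key=lambda s: cosine(vec(prompt), vec(s)), reverse=True)
    return ranked[:k]


snippets = [
    "def parse_config(path): load YAML settings",
    "CSS grid layout for the landing page",
    "config schema: settings are loaded from YAML files",
]
chosen = top_k_context("where are YAML config settings loaded", snippets)
```

The payoff is twofold: the model sees only material related to the question, and the prompt stays small, which is exactly the latency argument in the talk.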
What does Enterprise Ready MCP mean? — Tobin South, WorkOS
AI Engineer· 2025-06-27 09:31
MCP and AI Agent Development
- MCP is presented as a way of interfacing between AI and external resources, enabling functionalities like database access and complex computations [3]
- The industry is currently focused on building internal demos and connecting them to APIs, but needs to move towards robust authentication and authorization [9][10]
- The industry needs to adapt existing tooling for MCP due to its dynamic client registration, which can flood developer dashboards [12]

Enterprise Readiness and Security
- Scaling MCP servers requires addressing free-credit abuse, bot blocking, and robust access controls [12]
- Selling MCP solutions to enterprises necessitates SSO, lifecycle management, provisioning, fine-grained access controls, audit logs, and data loss prevention [12]
- Regulations like GDPR impose specific logging requirements for AI workloads, which are not widely supported [12]

Challenges and Future Development
- Passing scope and access control between different AI workloads remains a significant challenge [13]
- The MCP spec is actively developing, with features like elicitation (the AI asking humans for input) still unstable [13]
- Cloud vendors are solving cloud hosting, but authorization and access control are the hardest parts of enterprise deployment [13]
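The fine-grained access control and audit logging requirements above can be sketched as scope-checked tool dispatch: each tool declares a required scope, and every call is audit-logged and refused unless the caller's token carries that scope. This is a generic illustration, not part of the MCP spec; the tool names, scope strings, and log shape are all assumptions.

```python
# Hypothetical sketch of scope-checked dispatch for an MCP-style server:
# every tool requires a scope, every call is recorded for audit.
audit_log = []  # (tool, required_scope, allowed) tuples

TOOL_SCOPES = {          # invented tools and scope names
    "query_db": "db:read",
    "delete_row": "db:write",
}


def call_tool(tool, token_scopes):
    required = TOOL_SCOPES[tool]
    allowed = required in token_scopes
    audit_log.append((tool, required, allowed))   # compliance-style audit trail
    if not allowed:
        return "denied"
    return f"ran {tool}"


ok = call_tool("query_db", {"db:read"})
denied = call_tool("delete_row", {"db:read"})     # missing db:write
```

Note that denied attempts are logged too — for the GDPR-style requirements mentioned in the talk, the refusals are often as important to record as the successes.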
CI in the Era of AI: From Unit Tests to Stochastic Evals — Nathan Sobo, Zed
AI Engineer· 2025-06-27 09:22
Software engineers have long understood that high-quality code requires comprehensive automated testing. For decades, our industry has relied on deterministic tests with clear pass/fail outcomes to ensure reliability. That's certainly true at Zed, where we're building a next-generation native IDE in Rust. Zed runs at 120 frames per second, but it would also crash once a second if we didn't maintain and run a comprehensive suite of unit tests on every chang ...
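The shift from deterministic tests to stochastic evals suggested by the title can be sketched simply: instead of one run with a binary pass/fail, run the check many times and pass only if the success rate clears a threshold. The flaky "agent" here is a seeded random stub; the trial count and threshold are arbitrary choices for illustration.

```python
# Sketch of a stochastic eval: many trials, pass if the success *rate*
# clears a threshold, rather than a single deterministic pass/fail.
import random


def run_agent(rng):
    # Stand-in for one nondeterministic agent run (~90% success).
    return rng.random() < 0.9


def stochastic_eval(trials=200, threshold=0.8):
    rng = random.Random(42)          # seeded so CI results are reproducible
    passes = sum(run_agent(rng) for _ in range(trials))
    rate = passes / trials
    return rate, rate >= threshold


rate, passed = stochastic_eval()
```

Seeding is a pragmatic CI compromise: it keeps the suite reproducible run-to-run while still exercising the distribution of agent behaviors rather than a single trajectory.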
Building AI Agents that actually automate Knowledge Work - Jerry Liu, LlamaIndex
AI Engineer· 2025-06-24 00:16
Agents are all the rage in 2025, and every B2B SaaS startup and incumbent promises AI agents that can "automate work" in some way. But how do you actually build this? The answer is twofold: 1. really, really good tools, and 2. carefully tailored agent reasoning over those tools, spanning UXs that range from assistant to full automation. The main goal of this talk is to give a practical overview of agent architectures that can automate real-world work, with a focus on document-centric tasks. Learn the core building blocks o ...
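The two ingredients named above — good tools plus agent reasoning over them — can be sketched as a tool registry and a loop that picks the next tool until the task is done. The "reasoning" step here is a hard-coded stub standing in for an LLM planner, and the document, tool names, and extracted value are all invented.

```python
# Minimal sketch of a document-centric agent: a registry of tools and a
# loop that reasons (stubbed here) about which tool to call next.
TOOLS = {
    "parse_document": lambda doc: {"title": doc.split("\n")[0], "body": doc},
    "extract_total": lambda parsed: 42.0 if "Invoice" in parsed["title"] else None,
}


def plan_next(task, state):
    # Stub planner: in practice an LLM chooses the next tool from the state.
    if "parsed" not in state:
        return "parse_document"
    if "total" not in state:
        return "extract_total"
    return None                      # task complete


def run_agent(task, doc):
    state = {}
    while (tool := plan_next(task, state)) is not None:
        if tool == "parse_document":
            state["parsed"] = TOOLS[tool](doc)
        elif tool == "extract_total":
            state["total"] = TOOLS[tool](state["parsed"])
    return state


result = run_agent("get invoice total", "Invoice #17\nAmount due: $42")
```

The assistant-to-automation spectrum mentioned in the abstract maps onto this loop naturally: an assistant UX pauses for human confirmation between iterations, while an automation UX runs the loop to completion unattended.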
Large Scale AI on Apple Silicon (as mentioned by @AndrejKarpathy ) — Alex Cheema, EXO Labs
AI Engineer· 2025-06-20 22:52
Scientific Rigor and Progress
- Scientific progress is not always linear, and inertia within the scientific community can hinder the adoption of new ideas, even with sound methodology [8][18]
- Questioning assumptions is crucial for scientific advancement, as illustrated by historical examples in physics and rat experiments [9][16]
- Oversimplification of science is a common pitfall, and publishing both successful and unsuccessful results is important for transparency and learning [18][35]

AI Development and Hardware
- The "hardware lottery" suggests that the best research ideas in AI don't always win, due to factors including hardware limitations and existing paradigms [22]
- Large Language Models (LLMs) can create inertia by reinforcing existing practices, such as the dominance of Python, making it harder for new programming languages to gain adoption [23][24]
- GPUs addressed the von Neumann bottleneck of CPUs by changing the ratio of bytes loaded to floating-point operations executed, enabling significant performance improvements in AI [21]

Exo's Solution and Research
- Exo is developing an orchestration layer for AI that runs on different hardware targets, aiming to provide a reliable system for managing distributed devices in ad hoc mesh networks [25]
- Exo models everything as a causally consistent set of events, creating a causal graph to reason about the system and ensure data consistency across distributed systems [26][27]
- Exo's technology enables efficient utilization of diverse hardware, such as combining Nvidia Spark (high compute) and Studio (high memory bandwidth) for LLM generation [28][29]
- Exo is researching new optimizers that are more efficient per flop than Adam but require more memory, leveraging the memory-to-flops ratio of Apple silicon [31][32][33]
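The causal-graph idea above can be sketched concretely: each event names its causal parents, and any valid replay must order an event after everything it depends on. This illustrates the general technique only — Exo's actual event model is not shown here, and the event names are invented.

```python
# Sketch: model a distributed system as events with causal parents, and
# compute an ordering (topological sort) that respects causality.
events = {                               # event -> causal parents (invented)
    "load_weights": [],
    "shard_a_ready": ["load_weights"],
    "shard_b_ready": ["load_weights"],
    "start_inference": ["shard_a_ready", "shard_b_ready"],
}


def causal_order(graph):
    order, seen = [], set()

    def visit(e):
        for parent in graph[e]:          # ensure all causes come first
            if parent not in seen:
                visit(parent)
        if e not in seen:
            seen.add(e)
            order.append(e)

    for e in graph:
        visit(e)
    return order


order = causal_order(events)
```

Because every device can independently derive the same causal constraints from the graph, the system can reason about consistency across an ad hoc mesh without a central clock.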
Building Agents with Amazon Nova Act and MCP - Du'An Lightfoot, Amazon (Full Workshop)
AI Engineer· 2025-06-19 02:04
Workshop Overview
- The workshop focuses on building AI agents using Amazon's agent technologies [1]
- Participants will gain hands-on experience building sophisticated AI agents [1]
- The workshop is two hours long [1]

Technologies Highlighted
- Amazon Nova Act is used for reliable web navigation [1]
- The Model Context Protocol (MCP) connects agents to external data sources and APIs [1]
- Amazon Bedrock Agents orchestrates complex workflows [1]

Skills Acquired
- Participants will learn to build agents that can navigate the web like humans [1]
- Participants will learn to perform complex multi-step tasks [1]
- Participants will learn to leverage specialized tools through natural-language commands [1]