AI Engineer

Production software keeps breaking and it will only get worse — Anish Agarwal, Traversal.ai
AI Engineer· 2025-07-10 16:29
Problem Statement
- The current software engineering workflow is inefficient: too much time is spent troubleshooting production incidents [2][9]
- Existing approaches to automated troubleshooting, such as AIOps and LLMs, have fundamental limitations [10][11][12][13][14][15][16][17][18]
- Troubleshooting is getting harder because of AI-generated code and increasingly complex systems [3][4]
Solution: Traversal's Approach
- Traversal combines causal machine learning (statistics), reasoning models (semantics), and a novel agentic control flow (swarms of agents) for autonomous troubleshooting [19][20][21][22][23][24]
- Causal machine learning helps identify cause-and-effect relationships in data, addressing the issue of correlated failures [20][21]
- Reasoning models provide semantic understanding of logs, metrics, and code [22]
- Swarms of agents make an exhaustive search through telemetry data efficient [23][24]
Results and Impact
- Traversal has achieved a 40% reduction in mean time to resolution (MTTR) for DigitalOcean, a cloud provider serving hundreds of thousands of customers [32][37]
- Traversal orchestrates a swarm of expert AIs that sift through petabytes of observability data in parallel, giving users the root cause of an incident within five minutes [39][40]
- Traversal integrates with various observability tools, processing trillions of logs [45]
Future Applications
- The principles of exhaustive search and swarms of agents can be applied to other domains such as network observability and cybersecurity [47]
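The "swarm" idea in the summary above, many workers searching telemetry in parallel and a final step that combines their findings, can be illustrated with a plain fan-out/fan-in sketch. This is not Traversal's implementation: the log line format (`service LEVEL message`), the function names `scan_shard` and `rank_suspects`, and the use of raw error counts as a ranking signal are all assumptions made for illustration, and no causal analysis or language model is involved.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scan_shard(lines: list[str]) -> Counter:
    """One hypothetical worker: count ERROR lines per service in a log shard.

    Assumes each line looks like 'service LEVEL message'.
    """
    counts: Counter = Counter()
    for line in lines:
        service, level, _message = line.split(" ", 2)
        if level == "ERROR":
            counts[service] += 1
    return counts


def rank_suspects(shards: list[list[str]]) -> list[tuple[str, int]]:
    """Fan out one worker per shard, then merge the partial counts."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        partials = list(pool.map(scan_shard, shards))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    # Services with the most errors come first.
    return total.most_common()


if __name__ == "__main__":
    shards = [
        ["db ERROR connection timeout", "api INFO request ok"],
        ["db ERROR connection timeout", "api ERROR 502 from db"],
    ]
    print(rank_suspects(shards))
```

Ranking by error count only surfaces correlated failures; per the summary, separating cause from correlated symptoms is the part that causal machine learning is meant to handle.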
Thinking Deeper in Gemini — Jack Rae, Google DeepMind
AI Engineer· 2025-07-10 16:00
Model Development & Architecture
- Gemini Thinking is presented as a solution to address limitations in test-time compute, marking progress towards general intelligence [1]
- The industry focuses on identifying fundamental intelligence bottlenecks within existing models and developing solutions to improve architecture or training objectives [1]
Capabilities & Steerability
- Recent progress in Thinking is highlighted, emphasizing both capability and steerability improvements [1]
Future Directions
- The talk outlines the future direction of the models, indicating ongoing development and evolution [1]
A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind
AI Engineer· 2025-07-10 07:00
Over the last year, Google and Gemini models have shown rapid progress across all dimensions (model, product, etc). Let's highlight all the work that has happened, how we got the world's best models, and where we are going next (across both the model landscape and our AI products). About Logan Kilpatrick Logan leads product for Google AI Studio and works on the Gemini API. Before Google, Logan led developer relations at OpenAI. Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our ...
The Wild World of AI: 6 Months That Changed Everything
AI Engineer· 2025-07-10 03:23
There are all of these benchmarks full of numbers. I don't like the numbers. There are the leaderboards. I'm kind of beginning to lose trust in the leaderboards as well. So for my own work, I've been leaning increasingly into my own little benchmark, which started as a joke and has actually turned into something that I rely on quite a lot. And that's this. I prompt models with "Generate an SVG of a pelican riding a bicycle." I have good reasons for this. Um, firstly, these are not image models. These are text ...
2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
AI Engineer· 2025-07-09 16:00
LLM Advancements
- The field of LLMs has experienced significant advancements in the past 12 months [1]
- The talk reviews the latest models, free from vendor or employer influence [1]
Speaker Information
- Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data [1]
- Simon Willison was an engineering director at Eventbrite [1]
- Simon Willison is a co-creator of the Django web framework [1]
Event Information
- The recording took place at the AI Engineer World's Fair in San Francisco [1]
- Readers can stay updated on upcoming events and content by joining the newsletter [1]
Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai
AI Engineer· 2025-07-08 16:00
Company Overview
- Artificial Analysis is an independent benchmarking and insights company focused on helping developers and companies select appropriate AI models and technologies for application development [1]
- The company provides extensive benchmarking results on its website, covering intelligence, performance, cost, and other factors [1]
- Artificial Analysis develops reports to inform key strategic decisions related to AI [1]
AI Industry Trends
- The entire AI stack, from chips to infrastructure to models, is developing rapidly [1]
- It is important to differentiate the signal from the noise in the rapidly evolving AI landscape [1]
Expertise
- Artificial Analysis' CEO, Micah Hill-Smith, has a background in AI engineering and strategy consulting with McKinsey & Company [1]
- George Cameron is the CPO of Artificial Analysis [1]
Events and Content
- Artificial Analysis presented at the AI Engineer World's Fair in San Francisco [1]
- The company encourages individuals to subscribe to its newsletter for updates on upcoming events and content [1]
Claude Code & the evolution of agentic coding - Boris Cherny
AI Engineer· 2025-07-04 16:00
Hello. This is awesome. This is a big crowd. Who here has used Claude Code before? Jesus. Awesome. That's what I like to see. Cool. So, my name is Boris. I'm a member of technical staff at Anthropic and creator of Claude Code. And um, I was struggling with what to talk about for an audience that already knows Claude Code, already knows AI and all the coding tools and agentic coding and stuff like that. So, I'm going to zoom out a little bit and then we'll zoom back in. So here's my TL;DR. The model is moving really ...
12-Factor Agents: Patterns of reliable LLM applications — Dex Horthy, HumanLayer
AI Engineer· 2025-07-03 20:50
Core Principles of Agent Building
- Rethink agent development from first principles, applying established software engineering practices to build reliable agents [11]
- Own the control flow in agent design, which gives flexibility in managing execution state and business state [24][25]
- Keep agents stateless, with state managed externally for greater flexibility and control [47][49]
Key Factors for Reliable Agents
- The ability of LLMs to convert natural language into JSON is a fundamental capability for building effective agents [13]
- Direct tool use by agents can be harmful; a more structured approach has the LLM emit JSON that deterministic code then acts on [14][16]
- Own and optimize prompts and context windows to ensure the quality and reliability of agent outputs [30][33]
Practical Applications and Considerations
- Use small, focused "micro agents" within deterministic workflows to improve manageability and reliability [40]
- Integrate agents with various communication channels (email, Slack, Discord, SMS) to meet users where they are [39]
- Focus on the "hard AI parts" of agent development, such as prompt engineering and flow optimization, rather than relying on frameworks to abstract away complexity [52]
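Three of the points above fit together in a small sketch: the model emits JSON, deterministic code decides what to do with it, and the agent step is a stateless function over an externally held event list. This is an illustration under stated assumptions, not code from the talk. `fake_model` is a hypothetical stand-in for a real LLM call, and the `intent` field and the event dictionary shapes are invented for the example.

```python
import json
from typing import Callable

Event = dict
Model = Callable[[list[Event]], str]


def fake_model(events: list[Event]) -> str:
    """Hypothetical stand-in for an LLM: returns the next step as a JSON string."""
    for event in events:
        if event["type"] == "tool_result":
            return json.dumps({"intent": "done", "answer": f"Total is {event['value']}"})
    return json.dumps({"intent": "add", "a": 2, "b": 3})


def add(a: int, b: int) -> int:
    return a + b


# Deterministic code the agent may trigger, keyed by intent name.
TOOLS: dict[str, Callable[..., int]] = {"add": add}


def step(events: list[Event], model: Model) -> list[Event]:
    """Stateless step: takes the full history, returns a new history."""
    decision = json.loads(model(events))
    intent = decision.pop("intent")
    if intent == "done":
        return events + [{"type": "final", "answer": decision["answer"]}]
    if intent not in TOOLS:
        return events + [{"type": "error", "message": f"unknown intent {intent!r}"}]
    value = TOOLS[intent](**decision)
    return events + [{"type": "tool_result", "tool": intent, "value": value}]


def run(user_message: str, model: Model, max_steps: int = 5) -> list[Event]:
    """The caller owns the loop, the step limit, and where the state lives."""
    events: list[Event] = [{"type": "user", "content": user_message}]
    for _ in range(max_steps):
        events = step(events, model)
        if events[-1]["type"] in ("final", "error"):
            break
    return events


if __name__ == "__main__":
    print(run("What is 2 + 3?", fake_model)[-1])
```

Because `step` keeps nothing between calls, the event list could be stored in a database and the loop paused or resumed at any point, which is one way to read the "stateless agent" recommendation.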
MCP Is Not Good Yet — David Cramer, Sentry
AI Engineer· 2025-07-03 16:00
MCP Overview & Architecture
- MCP (Model Context Protocol) is described as a pluggable architecture for agents, contextualized within an enterprise cloud service [5][6]
- Sentry's MCP server was initially built as a fun project and is biased towards Sentry's application monitoring services [4][5]
- MCP is seen as a potential way to integrate services into various agents, enabling bug fixes and workflow enhancements within editors [7][8][25]
Implementation & Challenges
- Implementing MCP involves complexities around OAuth 2.1, requiring solutions such as a Cloudflare shim to proxy an OAuth 2 API [16][17]
- A key challenge is that MCP cannot simply sit on top of OpenAPI; systems need to be designed around how agents and models react to the context they are given [19][20][21]
- Client support for native authentication is still evolving, with some clients such as Cursor experiencing breakage [22]
Security & Best Practices
- Security is a major concern, particularly with the standard I/O (stdio) interface, and random MCP tools should not be allowed within organizations [27]
- For B2B SaaS companies, focusing on OAuth with remote environments is crucial for integrating services into agents [25]
- Companies should avoid simply proxying OpenAPI and exposing it as tools, as this yields poor results; intentional design and context provision are necessary [30]
Agent-Centric Approach
- The industry should focus on building agents, viewing MCP as a plug-in architecture to leverage the value of LLMs [39][40]
- Exposing agents through the MCP architecture, particularly in B2B settings, is seen as a significant value unlock [42]
- Optimizing for context in workflows and understanding the data is crucial when designing agents, with a focus on giving language models structured information such as Markdown [31][50]
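The advice to design tool output for the model, rather than passing an API response straight through, can be sketched as a formatter that turns a payload into Markdown. This is not Sentry's code: the function name `issue_to_markdown` and every field name in the payload are hypothetical, and no MCP SDK is used.

```python
def issue_to_markdown(issue: dict) -> str:
    """Render a hypothetical issue payload as Markdown for a language model."""
    lines = [
        f"# {issue['title']}",
        "",
        f"- **ID**: {issue['id']}",
        f"- **Status**: {issue['status']}",
        f"- **Events**: {issue['count']}",
        "",
        "## Stack trace",
        "",
    ]
    # One bullet per frame keeps file, line, and function together.
    for frame in issue["frames"]:
        lines.append(f"- `{frame['file']}:{frame['line']}` in `{frame['function']}`")
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "id": "PROJ-1",
        "title": "TypeError: x is undefined",
        "status": "unresolved",
        "count": 12,
        "frames": [{"file": "app/views.py", "line": 42, "function": "render"}],
    }
    print(issue_to_markdown(example))
```

A tool handler would return this string instead of the raw JSON, so the model receives only the fields chosen for the workflow, under headings it can follow.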
The New Lean Startup — Sid Bendre, Oleve
AI Engineer· 2025-07-01 16:57
Company Overview & Vision
- Oleve is building consumer software products aiming to improve users' lives [3]
- The company's vision is to create a portfolio of "one-person billion-dollar companies" [34]
- Oleve emphasizes a lean startup approach, focusing on small teams and early profitability [1][2]
Key Achievements & Metrics
- Oleve profitably scaled a portfolio of products to $6 million in ARR (Annual Recurring Revenue) [3]
- The company has generated over 500 million views across social media [3]
- One product, Unstuck AI, reached 1 million users in under nine weeks [8]
- Another product launch reached 10,000 users in less than 30 hours [4]
Lean Operating Principles
- Prioritizes hiring "10xer generalists" with complementary skills [10][11]
- Emphasizes a "profit-first mentality" to guide decision-making [11][12]
- Focuses on continuous process refinement and learning from failures [13]
- Leverages "super tools" by reinventing how old tools are used and consolidating workflows [14][15]
- Believes in building compounding benefits through technical playbooks and operational blueprints [14][15]
Organizational Structure
- Adopts a "harvester and cultivator" model for its engineering organization, inspired by Palantir [21][22]
- Harvesters are product engineers who own and manage their products end-to-end [22][23]
- Cultivators are AI software engineers focused on building the company's agentic operating system and automation [24]
AI Tooling & Automation
- Uses AI tooling to augment existing talent, not to compensate for shortcomings [25]
- Implements a three-stage automation strategy: human-led tooling, workflow automation, and autonomous decision-making systems [28][29][30]
- Aims to build a company where strategic insights are provided by people, but operations are run by AI agents [30]
- Explores using AI agents for market research, acquisition target scoring, and growth system automation [30][31]