AI Engineer

Mentoring the Machine — Eric Hou, Augment Code
AI Engineer· 2025-07-24 15:01
AI Agent Development & Management
- AI agents require mentorship similar to interns to ensure effective deployment [1]
- Treating AI agents as a tech lead would, rather than just a user, maximizes their leverage [1]
- Effective use of AI agents impacts software engineering at both micro and macro levels [1]
Software Development Lifecycle (SDLC)
- The report previews how AI agents can change the calculus of software engineering [1]
- Practical advice for working with AI agents in the SDLC will be provided [1]
Building Applications with AI Agents — Michael Albada, Microsoft
AI Engineer· 2025-07-24 15:00
Agentic Development Landscape
- Adoption of agentic technology is increasing rapidly: Y Combinator data shows a 254% increase over the last three years in companies self-identifying as agentic [5]
- Agentic systems are complex; initial prototypes may reach around 70% accuracy, but perfection is hard to reach because of the long tail of complex scenarios [6][7]
- The talk defines an agent as an entity that can reason, act, communicate, and adapt to solve tasks, and treats the foundation model as a base to which components are added to improve performance [8]
- Agency should not be the ultimate goal but a tool for solving problems; any increase in agency should preserve a high level of effectiveness [9][11][12]
Tool Use and Orchestration
- Exposing tools and functionality to language models lets agents invoke functions via APIs, but requires careful thought about which functionality to expose [14]
- The talk advises against a one-to-one mapping between APIs and tools, recommending that tools be grouped logically to reduce semantic collision and improve accuracy [17][18]
- Simple workflow patterns, such as single chains, are recommended for orchestration to improve measurability, reduce costs, and enhance reliability [19][20]
- For complex scenarios, the talk suggests considering more agentic patterns and potentially fine-tuning the model [22][23]
Multi-Agent Systems and Evaluation
- Multi-agent systems can help scale the number of tools by breaking them into semantically similar groups and routing tasks to the appropriate agent [24][25]
- The talk recommends investing more in evaluation to handle the many hyperparameters involved in building agentic systems [27][28]
- AI architects and engineers should take ownership of defining the inputs and outputs of agents to accelerate team progress [29][30]
- Tools such as IntellAgent, Microsoft's PyRIT, and Label Studio can help with generating synthetic inputs, red-teaming agents, and building evaluation sets [33][34][35]
Observability and Common Pitfalls
- Observability, using tools like OpenTelemetry, is important for understanding failure modes and improving systems [38]
- Common pitfalls include insufficient evaluation, inadequate tool descriptions, semantic overlap between tools, and excessive complexity [39][40]
- Safety should be designed into every layer of an agentic system, including tripwires and detectors [41][42]
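The idea of splitting tools into semantically similar groups and routing each task to the matching agent can be sketched as follows. This is a minimal illustration, not the approach shown in the talk: the group names, keywords, and tool names are all hypothetical, and a production router would more likely use an embedding model or an LLM classifier than keyword overlap.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ToolGroup:
    """A set of semantically related tools owned by one sub-agent."""
    name: str
    keywords: Set[str]
    tools: List[str] = field(default_factory=list)


# Hypothetical groups; a real system would derive these from its own APIs.
GROUPS = [
    ToolGroup("billing", {"invoice", "refund", "payment"}, ["create_invoice", "issue_refund"]),
    ToolGroup("calendar", {"meeting", "schedule", "availability"}, ["find_slot", "book_meeting"]),
]


def route(task: str, groups: List[ToolGroup] = GROUPS) -> Optional[ToolGroup]:
    """Pick the group whose keywords overlap the task most; None if nothing matches."""
    words = set(task.lower().split())
    best = max(groups, key=lambda g: len(g.keywords & words))
    return best if best.keywords & words else None
```

Returning `None` when no group matches leaves room for a fallback, such as asking the user to clarify, instead of forcing the task onto an unrelated agent.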
AX is the only Experience that Matters - Ivan Burazin, Daytona
AI Engineer· 2025-07-24 14:15
Agent Experience Definition and Importance
- Agent experience is defined as how easily agents can access, understand, and operate within digital environments to achieve user-defined goals [5]
- The talk argues that agent experience is the only experience that matters because agents will be the largest user base [33]
- If a tool requires human intervention, it has not fully addressed agent needs [33]
The Shift in Development Tools
- 37% of the latest YC batch are building agents as their products, indicating a shift away from copilots and legacy SaaS companies [1]
- The talk argues that tools built for humans belong to the past and that the focus should be on building tools for agents [3]
- Tools need to enable agents to operate autonomously [12][13]
Key Components of Agent Experience
- Seamless authentication is crucial; agents should be able to authenticate without exposing passwords [6][7]
- Agent-readable documentation is essential, with conventions like appending ".md" to URLs and publishing llms.txt [8][9]
- API-first design is critical, providing agents with machine-native interfaces to access functionality efficiently [10]
Daytona's Approach to Agent Native Runtime
- Daytona aims to give agents a computing environment that plays the role a laptop plays for humans [19]
- Daytona's initial focus was on speed, achieving a spin-up time of 27 milliseconds for agent tools [21]
- Daytona preloads environments with headless tools like file explorers, Git clients, and LSP to help agents work faster [22]
Daytona's Features for Autonomous Agents
- Daytona offers a declarative image builder, allowing agents to create and launch new sandboxes with custom dependencies [27]
- Daytona provides Daytona volumes, enabling agents to efficiently share large datasets across multiple machines [29]
- Daytona supports parallel execution, allowing agents to fork machines and explore multiple options simultaneously [31]
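The two documentation conventions mentioned above (appending ".md" to a page URL and publishing an llms.txt file) can be sketched as URL transformations an agent might try. This is a minimal sketch: the function names and the example.com URLs are placeholders, and whether a given site actually serves either resource is site-specific.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_variant(url: str) -> str:
    """Return the URL with '.md' appended to its path (query and fragment are kept)."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path + ".md", parts.query, parts.fragment))


def llms_txt_url(url: str) -> str:
    """Return the site-root llms.txt location for the given URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


# Placeholder URL for illustration only.
page = "https://docs.example.com/guide/install/"
candidates = [markdown_variant(page), llms_txt_url(page)]
```

An agent would typically request each candidate in turn and fall back to parsing the HTML page if neither exists.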
How to build Enterprise Aware Agents - Chau Tran, Glean
AI Engineer· 2025-07-24 09:22
Thanks, Alex, for the introduction. That was a very impressive LLM-generated summary of me. I've never heard it before, but nice. So today I'm going to talk to you about something that has been keeping me up at night, and probably some of you too: how to build enterprise-aware agents. How to bring the brilliance of AI into the messy, complex realities of how your business operates. So let's jump straight to the hottest question of the month for AI builders. Should I build workflows or ...
Monetizing AI — Alvaro Morales, Orb
AI Engineer· 2025-07-23 19:45
As AI continues to transform industries, companies are faced with the critical challenge of effectively monetizing AI-driven products in a way that captures value, ensures customer adoption, and scales revenue sustainably. Unlike traditional SaaS models, AI-powered products have unique complexities - such as fluctuating usage patterns, variable compute costs, and evolving customer demands, making conventional pricing strategies unhelpful to the growth of an AI product-led startup. In this session, Alvaro Mo ...
Does AI Actually Boost Developer Productivity? (100k Devs Study) - Yegor Denisov-Blanch, Stanford
AI Engineer· 2025-07-23 17:00
Productivity Impact of AI
- AI adoption shows an average developer productivity boost of approximately 20% [1]
- AI's impact on developer productivity varies significantly across teams, with some experiencing decreased productivity [1]
Factors Influencing AI Adoption Success
- Company types, industries, and tech stacks play a crucial role in determining the extent of productivity gains from AI [1]
- Data-driven evidence is essential for building a successful AI strategy tailored to specific contexts [1]
Study Details
- The study analyzed real-world productivity data from nearly 100,000 developers across hundreds of companies [1]
- The research was conducted at Stanford University [1]
How agents will unlock the $500B promise of AI - Donald Hruska, Retool
AI Engineer· 2025-07-23 15:51
AI Market Growth & Trends
- AI infrastructure spending has reached $0.5 trillion, yet many companies are limited to basic chatbots and code generation [2]
- Anthropic's annualized revenue has grown rapidly, tripling in 5 months to reach $3 billion by the end of May [3]
- OpenAI is projected to reach $12 billion in revenue by the end of 2025, a 3x increase from the previous year, driven by enterprise AI spending [4]
- Cost per token for AI inference dropped by 99.7% from 2022 to 2024 [33]
- Google searches for "AI agents" increased 11x in the last 16 months [34]
Retool's Agentic AI Solution
- Retool is entering agentic AI with the release of Retool Agents, enabling enterprises to build agents with guardrails that integrate into production systems [2]
- Retool customers have automated over 100 million hours of work, freeing up human potential [31]
- Retool's cheapest agent is priced at $3 per hour [33]
Agent Development Strategies
- Companies have four options for agent development: building from scratch, using a framework like LangGraph, using an agent platform like Retool Agents, or using verticalized agents [16][17][18][19]
- The decision to build or buy agents depends on whether the agent is part of the core product, involves regulated data, or is a commodity workflow needed quickly [21]
- When considering a managed platform, evaluate the breadth of connectors, built-in permissioning, compliance, audit trails, and observability [22][23]
Enterprise Considerations for AI Agents
- Enterprises need to consider single sign-on, role-based access control, secure integration with external services, audit logs, compliance, and internationalization when deploying AI agents [13][14]
- Risks of using AI-generated code in production include hallucinations, unpredictable results, security vulnerabilities, and cost overruns [15]
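To put the growth figures above on a common footing, the short calculation below converts them into a cost multiple and equivalent monthly rates. It is a rough sketch that assumes constant compounding over each period, which the talk summary does not claim; the helper name `monthly_rate` is introduced here for illustration.

```python
def monthly_rate(multiple: float, months: float) -> float:
    """Constant monthly growth rate g such that (1 + g) ** months == multiple."""
    return multiple ** (1 / months) - 1


# A 99.7% drop leaves 0.3% of the original cost per token.
remaining_fraction = 1 - 0.997
cost_reduction_factor = 1 / remaining_fraction  # roughly 333x cheaper

# Revenue tripling in 5 months (assumes smooth compounding).
anthropic_monthly = monthly_rate(3, 5)  # roughly 24.6% per month

# Search interest growing 11x in 16 months (same assumption).
search_monthly = monthly_rate(11, 16)  # roughly 16.2% per month
```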
How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit
AI Engineer· 2025-07-23 15:51
Intuit's Use of LLMs in TurboTax
- Intuit processed 44 million tax returns for tax year 2023, aiming to give users high confidence in their tax filings and ensure they receive the best deductions [2]
- Intuit's GenAI experiences are built on GenOS, a proprietary generative OS platform designed to address the limitations of out-of-the-box tooling, especially around regulatory compliance, safety, and security in the tax domain [4][5]
- Intuit uses Claude (Anthropic) for static queries related to tax refunds and OpenAI's GPT-4 for dynamic question answering, such as user-specific tax inquiries [9][10][12]
- Intuit is one of the biggest users of Claude, with a multi-million dollar contract [9][10]
Development and Evaluation
- Intuit emphasizes a phased evaluation system, starting with manual evaluations by tax analysts and transitioning to automated evaluations using LLM-as-a-judge [16][17]
- Tax analysts also serve as prompt engineers, applying their expertise to evaluation and prompt design [16][17]
- Key evaluation pillars include accuracy, relevancy, and coherence, with a strong focus on tax accuracy [20][24]
- Intuit uses AWS Ground Truth to create golden datasets for evaluations [22]
Challenges and Learnings
- LLM contracts are expensive; long-term contracts are slightly cheaper but create vendor lock-in [25][26]
- LLMs have higher latency than backend services (3-10 seconds), which can get worse during peak tax season [27][28]
- Intuit employs safety guardrails and ML models to prevent hallucination of numbers in LLM responses, ensuring data accuracy [40][41]
- Graph RAG outperforms regular RAG in providing personalized and helpful answers to users [42][43]
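One simple form of a guardrail against hallucinated numbers is to check that every number in a generated answer also appears in the trusted source data. The sketch below illustrates that idea only; it is not Intuit's implementation (the summary says only that guardrails and ML models are used), and the function names and sample values are hypothetical.

```python
import re
from typing import Set

# Matches integers and decimals, allowing thousands separators such as 1,200.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> Set[float]:
    """Return every number found in text, with thousands separators removed."""
    return {float(m.group().replace(",", "")) for m in NUMBER.finditer(text)}


def unsupported_numbers(response: str, source_values: Set[float]) -> Set[float]:
    """Return numbers that appear in the response but not in the trusted source values."""
    return extract_numbers(response) - source_values


# Hypothetical example: the rate is not present in the source data.
response = "Your refund is $1,200, based on 2 dependents and a 12.5% rate."
flagged = unsupported_numbers(response, {1200.0, 2.0})
```

A non-empty result would be a signal to block or regenerate the answer before showing it to the user.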
From Hype to Habit: How We’re Building an AI-First SaaS Company—While Still Shipping the Roadmap
AI Engineer· 2025-07-23 15:51
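Strategy
- Being AI-first means moving from adding AI features to a product to rethinking, through an AI lens, how value is planned, built, and delivered [4]
- AI-first companies need startup-like curiosity and agility together with enterprise-grade discipline, running both in parallel [12]
- Companies must balance current customer needs against investment in future AI, so that over-focusing on one does not leave them behind on the other [11]
- Planning must embrace uncertainty: learning and discovery shape the path forward, and the destination itself evolves as understanding of what is possible grows [13]
Ways of Working
- Discovery should be treated as a repeatable, intentional process, with time for experiments, hackathons, and learning built into the planning cycle [19][20]
- Treat process as a product and evaluate its effectiveness by outcomes; if a process does not improve clarity of direction, help teams, or speed up decisions, iterate on it or remove it entirely [23]
- Shifting from velocity to intelligent velocity means cultivating the ability to move fast with purpose, working with clarity, momentum, and adaptability [25]
People
- Becoming an AI-first company is primarily a cultural transformation, requiring a rethink of what great talent looks like in the AI era, not only on AI teams but across the whole company [26][27]
- Invest in T-shaped people: those with deep expertise who can also stretch broadly, prototype quickly, collaborate fluently across functions, and deliver end-to-end systems [29]
- AI fluency needs to be built across the organization so that every team feels able to understand AI and confident enough to build with it [33][34]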
Machines of Buying and Selling Grace - Adam Behrens, New Generation
AI Engineer· 2025-07-23 15:51
E-commerce Evolution with AI
- E-commerce has evolved from physical stores to online platforms, and AI is now digitizing participants and their interactions, moving from static websites to merchant and consumer agents [1][2][5]
- The goal remains transaction completion, but the focus shifts to dynamic, real-time, and generative interfaces for both human and agentic consumers [6][7]
Challenges and Solutions in Agentic Commerce
- Enabling software agents to complete transactions is a challenge, with solutions including delegated authentication via partners like Visa [13][14][15]
- Moving from inferred buyer intent (keyword searches, click data) to explicitly captured intent through conversation data is crucial [16]
- Merchants are exploring how to convert fuzzy intent into specific product SKUs, noting higher conversion rates, dollar values, and lifetime values from AI channels [17][18]
- Ensuring product availability across numerous stores requires moving beyond existing product feed infrastructure and web scraping towards a unified API for product data [20][21][22]
- Representing buyer and seller preferences needs to evolve from siloed data to rich context across all aspects of their lives, with market design challenges addressed by third-party institutions [23][24][26]
The Future of Retail and Brand Strategy
- Fortune 500 companies are adapting to technological shifts, with examples like Samsung evolving from a fish merchant to a technology leader [29][30]
- Brands are creating APIs and MCP servers for chat clients, abstracting complex product systems into consistent APIs [31][32]
- Companies are connecting product data with brand and design systems to experiment with generative interfaces and conversational commerce [33][34]
- Enabling payment flows for bot traffic is essential, as AI chat users demonstrate higher intent and conversion rates [35][36]
- The talk argues that stores will evolve back to their original form, a conversation, with brands owning surfaces in various applications [36][40]