AI Engineer

Waymo's EMMA: Teaching Cars to Think - Jyh Jing Hwang, Waymo
AI Engineer· 2025-07-26 17:00
Autonomous Driving History and Challenges
- Autonomous driving research started in the 1980s with simple neural networks and evolved to end-to-end driving models by 2020 [2]
- Scaling autonomous driving presents challenges, requiring solutions for long-tail events and rare scenarios [5][7]
- Foundation models, like Gemini, show promise in generalizing to rare driving events and providing appropriate responses [8][9][10][11]

EMMA: A Multimodal Large Language Model for Autonomous Driving
- The company is exploring EMMA, a driving system leveraging Gemini, which uses routing text and camera input to predict future waypoints [11][12][13][14]
- EMMA is self-supervised, camera-only, and HD-map-free, achieving state-of-the-art quality on the nuScenes benchmark [15][16][17]
- Chain-of-thought reasoning is incorporated into EMMA, allowing the model to explain its driving decisions and improving performance on a 100k dataset [17]

Evaluation and Validation
- Evaluation is crucial for the success of autonomous driving models, including open-loop evaluation, simulations, and real-world testing [25]
- Generative models are being explored for sensor simulation to evaluate the planner under various conditions, such as rain and different times of day [26][27][28]

Future Directions
- The company aims to improve generalization and scale autonomous driving by leveraging foundation models [30]
- Training on larger datasets improves the quality of the planner [19][20]
- The company is exploring training on various tasks, such as 3D detection and road graph estimation, to create a more generalizable model [21][22][23][24]
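The summary describes framing driving as a language-modeling task: routing text (and camera input) goes in, and future waypoints come out as text. Below is a minimal, hypothetical sketch of that text interface only; the function names, prompt wording, and waypoint format are illustrative assumptions, not Waymo's actual EMMA implementation, and the camera-image side of the multimodal input is omitted.

```python
import re

def build_emma_style_prompt(routing_text: str,
                            past_waypoints: list[tuple[float, float]]) -> str:
    """Format driving context as text, mirroring the idea of casting
    planning as a language-modeling task (images attached separately)."""
    history = "; ".join(f"({x:.1f}, {y:.1f})" for x, y in past_waypoints)
    return (
        f"Routing instruction: {routing_text}\n"
        f"Past ego waypoints (x, y in meters): {history}\n"
        "Predict the next future waypoints as (x, y) pairs."
    )

def parse_waypoints(model_output: str) -> list[tuple[float, float]]:
    """Recover numeric waypoints from the model's free-text answer."""
    pairs = re.findall(r"\(\s*(-?\d+\.?\d*)\s*,\s*(-?\d+\.?\d*)\s*\)",
                       model_output)
    return [(float(x), float(y)) for x, y in pairs]

prompt = build_emma_style_prompt("continue straight, then turn left",
                                 [(0.0, 0.0), (2.0, 0.1)])
# Stand-in for a model response; a real system would call Gemini here.
waypoints = parse_waypoints("(4.1, 0.2), (6.3, 0.5)")
```

The appeal of this representation is that waypoints, routing instructions, and chain-of-thought explanations all live in the same token space, so one model can produce all three.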
Ship Production Software in Minutes, Not Months — Eno Reyes, Factory
AI Engineer· 2025-07-25 23:11
[Music] Hi everybody, my name is Eno. I really appreciate that introduction. Maybe I can start with a bit of background. I started working on LLMs about two and a half years ago, when GPT-3.5 was coming out and it became increasingly clear that agentic systems were going to be possible with the help of LLMs. At Factory we believe that the way that we use agents in particular to build software is going to radically change the field of software development. We're transitioning from the era ...
Beyond the Prototype: Using AI to Write High-Quality Code - Josh Albrecht, Imbue
AI Engineer· 2025-07-25 23:10
Imbue's Focus and Sculptor's Purpose
- Imbue is focused on creating more robust and useful AI agents, specifically software agents, with Sculptor as its main product [1]
- Sculptor aims to bridge the gap between AI-generated code and production-ready code, addressing the challenges of using AI coding tools in established codebases [3]
- The goal of Sculptor is to build user trust in AI-generated code by using another AI system to identify potential problems like race conditions or exposed API keys [7][8]

Key Technical Decisions and Features of Sculptor
- Sculptor emphasizes synchronous and immediate feedback on code changes to facilitate early problem detection and resolution [9][10]
- Sculptor encourages users to learn existing solutions, plan before coding, write specs and docs, and adhere to strict style guides to prevent errors in AI-generated code [11][12][13][15][16][18]
- Sculptor helps detect outdated code and documentation, highlights inconsistencies, and suggests style guide improvements to maintain code quality [17][18][19]

Error Detection and Prevention Strategies in Sculptor
- Sculptor integrates automated tools like linters to detect and automatically fix errors in AI-generated code [21][22]
- Sculptor promotes writing tests, especially with AI assistance, to ensure code correctness and prevent unintended behavior changes [25][26][27]
- Sculptor advocates for functional-style coding, happy- and unhappy-path unit tests, and integration tests to improve test effectiveness [28][29][30][33]
- Sculptor utilizes LLMs to check for various issues, including style guide violations, missing specs, and unimplemented features, allowing for custom best practices [38]

Future of AI-Assisted Development
- Imbue is interested in integrating other developer tools for debugging, logging, tracing, profiling, and automated quality assurance into Sculptor [42][44]
- The company anticipates that improved contextual search systems and AI models will further enhance the development experience [43]
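One class of check mentioned above — catching exposed API keys in AI-generated code — can also be done with plain pattern matching before any LLM gets involved. The sketch below is a simplified illustration of that layer, not Sculptor's implementation; the pattern names and regexes are hypothetical, and real secret scanners use far broader rule sets.

```python
import re

# Hypothetical patterns for two common secret formats; production
# scanners maintain hundreds of such rules (and tools like Sculptor
# layer LLM-based checks on top for issues regexes cannot catch).
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic_api_key": re.compile(
        r"(?i)api[_-]?key\s*=\s*['\"][A-Za-z0-9_\-]{16,}['\"]"),
}

def find_exposed_secrets(source: str) -> list[str]:
    """Return the names of secret patterns found in a source string."""
    return [name for name, pat in SECRET_PATTERNS.items()
            if pat.search(source)]

snippet = 'API_KEY = "sk_live_abcdefghijklmnop1234"'
hits = find_exposed_secrets(snippet)
```

Cheap deterministic checks like this complement LLM review: regexes handle the crisp patterns, while the LLM handles semantic problems such as race conditions or style-guide violations.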
Your Coding Agent Just Got Cloned And Your Brain Isn't Ready - Rustin Banks, Google Jules
AI Engineer· 2025-07-25 23:06
Product Introduction & Features
- Jules is introduced as an asynchronous coding agent designed to run in the background and handle parallel tasks, launched at Google I/O [1]
- Jules aims to automate routine coding tasks, such as Firebase SDK updates, or to enable development from a phone [1]
- Jules is powered by Gemini 2.5 Pro [18]

Parallelism & Use Cases
- Two types of parallelism are emerging: multitasking and multiple variations, where agents try different approaches to a task [11]
- Users are leveraging multiple variations to test different libraries or approaches for front-end tasks like adding drag-and-drop functionality [11]
- Jules is used to add tests with Jest and Playwright, comparing test coverage to choose the best option [4][5]
- Jules is used to add a calendar link feature, run accessibility audits, and improve Lighthouse scores [5][6][13]

Workflow & Best Practices
- AI can assist in task creation from backlogs and bug reports, as well as in merging code and handling merge conflicts [3][14]
- Remote agents in the cloud offer infinite scalability and continuous connectivity, enabling development from any device [14]
- A clear definition of success and a robust merge-and-test framework are crucial for effective parallel workflows [14][15]
- Providing ample context, including documentation links, improves the agent's ability to understand and execute tasks [18]
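The "multiple variations" pattern above — fan out several agent attempts, score each (e.g. by test coverage), keep the winner — can be sketched as a small best-of-N harness. This is a hypothetical illustration, not the Jules API: `run_variation` is a stand-in for dispatching one remote agent, and the coverage numbers are simulated.

```python
from concurrent.futures import ThreadPoolExecutor

def run_variation(approach: str) -> dict:
    """Stand-in for one agent attempt; a real remote agent would clone
    the repo, implement the change, and report a metric like coverage."""
    simulated_coverage = {"jest": 0.82, "playwright": 0.91, "hybrid": 0.88}
    return {"approach": approach, "coverage": simulated_coverage[approach]}

def best_of_n(approaches: list[str]) -> dict:
    """Fan out N variations in parallel and keep the highest-scoring one."""
    with ThreadPoolExecutor(max_workers=len(approaches)) as pool:
        results = list(pool.map(run_variation, approaches))
    return max(results, key=lambda r: r["coverage"])

winner = best_of_n(["jest", "playwright", "hybrid"])
```

Note how the pattern depends on the two best practices named in the summary: a clear, machine-checkable definition of success (the score) and a merge-and-test framework to safely land the winning variation.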
Human seeded Evals — Samuel Colvin, Pydantic
AI Engineer· 2025-07-25 07:00
In this talk I'll introduce the concept of Human-seeded Evals, explain the principle and demo them with Pydantic Logfire. ---related links--- https://x.com/samuel_colvin https://www.linkedin.com/in/samuel-colvin/ https://github.com/samuelcolvin https://pydantic.dev/ ...
Building AI Products That Actually Work — Ben Hylak (Raindrop), Sid Bendre (Oleve)
AI Engineer· 2025-07-24 17:15
[Music] Uh, my name is Ben Hylak, and I'm also just feeling really grateful to be with all of you guys today. It's pretty exciting, and we're here to talk about building AI products that actually work. I'll introduce this guy in a second. Sorry, it wasn't the right word. So, I tweeted last night. I was kind of like, what should we talk about today? And the overwhelming response I got was like, please, no more evals. Apparently there's a lot of eval tracks. We'll touch on eval sti ...
Rise of the AI Architect — Clay Bavor, Cofounder, Sierra w/ Alessio Fanelli
AI Engineer· 2025-07-24 16:45
As the amount of consumer facing AI products grows, the most forward leaning enterprises have created a new role: the AI Architect. These leaders are responsible for helping define, manage, and evolve their company's AI agent experiences over time. In this session, Clay Bavor (Cofounder of Sierra) will join Alessio Fanelli (co-host of Latent Space) in a fireside chat to share what it means to be an AI Architect, success stories from the market, and the future of the role. ---related links--- https://x.com/f ...
AI That Pays: Lessons from Revenue Cycle — Nathan Wan, Ensemble Health
AI Engineer· 2025-07-24 16:15
Healthcare Industry Challenges
- 40% of hospitals operate at a negative margin due to broken revenue cycle processes [1]
- Healthcare administration has grown 30-fold in the past three decades, while the number of clinicians has barely doubled, indicating growing complexity [11][12]
- 20% of GDP is attributed to the healthcare system, with a large proportion being the administration of healthcare [9]

Revenue Cycle Management (RCM) & AI Opportunity
- Revenue cycle management (RCM) refers to the financial process of the patient's journey within the healthcare system [3]
- AI has a big opportunity to shift resources from bureaucracy toward clinical care by addressing inefficiencies in RCM [16]
- Ensemble Health Partners estimates that a large share of healthcare costs is related to friction in communication between payers, providers, and patients [14]

Ensemble Health Partners' Approach
- Ensemble Health Partners is an end-to-end RCM organization with 14,000 employees, providing a unique lens into inefficiencies throughout the entire process [2][3]
- Ensemble Health Partners uses its EIQ platform to bring together multiple data formats within a single platform, addressing the challenge of unstructured data scattered across systems [33][34]
- Ensemble Health Partners has seen a 40% reduction in time in the clinical appeal process by using GenAI [30]
- Ensemble Health Partners aims to build a smarter, more coordinated system to reduce waste in the overall revenue cycle process [37]
Structuring a modern AI team — Denys Linkov, Wisedocs
AI Engineer· 2025-07-24 15:45
AI Team Anatomy
- Companies should recognize that technology is not always the limitation to success; how technology is used often matters more [1]
- Companies need to identify their bottlenecks, such as shipping features, acquiring/retaining users, monetization, scalability, and reliability, to prioritize hiring accordingly [3][4]
- Companies should consider whether to trade their existing team's domain knowledge for AI researchers from top labs, weighing the value of domain expertise against specialized AI skills [1]

Generalists vs. Specialists
- Companies should structure AI teams comprehensively, recognizing that success isn't tied to a single role [2]
- Companies should prioritize building a comprehensive AI team with skills in model training, model serving, and business acumen, balancing budget constraints [7]
- Companies should understand the trade-offs between hiring generalists and specialists, with generalists being adaptable and specialists pushing for extra performance [18][19]

Upskilling and Hiring
- Companies should focus on upskilling employees in building, domain expertise, and human interaction [19]
- Companies should hire based on the need to hold context and act on context, ensuring accountability for AI systems [23][24][25]
- Companies should verify trends and think from first principles when hiring, considering new grads, experienced professionals, and retraining opportunities [27]
The Rise of Open Models in the Enterprise — Amir Haghighat, Baseten
AI Engineer· 2025-07-24 15:30
AI Adoption in Enterprises
- Enterprises' adoption of AI is crucial for realizing AI's full potential and impact [2]
- Enterprises initially experiment with OpenAI and Anthropic models, often deploying them on Azure or AWS for security and privacy [7]
- In 2023, enterprises were "toying around" with AI, but by 2024, 40-50% had production use cases built on closed models [9][10]

Challenges with Closed Models
- Vendor lock-in is not a primary concern for enterprises due to the increasing number of interoperable models [12][13]
- Ballooning costs, especially with agentic use cases involving potentially 50 inference calls per user action, are becoming a significant concern [20]
- Enterprises are seeking differentiation at the AI level, not just at the workflow or application level, leading them to consider in-house solutions [21]

Reasons for Open Source Model Adoption
- Frontier models may not be the right tool for specific use cases, such as medical document extraction, where enterprises can leverage their labeled data to build better models [16][17]
- Generic API-based models may not suffice for tasks requiring low latency, such as AI voices or AI phone calls [18]
- Enterprises aim to reduce costs and improve unit economics by running models themselves and controlling pricing [20][21]

Inference Infrastructure Challenges
- Optimizing models for latency requires both model-level and infrastructure-level optimizations, such as speculative decoding techniques like EAGLE-3 [23][24][25][26]
- Guaranteeing high availability (four nines) for mission-critical inference requires robust infrastructure to handle hardware failures and vLLM crashes [27][28]
- Scaling up quickly to handle traffic bursts is challenging, with some enterprises experiencing delays of up to eight minutes to bring up a new replica of a model [29]
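The cost concern above is easy to make concrete: when one user action fans out into ~50 inference calls, per-token prices multiply quickly. The sketch below is a back-of-the-envelope cost model with made-up illustrative numbers; the token counts and per-million-token prices are assumptions, not quotes from any provider.

```python
def cost_per_user_action(calls: int,
                         avg_input_tokens: int,
                         avg_output_tokens: int,
                         price_in_per_mtok: float,
                         price_out_per_mtok: float) -> float:
    """Estimate the inference cost of one user action that fans out
    into many model calls, as agentic workloads tend to do."""
    per_call = (avg_input_tokens * price_in_per_mtok
                + avg_output_tokens * price_out_per_mtok) / 1_000_000
    return calls * per_call

# Illustrative assumptions: 50 calls per action, 2,000 input and 500
# output tokens per call, at $1.00 / $3.00 per million tokens.
cost = cost_per_user_action(50, 2000, 500, 1.00, 3.00)
```

Even at modest per-call prices this yields double-digit cents per user action, which is the unit-economics pressure driving enterprises toward self-hosted open models.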