AI Engineer

Serving Voice AI at $1/hr: Open-source, LoRAs, Latency, Load Balancing - Neil+Jack Dwyer, Gabber
AI Engineer· 2025-07-31 13:45
This talk covers our experience deploying Orpheus (emotive, real-time TTS) to production. Topics:
- Latency and optimizations
- High-fidelity voice clones with examples
- Load balancing with multiple GPUs and multiple LoRAs

About Neil Dwyer: Spent a lot of my career building real-time applications. First at a company called Bebo circa 2018, where I built a live streaming + computer vision pipeline that watched people play Fortnite. More recently at a company called LiveKit where I worked ...
How to defend your sites from AI bots — David Mytton, Arcjet
AI Engineer· 2025-07-30 17:30
Constantly seeing CAPTCHAs? It used to be easy to tell the humans from the droids, but what else can we do when synthetic clients make up nearly half of all web requests? Rotating IPs, spoofed browsers, and agents acting on behalf of real users: are we doomed to forever be solving puzzles? In this talk, we’ll explore user agents, HTTP fingerprints, and IP reputation signals that make humans and agents stand out from scrapers, build a realistic threat model, and dig into the behaviors that reveal the LLM- ...
The Unofficial Guide to Apple’s Private Cloud Compute - Jonathan Mortensen, CONFSEC
AI Engineer· 2025-07-30 17:00
In October 2024, Apple released a new private AI technology onto millions of devices called “Private Cloud Compute”. It brings the same level of privacy and security a local device offers but on an “untrusted” remote server. This talk discusses how Private Cloud Compute represents a paradigm shift in confidential computing and explores the core advancements that made it possible to become mainstream. We’ll explore its novel architecture that allows developers to run sensitive, multi-tenant workloads with cr ...
How we hacked YC Spring 2025 batch’s AI agents — Rene Brandel, Casco
AI Engineer· 2025-07-30 15:45
We hacked 7 of the 16 publicly accessible YC X25 AI agents. This allowed us to leak user data, execute code remotely, and take over databases, all within 30 minutes each. In this session, we'll walk through the common mistakes these companies made and how you can mitigate these security concerns before your agents put your business at risk. Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/news ...
Scaling Enterprise-Grade RAG: Lessons from Legal Frontier - Calvin Qi (Harvey), Chang She (Lance)
AI Engineer· 2025-07-29 16:00
[Music] All right. Uh, thank you everyone. We're excited to be here and thank you for, uh, coming to our talk. Uh, my name is Chang. I'm the CEO and co-founder of LanceDB. I've been making data tools for machine learning and data science for about 20 years. I was one of the co-authors of the pandas library and I'm working on LanceDB today for all of that data that doesn't fit neatly into those pandas data frames. And I'm Calvin. I lead one of the teams at Harvey AI working on RAG, um, tough RAG problems across mass ...
Building Alice’s Brain: an AI Sales Rep that Learns Like a Human - Sherwood & Satwik, 11x
AI Engineer· 2025-07-29 15:30
[Music] Okay, thanks everyone for coming today. Uh, so today's talk is called Building Alice's Brain: how we built an AI sales rep that learns like a human. Uh, my name is Sherwood. I am one of the tech leads here at 11x. I lead engineering for our Alice product and I'm joined by my colleague Satwik. So 11x, for those of you who are unfamiliar, is a company that's building digital workers for the go-to-market organization. We have two digital workers today. We have Alice, who is our AI SDR, and then we also have Jul ...
Layering every technique in RAG, one query at a time - David Karam, Pi Labs (fmr. Google Search)
AI Engineer· 2025-07-29 14:30
Start with the simplest Search - in-memory embeddings with relevance ranking. End with the most complex planet-scale Search - 70+ corpus mix of token, embeddings, and knowledge graphs, all jointly retrieved, custom ranked, joint re-ranked, and then LLM-processed, at 160,000 queries per second in under 200msec. This talk will be a fun “one query at a time” survey of all techniques in RAG in incremental complexity, showing the limits of each technique and what the next layered one opens up in terms of capabil ...
Building a Smarter AI Agent with Neural RAG - Will Bryk, Exa.ai
AI Engineer· 2025-07-29 07:01
Core Problem & Solution
- The presentation introduces Exa, a search engine designed for AI, addressing the limitations of traditional search engines built for human users [5][23]
- Exa aims to provide an API that delivers any information from the web, catering to the specific needs of AI systems [22][41]
- Exa uses transformer-based embeddings to represent documents, capturing meaning and context beyond keywords [11][12]

AI vs Human Search
- Traditional search engines are optimized for humans who use simple queries and want a few relevant links, while AIs require complex queries, vast amounts of knowledge, and precise, controllable information [23][24]
- AI agents need search engines that can handle multi-paragraph queries, search with extensive context, and provide comprehensive knowledge [31][32][33]
- Exa offers features like adjustable result numbers (10, 100, 1000), date ranges, and domain-specific searches, giving AI systems full control [44]

Market Positioning & Technology
- Exa launched in November 2022 and gained traction for its ability to handle complex queries that traditional search engines struggle with [15]
- The company recognized the need for AI-driven search after the emergence of ChatGPT, realizing that LLMs need external knowledge sources [17][18]
- Exa combines neural and keyword search methods to provide comprehensive results, allowing agents to use different search types based on the query [47][48]

Future Development
- Exa is developing a "research endpoint" that uses multiple searches and LLM calls to generate detailed reports and structured outputs [51]
- The company envisions a future where AI agents have full access to the world's information through a versatile search API [48]
- Exa aims to handle a wider range of queries, including semantic and complex ones, turning the web into a controllable database for AI systems [38][39][40]
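
To make the idea of combining neural and keyword search concrete, here is a minimal sketch. It is not Exa's implementation or API: the summary does not say how the two result sets are merged, so the sketch swaps in reciprocal rank fusion as one common way to do it. The toy corpus, the hand-written 3-dimensional vectors (stand-ins for transformer embeddings), and the names `hybrid_search`, `num_results`, and `k` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus. The "vec" values are hand-written placeholders
# standing in for transformer embeddings; they are not real model output.
DOCS = [
    {"id": "a", "text": "neural search engines for ai agents", "vec": [0.9, 0.1, 0.0]},
    {"id": "b", "text": "keyword search with inverted indexes", "vec": [0.2, 0.9, 0.1]},
    {"id": "c", "text": "cooking pasta at home", "vec": [0.0, 0.1, 0.9]},
]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Count how many query tokens also appear in the text (naive term overlap)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    return sum(min(q[w], t[w]) for w in q)


def rank(scores):
    """Return document ids ordered best-first by score."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [doc_id for doc_id, _ in ordered]


def hybrid_search(query, query_vec, num_results=10, k=60):
    """Fuse a neural ranking and a keyword ranking with reciprocal rank fusion.

    Each document earns 1 / (k + position) from every ranking it appears in;
    k damps the influence of any single list (60 is a conventional default).
    """
    neural = rank({d["id"]: cosine(query_vec, d["vec"]) for d in DOCS})
    keyword = rank({d["id"]: keyword_score(query, d["text"]) for d in DOCS})
    fused = Counter()
    for ranking in (neural, keyword):
        for position, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + position)
    return [doc_id for doc_id, _ in fused.most_common(num_results)]


if __name__ == "__main__":
    print(hybrid_search("neural search", [1.0, 0.0, 0.0], num_results=2))
```

The `num_results` parameter mirrors the adjustable result counts mentioned in the summary; date-range and domain filters would be applied to `DOCS` before ranking in the same fashion.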
Make your LLM app a Domain Expert: How to Build an Expert System — Christopher Lovejoy, Anterior
AI Engineer· 2025-07-28 19:55
Core Problem & Solution
- Vertical AI applications face a "last mile problem" in understanding industry-specific context and workflows, which is more critical than model sophistication [4][6]
- Anterior proposes an "adaptive domain intelligence engine" to convert customer-specific domain insights into performance improvements [17]
- The engine consists of measurement (performance evaluation) and improvement (iterative refinement) components [17]

Measurement & Metrics
- Defining key performance metrics that users care about is crucial, such as minimizing false approvals in healthcare or preventing dollar loss from fraud [18][19][20]
- Developing a failure mode ontology helps categorize and analyze different ways the AI can fail, enabling targeted improvements [21][22]
- Combining metric tracking with failure mode analysis allows prioritization of development efforts based on the impact on key metrics [26][27]

Iteration & Improvement
- Failure mode labeling creates ready-made datasets for iterative model improvement, using production data to ensure relevance [29]
- Domain experts can suggest changes to the application pipeline and provide new domain knowledge to enhance performance [32][33]
- This process enables rapid iteration, potentially fixing issues the same day by adding relevant domain knowledge and validating with evals [37]

Domain Expertise
- The level of domain expertise required depends on the specific workflow and optimization goals, with clinical reasoning requiring experienced doctors [38][39]
- Bespoke tooling is recommended for integrating domain expert feedback into the platform and workflows [41]
- Domain expert reviews provide performance metrics, failure modes, and suggested improvements, all in one [38]

Results & Performance
- Anterior achieved a 95% accuracy baseline in approving care requests, which was further improved to 99% through iterative refinement using the described system [14][15]
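
The measurement loop described above (a key metric plus a failure mode ontology, used together to prioritize work) can be sketched roughly as follows. This is an illustrative assumption, not Anterior's tooling: the `Review` record, the failure mode labels, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Review:
    """One domain-expert review of a production case (hypothetical schema)."""
    case_id: str
    correct: bool
    failure_mode: Optional[str] = None  # label from the ontology; None if correct
    false_approval: bool = False        # the key metric users care about here


def accuracy(reviews: List[Review]) -> float:
    """Fraction of reviewed cases the AI got right."""
    return sum(r.correct for r in reviews) / len(reviews)


def prioritize(reviews: List[Review]) -> List[Tuple[str, int]]:
    """Rank failure modes by how many false approvals each one caused."""
    impact = Counter(
        r.failure_mode for r in reviews if r.false_approval and r.failure_mode
    )
    return impact.most_common()


# Made-up sample reviews; the failure mode names are invented for illustration.
REVIEWS = [
    Review("c1", correct=True),
    Review("c2", correct=False, failure_mode="missed_medical_record", false_approval=True),
    Review("c3", correct=False, failure_mode="missed_medical_record", false_approval=True),
    Review("c4", correct=False, failure_mode="guideline_misread"),
    Review("c5", correct=False, failure_mode="guideline_misread", false_approval=True),
]

if __name__ == "__main__":
    print(accuracy(REVIEWS))
    print(prioritize(REVIEWS))
```

Because each labeled failure keeps its case id, the same records double as an eval set: after a fix, rerunning the cases tagged with the top failure mode checks whether the change actually moved the key metric.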
Shipping something to someone always wins — Kenneth Auchenberg (ex. Stripe, VSCode)
AI Engineer· 2025-07-28 19:54
Core Product Development Principle
- Shipping something to someone always wins, emphasizing rapid iteration and feedback loops over big launches [1][34]
- The key is enabling rapid iterative loops to get feedback from real users and maximize shots at the goal [1]
- In the age of AI, this translates to building a "skateboard" first, then evolving it to a "car," ensuring a continuously viable product [2][4]
- A continuously viable solution is significantly more valuable because it provides feedback along the way, avoiding building in a vacuum [5][6]

Feedback Loop Implementation
- Establish a feedback loop with real users who can see something, provide feedback, and allow for iterative improvements, ideally within a day [7]
- Being able to ship every day is crucial for a fast feedback loop, requiring specific focus on the target customers [9]
- Work with real people (not just personas) to understand their problems and build empathy [10][11]
- Write the PR FAQ (press release and FAQ) or launch blog post early to sanity check and communicate the product effectively [12]

Navigating Constraints and AI Integration
- Design the best product first, before considering constraints like legal, compliance, and financial aspects [15]
- AI accelerates all aspects of product building, but the fundamental process of talking to users and getting feedback remains the same [26]
- Product management becomes more critical as the cost of writing code approaches zero, emphasizing customer knowledge and rapid feedback [28][29]