Excalidraw: AI and Human Whiteboarding Partnership - Christopher Chedeau
AI Engineer· 2025-07-21 19:12
[Music] Thank you so much for the intro. I'm so excited to be here talking about figuring out how AI and humans work together in the world of whiteboarding. I built Excalidraw, and if you don't know about it, you'll see many things about it. One expectation you probably have about a speaker at the AI Engineer conference is that I talk about AI in every single sentence for the entire talk. So I'm just going to give you a warning: I'm only going to do it for the second half ...
Agentic GraphRAG: AI’s Logical Edge — Stephen Chin, Neo4j
AI Engineer· 2025-07-21 17:15
Core Idea - AI models are increasingly used for complex, industry-specific tasks, where different retrieval approaches offer varying advantages in accuracy, explainability, and cost [1] - GraphRAG retrieval models are a powerful tool for solving domain-specific problems requiring logical reasoning and correlation aided by graph relationships and proximity algorithms [1] - An agent architecture combining RAG and GraphRAG retrieval patterns can bridge the gap in data analysis, strategic planning, and retrieval to solve complex domain-specific problems [1] Technology & Architecture - The architecture combines RAG (Retrieval-Augmented Generation) and GraphRAG retrieval patterns [1]
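The agent architecture summarized above could be sketched as a router that dispatches a query either to plain vector RAG or to graph-based retrieval. This is a minimal illustration, not Neo4j's implementation: the retriever bodies and the routing heuristic are hypothetical stand-ins.

```python
# Minimal sketch of an agent routing between vector RAG and GraphRAG.
# Both retriever internals are hypothetical stand-ins, not a real API.

def vector_rag(query: str) -> str:
    # Stand-in for embedding search over a document store.
    return f"[vector passages for: {query}]"

def graph_rag(query: str) -> str:
    # Stand-in for a Cypher/graph traversal over related entities.
    return f"[graph neighborhood for: {query}]"

def route(query: str) -> str:
    # Toy heuristic: questions about relationships or multi-hop
    # correlations go to GraphRAG; everything else to vector RAG.
    relational_cues = ("related", "connected", "depends", "between")
    if any(cue in query.lower() for cue in relational_cues):
        return graph_rag(query)
    return vector_rag(query)

print(route("Which suppliers are connected to this outage?"))
print(route("What does the warranty say about refunds?"))
```

In a real deployment the router would itself be an LLM tool-selection step, and `graph_rag` would issue Cypher queries against the knowledge graph.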
Good design hasn’t changed with AI — John Pham, SF Compute
AI Engineer· 2025-07-21 16:30
Bad designs are still bad; AI doesn't make them good. The novelty of AI makes bad things tolerable, but only for a short time. Building great designs and experiences with AI rests on the same first principles as before AI. When people use software, they want it to feel responsive, safe, accessible, and delightful. We'll go over the big and small details that go into software people want to use, not software they're forced to use. About John Pham I'm John Pham, an engineer and a self-taught designer. I seek the dopamine hits of building ...
Building Effective Voice Agents — Toki Sherbakov + Anoop Kotha, OpenAI
AI Engineer· 2025-07-20 16:30
Overview - The document discusses building production voice applications [1] - It shares learnings from working with customers in the voice application domain [1] Authorship - The content is associated with tokisherbakov (Twitter handle) and akotha7 (LinkedIn profile) [1]
What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
AI Engineering & GPU Utilization - AI engineering is shifting towards tighter integration and self-hosting of language models, increasing the need to understand GPU hardware [6][7] - The industry should focus on high bandwidth, not low latency, when utilizing GPUs [8] - GPUs optimize for math bandwidth over memory bandwidth, emphasizing computational operations [9] - Low-precision matrix-matrix multiplications are key to fully utilizing GPU potential [10] - Tensor cores, specialized for low-precision matrix-matrix multiplication, are crucial for efficient GPU usage [6][37] Hardware & Performance - GPUs achieve parallelism far exceeding CPUs: the Nvidia H100 SXM GPU runs over 16,000 parallel threads at about 5 cents per thread, compared to an AMD EPYC CPU's two threads per core at approximately 1 watt per thread [20][21] - GPUs offer faster context switching than CPUs, happening every clock cycle [23] - Bandwidth improvement grows as the square of latency improvement, favoring bandwidth-oriented hardware [25][26] Model Optimization - Small models can be more hardware-sympathetic, potentially matching the quality of larger models with techniques like verification and multiple generations [32][33] - Multi-token prediction and multi-sample queries can become nearly "free" thanks to tensor core capabilities [36] - Generating multiple samples or tokens can improve performance by leveraging matrix-matrix operations [39]
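The "multiple samples are nearly free" point can be made concrete with a toy example: k sampled hidden states share one weight matrix, so k matrix-vector products collapse into a single matrix-matrix product of the kind tensor cores accelerate. Pure Python, tiny shapes, purely illustrative.

```python
# k samples share one weight matrix, so k matvecs become one matmul.
# Shapes are tiny and illustrative; no GPU or tensor cores involved.

def matvec(W, x):
    # (v x d) weights times a (d,) hidden state -> (v,) logits.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmat(W, X):
    # (v x d) weights times a (d x k) batch -> (v x k) logits.
    k = len(X[0])
    return [[sum(row[j] * X[j][i] for j in range(len(row)))
             for i in range(k)] for row in W]

W = [[1, 2], [3, 4], [5, 6]]          # v=3 "vocab", d=2 hidden dim
xs = [[1, 0], [0, 1]]                 # two sampled hidden states
X = [list(col) for col in zip(*xs)]   # stack samples as columns (d x k)

separate = [matvec(W, x) for x in xs]     # k separate matvecs
batched = matmat(W, X)                    # one matmul, same logits
print(separate)
print(batched)
```

Column i of the batched result equals the i-th separate matvec, so the extra samples reuse the weight load that dominates the cost.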
Brian Balfour: How Granola Beat Giants Like Zoom & Otter in the AI Note-Taking War
AI Engineer· 2025-07-20 07:00
Product Development & Strategy - Granola initially focused on unmet customer needs around note-taking, specifically empowering users to take better notes [1] - Granola is evolving its product by creating project and team workspaces to enable further collaboration [2] - The company emphasizes continuous product iteration, building on initial features to create new functionality [2] User Adoption - Many individuals have used or currently use Granola [1]
Robots as professional Chefs - Nikhil Abraham, CloudChef
AI Engineer· 2025-07-20 07:00
Company Overview - CloudChef is reimagining cooking with embodied AI [1] - CloudChef is building robots that let commercial kitchens cook high-quality meals while addressing the shortage of skilled chefs [1] - CloudChef's robots already work full-time in several leading commercial kitchens [1] Technology and Innovation - CloudChef retrofitted a bimanual robot into a professional chef that can work in new kitchens and learn new recipes from a single demonstration [1] Leadership and Background - CloudChef's CEO is Nikhil Abraham, an IIT Bombay alumnus and co-founder of Rephrase AI (acquired by Adobe) [1]
A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai
AI Engineer· 2025-07-19 21:15
Model Reasoning and Applications - Reasoning unlocks new language model applications, exemplified by improved information retrieval [1] - Reasoning models are enhancing applications like website analysis and code assistance, making them more steerable and user-friendly [1] - Reasoning models are pushing the limits of task completion, requiring ongoing effort to determine what models need to continue progress [1] Planning and Training - Planning is a new frontier for language models, requiring a shift in training approaches beyond just reasoning skills [1][2] - The industry needs to develop research plans to train reasoning models that can work autonomously and have meaningful planning capabilities [1] - Calibration is crucial for products, as models tend to overthink, requiring better management of output tokens relative to problem difficulty [1] - Strategy and abstraction are key subsets of planning, enabling models to choose how to break down problems and utilize tools effectively [1] Reinforcement Learning and Compute - Reinforcement learning with verifiable rewards is a core technique, where language models generate completions and receive feedback to update weights [2] - Parallel compute enhances model robustness and exploration, but doesn't solve every problem, indicating a need for balanced approaches [3] - The industry is moving towards considering post-training as a significant portion of compute, potentially reaching parity with pre-training in GPU hours [3]
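The core technique named above, reinforcement learning with verifiable rewards, can be caricatured in a few lines: sample completions, score them with a programmatic verifier, and nudge the policy toward higher-reward behavior. A real system updates transformer weights with a policy-gradient step; the scalar "policy" below is purely an illustrative stand-in.

```python
import random

# Toy loop in the spirit of RL with verifiable rewards: sample
# completions, score each with a checkable reward, update the policy.
random.seed(0)

def verifier(answer: int, target: int) -> float:
    # Verifiable reward: 1.0 iff the completion matches ground truth.
    return 1.0 if answer == target else 0.0

p_correct = 0.2   # stand-in "policy": chance of a correct completion
target = 42

for step in range(200):
    completions = [target if random.random() < p_correct else 0
                   for _ in range(8)]                      # 8 rollouts
    mean_reward = sum(verifier(c, target) for c in completions) / 8
    p_correct = min(1.0, p_correct + 0.01 * mean_reward)   # crude update

print(round(p_correct, 2))   # policy improved over the starting 0.2
```

The key property is that the reward is computed by a program, not a learned judge, which is what makes it "verifiable" and hard to game.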
How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe
AI Engineer· 2025-07-19 21:12
Core Idea - The presentation is a case study on building an open-source natural-language assistant (ART E) that answers questions from email inboxes using reinforcement learning [1][2][3] - The speaker shares lessons learned, what worked and what didn't, and how they built an agent that performed well with reinforcement learning [2] Development Process & Strategy - The speaker recommends starting with prompted models to get the best possible performance before any training, including reinforcement learning, to work out bugs in the environment and potentially avoid training altogether [7][8][9] - The company surpassed prompted-model baselines with reinforcement learning, achieving a 60% reduction in errors compared to the best prompted model (o3, which had 90% accuracy, while the RL model achieved 96% accuracy) [10][15] - Training the ART E model cost approximately $80 in GPU time plus one week of engineering time for an experienced engineer [23][24] Key Metrics & Optimization - The company benchmarked cost, accuracy, and latency, finding that the trained model (Qwen 2.5 14B) achieved a significant cost reduction compared to o3 ($55 per 1,000 searches) and o4-mini ($8 per 1,000 searches) [16][17] - The company improved latency by moving to a smaller model, training the model to use fewer turns, and considering speculative decoding [19][20][21] - The company optimized the reward function to include extra credit for fewer turns and to discourage hallucination, resulting in a significantly lower hallucination rate than prompted models [45][46][49][50] Challenges & Solutions - The two hard problems in using RL are building a realistic environment and getting the reward function right [26][27][28] - The company created a realistic environment using the Enron email dataset, which contains 500,000 emails [33][34][35] - The company designed the reward function by having Gemini 2.5 Pro generate questions and answers from batches of emails, creating a verified dataset for the agent to learn from [37][38][39] - The company emphasizes watching out for reward hacking, where the model exploits the reward function without actually solving the problem, and suggests modifying the reward function to penalize such behavior [51][53][61]
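A shaped reward of the kind described, extra credit for fewer turns and a penalty for hallucination, might look like the sketch below. The structure follows the summary, but the specific weights and the `reward` signature are invented for illustration.

```python
# Hedged sketch of a shaped agent reward: base credit for a correct
# answer, a small bonus for using fewer turns, and a penalty for a
# confident wrong answer (hallucination). All weights are invented.

def reward(correct: bool, answered: bool, turns: int,
           max_turns: int = 10) -> float:
    if answered and not correct:
        return -1.0     # hallucination: worse than admitting ignorance
    if not answered:
        return 0.0      # abstaining ("I don't know") is neutral
    turn_bonus = 0.1 * (max_turns - turns) / max_turns
    return 1.0 + turn_bonus   # correct answer; faster is slightly better

print(reward(correct=True, answered=True, turns=2))    # quick and correct
print(reward(correct=True, answered=True, turns=9))    # slow but correct
print(reward(correct=False, answered=True, turns=3))   # hallucination
```

Making a wrong answer strictly worse than no answer is what pushes the hallucination rate below the prompted baselines; the turn bonus must stay small relative to correctness credit, or the model will learn to rush.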
OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
AI Engineer· 2025-07-19 21:10
Open Thoughts Project Overview - Bespoke Labs released Open Thoughts 3, aiming to create the best open-source reasoning datasets [1][9] - The Open Thoughts project focuses on reasoning data recipes, addressing a key missing link in building strong reasoning models [6][9] - Open Thoughts 3 outperforms the DeepSeek R1 Qwen 7B model across science, code, and math [13] Dataset Creation & Optimization - The dataset pipeline includes question sourcing, mixing, filtering, answer generation, and answer filtering [17] - Experiments created over 5,000 datasets and nearly 3,000 models to rigorously evaluate the choices at each pipeline step [18] - Sampling multiple reasoning traces per question works remarkably well: at a fixed number of questions, performance does not degrade, allowing the data to scale 16x [19][20] - Synthetic questions are scalable and can further improve accuracy [22] - Question filtering uses a language model to assess question difficulty and answer length in order to select high-quality questions [23] Key Learnings & Findings - A few high-quality data sources beat a large number of diverse sources [25] - For SFT and knowledge distillation, filtering on or verifying answers does not appear to help [26] - A model that scores higher on evaluation benchmarks is not necessarily a better teacher; for example, Qwen 32B is a better teacher than DeepSeek R1 [21] - Through distillation, models can surpass their teacher in some domains, such as legal reasoning [35][36][37] Practical Recommendations - Adapt the data recipe to your specific domain, starting from the Open Thoughts recipe and iterating [29] - Study each pipeline step separately for different domains such as code, science, and math [29][30] - If domain-specific data is scarce, convert existing data into questions and use in-context examples to generate more [32] - Evaluation is critical; use open-source libraries like Evalchemy to confirm that model changes are real improvements [33][34]
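Two of the pipeline steps above, LLM-judged question filtering and sampling multiple reasoning traces per question, can be sketched as below. The teacher and judge calls are hypothetical stand-ins, and the threshold and 16x multiplier are taken from the summary, not from the actual Open Thoughts code.

```python
# Illustrative sketch of two pipeline steps: filter questions by an
# LLM-judged difficulty score, then sample 16 teacher traces per
# surviving question. Teacher/judge calls are hypothetical stand-ins.

def judged_difficulty(question: str) -> float:
    # Stand-in for an LLM judge scoring difficulty in [0, 1].
    return 0.9 if "prove" in question else 0.3

def teacher_trace(question: str, sample_id: int) -> str:
    # Stand-in for a strong teacher model emitting one reasoning trace.
    return f"trace {sample_id} for: {question}"

questions = ["prove the sum of two even numbers is even",
             "what is 2 + 2"]

# Step 1: question filtering -- keep only sufficiently hard questions.
hard = [q for q in questions if judged_difficulty(q) >= 0.5]

# Step 2: sample 16 reasoning traces per surviving question (the 16x
# data scaling noted above, at a fixed number of questions).
dataset = [(q, teacher_trace(q, i)) for q in hard for i in range(16)]
print(len(dataset))   # 1 hard question x 16 traces
```

In the real pipeline the judge and teacher are separate model calls, and the resulting (question, trace) pairs feed directly into SFT.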