AI Engineer

What every AI engineer needs to know about GPUs — Charles Frye, Modal
AI Engineer· 2025-07-20 07:00
AI Engineering & GPU Utilization
- AI engineering is shifting towards tighter integration and self-hosting of language models, increasing the need to understand GPU hardware [6][7]
- The industry should focus on high bandwidth, not low latency, when utilizing GPUs [8]
- GPUs optimize for math bandwidth over memory bandwidth, emphasizing computational operations [9]
- Low-precision matrix-matrix multiplications are key to fully utilizing GPU potential [10]
- Tensor cores, specialized for low-precision matrix-matrix multiplication, are crucial for efficient GPU usage [6][37]
Hardware & Performance
- GPUs achieve parallelism far exceeding CPUs: the Nvidia H100 SXM GPU can run over 16,000 parallel threads at roughly 5 centiwatts (0.05 W) per thread, compared to an AMD EPYC CPU's two threads per core at approximately 1 watt per thread [20][21]
- GPUs context-switch faster than CPUs, and can do so every clock cycle [23]
- Bandwidth improves roughly as the square of latency improvement, which favors bandwidth-oriented hardware [25][26]
Model Optimization
- Small models can be more hardware-sympathetic, potentially matching the quality of larger models with techniques like verification and multiple generations [32][33]
- Multi-token prediction and multi-sample queries can become nearly "free" due to tensor core capabilities [36]
- Generating multiple samples or tokens can improve performance by turning the work into matrix-matrix operations [39]
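The claim that extra samples become nearly "free" can be illustrated with a rough arithmetic-intensity estimate. The sketch below is not from the talk: it assumes a single square weight matrix of hypothetical size d_model x d_model stored in 2-byte precision, counts one multiply and one add per weight per sample, and ignores caching and kernel overheads. The function name and sizes are illustrative only.

```python
def arithmetic_intensity(d_model: int, batch: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte moved for one (batch x d_model) @ (d_model x d_model) product.

    Assumptions (hypothetical, for illustration): weights are read once,
    activations are read once and written once, nothing is cached.
    """
    # One multiply and one add per weight, per sample in the batch.
    flops = 2 * batch * d_model * d_model
    # The weight matrix is loaded once regardless of batch size.
    weight_bytes = d_model * d_model * bytes_per_value
    # Input and output activations each hold batch * d_model values.
    activation_bytes = 2 * batch * d_model * bytes_per_value
    return flops / (weight_bytes + activation_bytes)


if __name__ == "__main__":
    # batch = 1 is a matrix-vector product; larger batches are matrix-matrix.
    for batch in (1, 8, 64):
        print(batch, round(arithmetic_intensity(4096, batch), 2))
```

Under these assumptions the bytes moved are dominated by the weights, so the math done per byte grows almost linearly with the number of samples or tokens processed together. That is the sense in which a memory-bound matrix-vector step leaves compute idle that a batch of samples can use.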
Brian Balfour: How Granola Beat Giants Like Zoom & Otter in the AI Note-Taking War
AI Engineer· 2025-07-20 07:00
So, let's take all of this theory and let's put it into practice. Let's talk about a product: Granola. Just by a raise of hands, how many people have either tried or used Granola today? What they realized is that there's actually a whole other set of customer needs that have been unmet, which is: I don't want you to take all of my notes. I just want you to help me take better notes, empower me around this specific task and user. And that's what they built the product around. Now, because the realization is t ...
Robots as professional Chefs - Nikhil Abraham, CloudChef
AI Engineer· 2025-07-20 07:00
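Company Overview
- CloudChef is reimagining how cooking is done using embodied AI [1]
- CloudChef is building robots that let commercial kitchens cook high-quality meals while addressing the need for skilled chefs [1]
- CloudChef's robots already work full-time in several leading commercial kitchens [1]
Technology and Innovation
- CloudChef turned a bimanual robot into a professional chef that can work in a new kitchen and learn new recipes from a single demonstration [1]
Leadership and Background
- CloudChef's CEO is Nikhil Abraham, an IIT Bombay alumnus and co-founder of Rephrase AI (acquired by Adobe) [1]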
A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai
AI Engineer· 2025-07-19 21:15
Model Reasoning and Applications
- Reasoning unlocks new language model applications, exemplified by improved information retrieval [1]
- Reasoning models are enhancing applications like website analysis and code assistance, making them more steerable and user-friendly [1]
- Reasoning models are pushing the limits of task completion, requiring ongoing effort to determine what models need to continue progress [1]
Planning and Training
- Planning is a new frontier for language models, requiring a shift in training approaches beyond just reasoning skills [1][2]
- The industry needs to develop research plans to train reasoning models that can work autonomously and have meaningful planning capabilities [1]
- Calibration is crucial for products, as models tend to overthink, requiring better management of output tokens relative to problem difficulty [1]
- Strategy and abstraction are key subsets of planning, enabling models to choose how to break down problems and utilize tools effectively [1]
Reinforcement Learning and Compute
- Reinforcement learning with verifiable rewards is a core technique, where language models generate completions and receive feedback to update weights [2]
- Parallel compute enhances model robustness and exploration, but doesn't solve every problem, indicating a need for balanced approaches [3]
- The industry is moving towards considering post-training as a significant portion of compute, potentially reaching parity with pre-training in GPU hours [3]
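The generate, verify, update loop behind reinforcement learning with verifiable rewards can be shown on a toy problem. The sketch below is an illustration under stated assumptions, not the method from the talk: the "model" is just a softmax over a fixed list of candidate completions, the verifier is an exact string match, and the update is a plain REINFORCE step with a mean-reward baseline. All names (verify, train, the candidate strings) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify(completion: str, reference: str) -> float:
    """Verifiable reward: 1.0 if the completion matches the reference exactly."""
    return 1.0 if completion.strip() == reference.strip() else 0.0


def train(candidates, reference, steps=200, samples_per_step=8, lr=0.5, seed=0):
    """Toy policy-gradient loop: sample completions, score them, update logits."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        # "Generate completions": sample candidate indices from the policy.
        picks = rng.choices(range(len(candidates)), weights=probs, k=samples_per_step)
        # "Receive feedback": score each sample with the verifier.
        rewards = [verify(candidates[i], reference) for i in picks]
        baseline = sum(rewards) / len(rewards)
        # "Update weights": REINFORCE, using d log p(i) / d logit_j = 1[i == j] - p_j.
        grads = [0.0] * len(logits)
        for i, r in zip(picks, rewards):
            advantage = r - baseline
            for j in range(len(logits)):
                grads[j] += advantage * ((1.0 if j == i else 0.0) - probs[j])
        logits = [l + lr * g / samples_per_step for l, g in zip(logits, grads)]
    return softmax(logits)


if __name__ == "__main__":
    print(train(["41", "42", "43"], reference="42"))
```

A real system replaces the softmax table with a language model and the string match with a task-specific checker such as unit tests or a math answer verifier, but the three steps have the same shape.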
How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe
AI Engineer· 2025-07-19 21:12
Core Idea
- The presentation is a case study on building ART-E, an open-source natural-language assistant that answers questions from email inboxes, using reinforcement learning [1][2][3]
- The speaker shares lessons learned, what worked and what didn't, and how the team built an agent that worked well with reinforcement learning [2]
Development Process & Strategy
- The speaker recommends starting with prompted models and getting the best possible performance from them before doing any training, including reinforcement learning; this works out bugs in the environment and may avoid training altogether [7][8][9]
- The company surpassed prompted-model baselines with reinforcement learning, achieving a 60% reduction in errors compared to the best prompted model: o3 reached 90% accuracy and the RL model 96%, so the error rate fell from 10% to 4% [10][15]
- Training the ART-E model cost approximately $80 in GPU time and one week of an experienced engineer's time [23][24]
Key Metrics & Optimization
- The company benchmarked cost, accuracy, and latency, finding that the trained model (Qwen 2.5 14B) achieved a significant cost reduction compared to o3 ($55 per 1,000 searches) and o4-mini ($8 per 1,000 searches) [16][17]
- The company improved latency by moving to a smaller model, training the model to take fewer turns, and considering speculative decoding [19][20][21]
- The company tuned the reward function to give extra credit for fewer turns and to discourage hallucination, resulting in a significantly lower hallucination rate than prompted models [45][46][49][50]
Challenges & Solutions
- The two hard problems in using RL are building a realistic environment and getting the reward function right [26][27][28]
- The company created a realistic environment using the Enron email dataset, which contains 500,000 emails [33][34][35]
- The company designed the reward function by having Gemini 2.5 Pro generate questions and answers from batches of emails, creating a verified dataset for the agent to learn from [37][38][39]
- The company emphasizes watching out for reward hacking, where the model exploits the reward function without actually solving the problem, and suggests modifying the reward function to penalize such behavior [51][53][61]
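The reward shaping described above (credit for a correct answer, extra credit for fewer turns, a penalty for hallucinated answers) might look roughly like the following. This is a minimal sketch under assumptions: the summary does not give the actual weights or episode fields, so the Episode class, the constants, and the choice to score a declined answer as 0 are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical weights; the summary does not state the values used in ART-E.
CORRECT_REWARD = 1.0
TURN_BONUS = 0.1             # maximum extra credit for finishing in few turns
HALLUCINATION_PENALTY = -1.0


@dataclass
class Episode:
    answered: bool        # False if the agent declined to answer
    answer_correct: bool  # True if the answer matched the verified reference
    turns: int            # number of agent turns used
    max_turns: int = 10   # assumed turn budget for the environment


def reward(ep: Episode) -> float:
    """Score one episode; wrong answers score below declining to answer."""
    if not ep.answered:
        return 0.0
    if not ep.answer_correct:
        # A confident wrong answer is treated as a hallucination.
        return HALLUCINATION_PENALTY
    turns_used = min(max(ep.turns, 1), ep.max_turns)
    bonus = TURN_BONUS * (ep.max_turns - turns_used) / ep.max_turns
    return CORRECT_REWARD + bonus


if __name__ == "__main__":
    print(reward(Episode(answered=True, answer_correct=True, turns=2)))
    print(reward(Episode(answered=True, answer_correct=False, turns=2)))
```

Keeping the turn bonus small relative to the correctness reward is a deliberate choice in this sketch: if the bonus dominated, the agent could be rewarded for answering quickly rather than correctly, which is one form of the reward hacking the talk warns about.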
OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs
AI Engineer· 2025-07-19 21:10
I'm Ryan. I'm a founding engineer at Bespoke Labs. And today I'm going to talk to you about Open Thoughts, which is our project to create the best open-source reasoning data sets. And I'll be switching tack a little bit from our earlier discussions on reasoning and RL and focus on the reasoning part, and you'll see why. So just so we're on the same page, we've talked a lot about reasoning, but what's actually going on here? So I like this graph from JSON which shows this incredible performance that's ...
Google Photos Magic Editor: GenAI Under the Hood of a Billion-User App - Kelvin Ma, Google Photos
AI Engineer· 2025-07-19 19:00
Technology & Engineering
- Google Photos' Magic Editor integrates complex CV and generative AI models into a seamless mobile experience [1]
- The focus is on optimizing massive models for latency and size [1]
- Crucial interplay exists with graphics rendering (OpenGL/Halide) [1]
- The process involves turning research concepts into polished features for practical use [1]
Product Development
- The aim is to build tools that improve users' lives through greater expression, skill-building, and communication [1]
Personnel
- Kelvin Ma, a product engineer with 15 years of experience, is involved in developing innovative consumer applications used by millions [1]
General Intelligence is Multimodal — Keegan McCallum, Luma AI
AI Engineer· 2025-07-19 17:45
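Company Overview
- Luma AI's mission is to develop advanced multimodal models [1]
- Luma AI has a team of researchers and engineers pursuing an unconventional, multimodal path to AGI [1]
Leadership & Expertise
- Keegan McCallum is Head of ML Infrastructure at Luma AI, with experience across several startups and engineering leadership roles [1]
- Keegan McCallum's background includes research on portfolio optimization [1]
Event & Community Engagement
- Keegan McCallum shared insights at the AI Engineer World's Fair in San Francisco [1]
- Luma AI stays in touch with the community through its newsletter [1]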
ComfyUI Full Workshop — first workshop from ComfyAnonymous himself!
AI Engineer· 2025-07-19 16:30
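Overview
- A quick introduction to ComfyUI and what's new, including a Q&A session [1]
- Recorded at the AI Engineer World's Fair in San Francisco [1]
Community Engagement
- Join the newsletter to stay up to date on upcoming events and content [1]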
Design like Karpathy is watching - Zeke Sikelianos, Replicate
AI Engineer· 2025-07-19 16:15
Legendary AI engineer and educator Andrej Karpathy recently blogged about his experiences building, deploying, and monetizing a vibe-coded web app called MenuGen. Let's dig into the challenges he faced and learn what we as AI designers can do to make life better for the Andrejs of the world. About Zeke Sikelianos Zeke's been building developer tools at companies like Heroku, npm, GitHub, and Replicate for over ten years. He cares deeply about simple and tasteful developer experiences, and thinks the world o ...