How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer·2025-06-27 10:01

Thanks, everybody, for coming. I wanted to talk about some work I've done recently on trying to figure out just how fast these inference engines are when you run open models on them. I've been talking at AI Engineer since it was AI Engineer Summit two years ago, and for a long time it was basically the OpenAI wrapper conference, right? Because, yeah, what am I going to do, run an agent with BERT? Probably not. And that was ...