LLM Inference
How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer · 2025-06-27 10:01
[Music] Thanks everybody for coming. Um, yeah, I wanted to talk about some work I've done recently on trying to figure out just how fast these inference engines are when you run open models on them. So I've kind of been talking at AI Engineer since it was the AI Engineer Summit two years ago. Um, and for a long time it's basically been, like, the OpenAI wrapper conference, right? Because, yeah, what am I going to do? Am I going to run an agent with BERT? Probably not. Um, and that was ...
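The talk's question, how fast an engine actually runs open models, usually comes down to two numbers: time to first token and decode throughput. Below is a minimal measurement sketch, assuming an OpenAI-compatible endpoint such as the ones vLLM or SGLang expose; the base URL, model id, and prompt are placeholders, not details from the talk.

```python
# Minimal latency/throughput probe against an OpenAI-compatible endpoint.
# Assumptions: a local server at port 8000 and a model id "my-open-model".
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def benchmark(prompt: str, max_tokens: int = 256) -> None:
    start = time.perf_counter()
    first_token_at = None
    n_chunks = 0

    # Stream so time-to-first-token can be separated from decode throughput.
    stream = client.chat.completions.create(
        model="my-open-model",  # placeholder model id
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()
            n_chunks += 1

    end = time.perf_counter()
    if first_token_at is None:
        print("no tokens returned")
        return
    print(f"TTFT: {(first_token_at - start) * 1e3:.1f} ms")
    # Stream chunks roughly correspond to tokens for most servers; running
    # the output through the model's tokenizer would give an exact count.
    print(f"~{n_chunks / (end - first_token_at):.1f} tokens/s during decode")

benchmark("Explain KV caching in two sentences.")
```

Runs like this only measure a single request; saturating the server with many concurrent requests is what separates per-user latency from aggregate throughput.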
MCP from the Perspectives of LLM Inference and LLM Serving
AI前线 · 2025-05-16 07:48
LLM Inference
Since ChatGPT appeared, LLM-related technology has had a disruptive impact on the field of artificial intelligence, and the development of technical architectures built around LLMs, such as RAG, AI agents, and the currently popular Model Context Protocol (MCP)[1], has been in full swing. Before going further, and given the realities of the industry, the author believes it is necessary to clearly explain the concepts of LLM Inference and LLM Serving. In fact, because the industry is developing so quickly, many concepts remain muddled in practice, and I suspect quite a few people are rather unclear about the distinction between LLM Inference and LLM Serving. In the author's view, one of the main causes is that in LLM engineering practice the functional scopes of the two have become intertwined: to meet business needs, many LLM-related frameworks have had no choice but to implement the feature sets of LLM Inference and LLM Serving together, leaving the boundary between them blurred. Therefore, besides discussing the development of MCP from the perspectives of LLM Inference and LLM Serving, clarifying the conceptual scope of these two terms is also a main goal of this article ...
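One way to make the article's distinction concrete is to separate the raw model-execution step from the layer that exposes it to clients. The sketch below is not from the article; the model name and endpoint path are placeholders, and real serving layers (vLLM, TGI, and similar) add batching, scheduling, and KV-cache management on top of the same boundary.

```python
# Contrast sketch: "inference" = running the generate loop in-process,
# "serving" = the API, schema, and lifecycle wrapped around that step.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder open model

# --- LLM Inference: load weights and run generation directly ---
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def run_inference(prompt: str, max_new_tokens: int = 64) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# --- LLM Serving: expose that step behind a request schema and endpoint ---
app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    return {"text": run_inference(req.prompt, req.max_new_tokens)}
```

Started with `uvicorn app:app`, the service illustrates why the two concepts blur: the endpoint cannot exist without the inference call inside it, yet its concerns (schemas, concurrency, deployment) are entirely different.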