benchmarking

How fast are LLM inference engines anyway? — Charles Frye, Modal
AI Engineer· 2025-06-27 10:01
Open Model Landscape & Benchmarking
- Open-weight models are catching up to frontier labs in capabilities, making possible many AI engineering applications that were out of reach before [1]
- Open-source engines such as vLLM, SGLang, and TensorRT-LLM are readily available, reducing the need for custom model implementations [1]
- Modal has created a public benchmark (modal.com/llmalmanac) for comparing the performance of different models and engines across various context lengths [2][3]

Performance Analysis
- Throughput is significantly higher when processing longer input contexts (prefill) than when generating longer output sequences (decode), with up to a 4x improvement observed [15][16]
- Time to first token (latency) remains nearly constant even with a 10x increase in input tokens, suggesting a "free lunch" from prioritizing context over reasoning [19]
- Gemma 7B models show roughly the same throughput as Qwen 3 models, despite being 10x smaller in model weights, indicating optimization differences [12]

Optimization & Infrastructure
- Scaling out (adding more GPUs) is the primary method for increasing total throughput, rather than scaling up (optimizing a single GPU) [23]
- The benchmarking methodology sends a thousand requests at once to determine maximum throughput, and single requests to determine the fastest possible server run time [24][25]
- Tensor cores run BF16 more slowly than FP8 or FP4, suggesting potential for even greater performance gains with lower-precision formats on newer hardware like Blackwell [16][17]
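The two measurements described in the methodology (many concurrent requests for maximum throughput, a single request for best-case run time) can be sketched roughly as follows. This is a minimal illustration, not the code behind the Modal benchmark: `fake_generate` is a hypothetical stand-in that sleeps instead of calling a real inference engine, and its timing constants are made up for the example. A real harness would presumably replace it with HTTP requests to the engine's endpoint.

```python
import asyncio
import time


async def fake_generate(prompt_tokens: int, output_tokens: int) -> int:
    """Hypothetical stand-in for an inference server call.

    Sleeps for a made-up duration (assumption: decode dominates the cost)
    and returns the number of output tokens "generated".
    """
    await asyncio.sleep(0.001 * output_tokens + 0.00001 * prompt_tokens)
    return output_tokens


async def measure_throughput(n_requests: int, prompt_tokens: int, output_tokens: int) -> float:
    """Send n_requests concurrently; return total output tokens per second."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_generate(prompt_tokens, output_tokens) for _ in range(n_requests))
    )
    elapsed = time.perf_counter() - start
    return sum(results) / elapsed


async def measure_latency(prompt_tokens: int, output_tokens: int) -> float:
    """Send one request on an otherwise idle server; return wall-clock seconds."""
    start = time.perf_counter()
    await fake_generate(prompt_tokens, output_tokens)
    return time.perf_counter() - start


def run_benchmark(n_requests: int, prompt_tokens: int, output_tokens: int) -> dict:
    """Run both measurements and collect the results."""
    return {
        "throughput_tok_s": asyncio.run(
            measure_throughput(n_requests, prompt_tokens, output_tokens)
        ),
        "latency_s": asyncio.run(measure_latency(prompt_tokens, output_tokens)),
    }


if __name__ == "__main__":
    print(run_benchmark(n_requests=1000, prompt_tokens=128, output_tokens=10))
```

Keeping the two measurements separate matters because they answer different questions: the concurrent run shows how much total work the server can absorb, while the single request shows the lowest latency one user could see.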
Becoming The Benchmark | Asst. Prof. Dr. Vildan Esenyel | TEDxBAU Cyprus
TEDx Talks· 2025-06-25 15:55
My first time, indeed. I don't know if you have been before. This is the moment, the real moment. Silence. You are at the center of the stage. All eyes are looking at you. Spotlight. Your heart is racing. Your hands are shaking. And you think I'm talking about TEDx. Maybe. One day, without noticing, we are stepping up on the stage of life, right? Without noticing, every day, constantly, we are going to that meeting, we are speaking up in class. We are just smiling in the Zoom meeting without even realizing. Nobody ...