Flipping the Inference Stack — Robert Wachen, Etched
AI Engineer·2025-08-01 14:30

Flipping the Inference Stack: Why GPUs Bottleneck Real-Time AI at Scale

Current AI inference systems rely on brute-force scaling — adding more GPUs for each additional user — creating unsustainable compute demands and spiraling costs. Real-time use cases are bottlenecked by per-user latency and cost. In this talk, AI hardware expert and founder Robert Wachen will break down why the current approach to inference does not scale, and how rethinking hardware is the only way to unlock real-time AI at scale.