AAI 2025: Enterprise AI Inference – An Uber™ Success Story

AI Workloads & AMD's Solutions
- AI workloads fall into five buckets: traditional machine learning, recommendation systems, language models, generative AI, and mixed AI-enabled workloads [7]
- AMD offers both GPUs and CPUs to cover the full span of enterprise AI needs, supported through an open ecosystem [11]
- AMD's 5 GHz EPYC processor is purpose-built as a host processor for AI accelerators, leveraging the x86 ecosystem for broad software support and flexibility [13][14]
- AMD EPYC CPUs lead with 64 cores and 5 GHz operation, suitable for robust enterprise-class workloads [15]

Performance & Efficiency
- AMD EPYC CPUs deliver a 7% to 13% performance boost over Intel Xeon when used as the host processor for GPUs [17]
- For generative AI workloads, AMD EPYC CPUs show a 28% to 33% improvement over the competition [24]
- For natural language workloads, AMD EPYC CPUs outperform the latest-generation competition by 20% to 36% [25]
- AMD EPYC processors are built for low-power, low-cost AI inference, offering fast inference, easy integration, and the ability to add AI workloads without significantly increasing power consumption (see the sketch after this list) [28]
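The last bullet describes a deployment pattern rather than a product claim: run existing models directly on the host CPU instead of adding accelerators. Below is a minimal sketch of CPU-only inference using ONNX Runtime; the model file name "model.onnx" and its input name "input" are hypothetical, and the thread count is illustrative rather than a tuned recommendation.

```python
# Minimal sketch: CPU-only inference with ONNX Runtime.
# "model.onnx" and the input tensor name "input" are placeholders
# for whatever model you have exported; adjust to your own artifact.
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 64  # e.g. match the core count of a 64-core EPYC

session = ort.InferenceSession(
    "model.onnx",
    sess_options=opts,
    providers=["CPUExecutionProvider"],  # pin execution to the CPU
)

# Dummy batch of 8 images in NCHW layout; replace with real input data.
batch = np.random.rand(8, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": batch})
print(outputs[0].shape)
```

Pinning the session to CPUExecutionProvider and sizing intra-op threads to the available cores is the usual starting point for adding inference to an existing CPU fleet without new hardware.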
Uber's Use Case
- Uber handles 33 million trips daily and serves 170 million monthly active users, requiring a robust technology stack [30]
- Uber began its cloud migration journey with GCP and OCI in late 2022, focused on accelerating innovation and optimizing costs [33]
- Uber is migrating more workloads to AMD CPUs in a multi-cloud environment, leveraging next-generation technologies like PCIe Gen 6 and CXL [37]
- Uber expects over 30% better SPECjbb2015 performance per dollar with GCP C4D instances, based on AMD's Turin architecture, compared to C3D (a worked example follows this list) [38]
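To make the per-dollar claim concrete, here is a minimal sketch of how performance per dollar is typically computed: benchmark throughput divided by instance price. The SPECjbb2015 scores and hourly prices below are invented placeholders chosen to land near the cited figure, not published numbers for C4D or C3D.

```python
# Sketch of a performance-per-dollar comparison between two instance types.
# All numbers are made-up placeholders, not published C4D/C3D figures.
def perf_per_dollar(specjbb_score: float, hourly_price_usd: float) -> float:
    """SPECjbb2015 throughput normalized by hourly instance cost."""
    return specjbb_score / hourly_price_usd

c3d = perf_per_dollar(specjbb_score=100_000, hourly_price_usd=2.00)  # baseline
c4d = perf_per_dollar(specjbb_score=140_000, hourly_price_usd=2.15)  # newer gen

improvement = (c4d / c3d - 1) * 100
print(f"perf/$ improvement: {improvement:.0f}%")  # ~30% with these placeholders
```

The point of the metric is that a newer instance can cost more per hour and still win, as long as its throughput grows faster than its price.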