SemiAnalysis -- Why does almost no one other than the CSPs use AMD's GPUs?
傅里叶的猫 · 2025-05-23 15:46
Core Viewpoint
- The article is a comprehensive comparison of NVIDIA and AMD GPUs on inference performance, total cost of ownership (TCO), and market dynamics, and explains why AMD products see little use outside of large-scale cloud service providers [1][2].

Testing Background and Objectives
- The research team spent six months testing the claim that AMD's AI servers beat NVIDIA on TCO and inference performance; the results turned out to be mixed and highly workload-dependent [2][5].

Performance Comparison
- For customers running vLLM/SGLang, single-node H200 deployments sometimes deliver better performance per dollar (perf/$), while MI325X can come out ahead depending on the workload and latency requirements [5].
- In most scenarios MI300X is not competitive with H200, but it does outperform H100 on specific models such as Llama3 405B and DeepSeekV3 670B [5].
- For short-term GPU rentals, NVIDIA consistently offers better cost performance because many more providers rent its GPUs; AMD capacity is scarce, which keeps its rental prices high [5][26].

Total Cost of Ownership (TCO) Analysis
- AMD's MI300X and MI325X generally have lower hourly costs than NVIDIA's H100 and H200, at $1.34 per GPU-hour for MI300X and $1.53 for MI325X [21].
- Capital cost makes up a large share of the total: 70.5% of TCO for MI300X (a back-of-the-envelope cost sketch appears after the conclusion below) [21].

Market Dynamics
- AMD's share of the AI GPU market has been growing steadily, but it is expected to slip in early 2025 as NVIDIA's Blackwell series launches while AMD's answer (the MI355X) does not arrive until later [7].
- The rental market for AMD GPUs is thin, with few providers, which keeps prices artificially high and reduces competitiveness against NVIDIA [26][30].

Benchmark Testing Methodology
- The benchmarks target real-world online inference workloads, measuring throughput and latency under varying numbers of concurrent users, unlike traditional offline benchmarks (a minimal benchmarking sketch also follows the conclusion below) [10][11].
- The tests sweep a range of input/output token lengths to cover different inference scenarios [11][12].

Benchmark Results
- On Llama3 70B FP16, MI325X and MI300X beat all other GPUs in low-latency scenarios, while H200 pulled ahead under high concurrency [15][16].
- On Llama3 405B FP8, MI325X consistently outperformed both H100 and H200 across latency conditions, especially in latency-tolerant (high-latency) scenarios [17][24].

Conclusion on AMD's Market Position
- The article concludes that AMD needs to lower rental prices to compete effectively with NVIDIA in the GPU rental market; the current pricing structure undercuts its competitiveness [26][30].
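As a back-of-the-envelope illustration of the hourly-cost and capital-share figures in the TCO section above, the Python sketch below amortizes an assumed server purchase price over an assumed service life and adds operating cost. The capex, opex, lifetime, and GPU-count inputs are hypothetical placeholders chosen only so the output lands near the article's $1.34/hr and 70.5% capital-share figures for MI300X; they are not SemiAnalysis's actual cost model.

```python
# Back-of-the-envelope TCO per GPU-hour (assumed inputs, not the article's model):
# amortize the server's purchase price over its service life, add operating cost,
# and divide by GPU count and hours of service.

def hourly_tco_per_gpu(server_capex_usd, opex_usd_per_year,
                       gpus_per_server=8, lifetime_years=4):
    """Return (cost per GPU-hour, capital cost's share of the total)."""
    hours = lifetime_years * 365 * 24
    capex_per_gpu_hour = server_capex_usd / gpus_per_server / hours
    opex_per_gpu_hour = opex_usd_per_year * lifetime_years / gpus_per_server / hours
    total = capex_per_gpu_hour + opex_per_gpu_hour
    return total, capex_per_gpu_hour / total

# Hypothetical inputs chosen to land near the article's MI300X figures
# ($1.34/GPU-hr, ~70.5% capital share); the real numbers depend on the deal.
cost, capital_share = hourly_tco_per_gpu(server_capex_usd=265_000,
                                         opex_usd_per_year=27_700)
print(f"${cost:.2f}/GPU-hr, capital share {capital_share:.1%}")
```

Performance per dollar then follows as measured throughput divided by this hourly cost (or the rental price), which is why the thin AMD rental market, with its higher $/hr, hurts AMD's perf/$ even where raw throughput is competitive.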
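The online-benchmark methodology summarized above can be illustrated with a minimal load-sweep sketch: fire batches of concurrent requests at an inference server and record per-request latency and aggregate output-token throughput at each concurrency level. The script assumes a locally running vLLM or SGLang server exposing the OpenAI-compatible /v1/completions endpoint at localhost:8000; the URL, model id, prompt length, output length, and concurrency levels are illustrative assumptions, not the article's actual harness.

```python
# Minimal online inference benchmark sketch (assumptions noted above): sweep
# concurrency, record per-request latency and aggregate token throughput.
import asyncio
import time

import aiohttp

URL = "http://localhost:8000/v1/completions"   # assumed local vLLM/SGLang server
MODEL = "meta-llama/Llama-3.1-70B-Instruct"    # placeholder model id
PROMPT = "word " * 1024                        # rough stand-in for ~1k input tokens
MAX_TOKENS = 256                               # requested output length

async def one_request(session):
    """Send one completion request; return (latency_s, completion_tokens)."""
    t0 = time.perf_counter()
    payload = {"model": MODEL, "prompt": PROMPT, "max_tokens": MAX_TOKENS}
    async with session.post(URL, json=payload) as resp:
        body = await resp.json()
    latency = time.perf_counter() - t0
    return latency, body["usage"]["completion_tokens"]  # token usage reported by OpenAI-compatible servers

async def run(concurrency):
    """Run `concurrency` simultaneous requests; print throughput and median latency."""
    async with aiohttp.ClientSession() as session:
        t0 = time.perf_counter()
        results = await asyncio.gather(*(one_request(session) for _ in range(concurrency)))
        wall = time.perf_counter() - t0
    latencies = sorted(r[0] for r in results)
    total_tokens = sum(r[1] for r in results)
    print(f"concurrency={concurrency:4d}  "
          f"throughput={total_tokens / wall:8.1f} tok/s  "
          f"p50 latency={latencies[len(latencies) // 2]:.2f} s")

if __name__ == "__main__":
    for level in (1, 16, 64, 256):   # sweep user load, as the article's tests do
        asyncio.run(run(level))
```

Plotting throughput against latency across these load levels produces the throughput-latency trade-off curves that distinguish online benchmarks of this kind from single-stream offline runs.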