SemiAnalysis: AMD vs NVIDIA Inference Benchmarking: Who Wins? An Analysis of Performance and Cost per Million Tokens
2025-05-25 14:09

Summary of the AMD vs NVIDIA Inference Benchmarking Analysis

Industry and Companies Involved
- Industry: Artificial Intelligence (AI) Inference Solutions
- Companies: Advanced Micro Devices (AMD) and NVIDIA

Core Insights and Arguments
1. Performance Comparison: AMD's AI servers have been claimed to deliver better inference performance per total cost of ownership (TCO) than NVIDIA's, but the results show nuanced differences across tasks such as chat applications, document processing, and reasoning [4][5][6]. (The cost-per-million-tokens arithmetic behind this comparison is sketched after this summary.)
2. Workload Performance: For hyperscalers and enterprises that own their GPUs, NVIDIA outperforms AMD on some workloads while AMD excels on others. For short- to medium-term rentals, however, NVIDIA consistently offers better performance per dollar because few providers rent out AMD GPUs [6][12][13].
3. Market Dynamics: The MI325X, intended to compete with NVIDIA's H200, faced shipment delays, leading customers to choose the B200 instead. The MI355X is not expected to ship until later in 2025, further weakening AMD's competitive position [8][10][24].
4. Software and Developer Experience: AMD's software support for its GPUs still trails NVIDIA's, particularly in developer experience and continuous integration (CI) coverage. This gap underlies AMD's ongoing challenges in the AI software space [9][15][14].
5. Market Share Trends: AMD's share of the datacenter AI GPU market has been rising but is expected to decline in Q2 CY2025 due to NVIDIA's new product launches. The upcoming MI355X and software improvements may help AMD regain some share [26][27].

Additional Important Points
1. Benchmarking Methodology: The methodology measures online throughput against end-to-end latency, giving a realistic picture of performance under production serving conditions [30][31].
2. Latency and Throughput Relationship: Throughput and latency trade off against each other; optimizing for one typically degrades the other, so choosing the right operating point matters for each application (a concurrency-sweep sketch illustrating this follows the summary) [35][36].
3. Inference Engine Selection: vLLM is the primary inference engine used for benchmarking, with TensorRT-LLM (TRT-LLM) also evaluated. Despite improvements, TRT-LLM still lags behind vLLM in user experience [54][55].
4. Future Developments: AMD is encouraged to invest more in internal cluster resources to improve developer experience and software capabilities, which could yield better long-term shareholder returns [15].

This summary captures the key insights and arguments from the analysis, highlighting the competitive landscape between AMD and NVIDIA in the AI inference market.
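To make the "cost per million tokens" framing from the title concrete, here is a minimal sketch of the arithmetic: divide the hourly cost of the hardware by the tokens it sustains per hour. The hourly rate and throughput figures below are hypothetical placeholders, not numbers from the article.

```python
# Sketch: converting measured throughput into cost per million output tokens.
# The $/hour and tokens/second values are illustrative assumptions only.

def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Cost (USD) to generate one million tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Example: an 8-GPU server rented at a hypothetical $20/hour,
# sustaining 5,000 output tokens/second across all replicas.
print(f"${cost_per_million_tokens(20.0, 5000.0):.2f} per million tokens")
# -> $1.11 per million tokens
```

This is why rental pricing matters as much as raw performance: the same throughput at a higher hourly rate translates directly into a proportionally higher cost per million tokens.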
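The throughput/latency trade-off described above can be observed with a concurrency sweep: as more requests run in parallel, aggregate throughput rises while each user's end-to-end latency degrades, tracing out the kind of curve the benchmark reports. Below is a minimal sketch against an OpenAI-compatible completions endpoint (vLLM exposes one when run as a server); the URL, model name, and request counts are assumptions for illustration.

```python
# Sketch: sweep request concurrency against an OpenAI-compatible endpoint
# and record aggregate throughput vs. mean end-to-end latency.
import asyncio
import time

import httpx

URL = "http://localhost:8000/v1/completions"  # hypothetical local server
PAYLOAD = {"model": "my-model", "prompt": "Hello", "max_tokens": 128}

async def one_request(client: httpx.AsyncClient) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = await client.post(URL, json=PAYLOAD, timeout=120.0)
    resp.raise_for_status()
    return time.perf_counter() - start

async def sweep(concurrency: int, total: int = 64) -> tuple[float, float]:
    """Run `total` requests with at most `concurrency` in flight."""
    sem = asyncio.Semaphore(concurrency)
    async with httpx.AsyncClient() as client:
        async def bounded() -> float:
            async with sem:
                return await one_request(client)
        t0 = time.perf_counter()
        latencies = await asyncio.gather(*(bounded() for _ in range(total)))
        wall = time.perf_counter() - t0
    # Rough throughput estimate: assumes every request generates max_tokens.
    throughput = total * PAYLOAD["max_tokens"] / wall
    return throughput, sum(latencies) / len(latencies)

for c in (1, 4, 16, 64):
    tput, lat = asyncio.run(sweep(c))
    print(f"concurrency={c:3d}  ~{tput:7.1f} tok/s  mean e2e latency {lat:.2f}s")
```

Plotting the (latency, throughput) pairs from such a sweep yields the trade-off curve: a serving configuration is only better if it sits above another's curve at the latency budget the application actually needs.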
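Since vLLM is named as the primary benchmarking engine, a minimal offline throughput measurement with its Python API looks roughly like the sketch below. `LLM` and `SamplingParams` are real vLLM classes; the model identifier, prompt, and batch size are placeholder assumptions, and this measures offline batch throughput rather than the online serving numbers the article emphasizes.

```python
# Sketch: offline throughput measurement with vLLM's Python API.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # assumed model id
params = SamplingParams(max_tokens=256, temperature=0.0)
prompts = ["Summarize the GPU inference market in one paragraph."] * 32

t0 = time.perf_counter()
outputs = llm.generate(prompts, params)  # batched generation
elapsed = time.perf_counter() - t0

# Count the tokens actually generated rather than assuming max_tokens.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
```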