AMD Instinct GPU MI325X

Inference Chips: Nvidia First, AMD Second
半导体行业观察 (Semiconductor Industry Observation) · 2025-04-03 01:23
Core Viewpoint
- The latest MLCommons MLPerf inference benchmark results show systems built on Nvidia's new Blackwell GPU architecture outperforming all other submissions, while AMD's latest Instinct GPU, the MI325X, competes closely with Nvidia's H200 [1][3][10].

Benchmark Testing
- MLPerf has introduced three new benchmark tests to better reflect the rapid advances in machine learning, bringing the total to 11 server benchmarks [1][11].
- Two of the new benchmarks are large language models (LLMs). Llama2 70B is already an established benchmark; the new "Llama2-70B Interactive" variant tightens its requirements, so that systems must generate at least 25 tokens per second and begin responding within 450 milliseconds [2][12].

Performance Insights
- Nvidia continues to dominate MLPerf benchmarks through submissions from itself and 15 partners. Its Blackwell-architecture GPU, the B200, was the fastest, outperforming the previous Hopper architecture [8][14].
- The B200 has 36% more high-bandwidth memory than the H200 and can perform key machine learning operations at precision as low as 4 bits, which speeds up AI computation [8][14].

Comparative Performance
- In the Llama3.1 405B benchmark, Supermicro's 8-GPU B200 system achieved nearly four times the token throughput of Cisco's 8-GPU H200 system [15].
- The fastest system reported in this round of MLPerf was Nvidia's B200 server, which delivered 98,443 tokens per second [15].

AMD's Position
- AMD's latest Instinct GPU, the MI325X, is positioned to compete with Nvidia's H200, with increased high-bandwidth memory capacity and memory bandwidth [15][17].
- In Llama2 70B tests, MI325X systems performed within 3% to 7% of H200 systems [17].

Intel and Other Competitors
- Intel's Xeon 6 chips showed significant gains, with results about 80% better than previous models, although Intel appears to be stepping back from the AI accelerator chip competition [18].
- Google's TPU v6e chips also performed well, achieving a 2.5 times improvement over their predecessors, although their performance is roughly equivalent to Nvidia's H100 in similar configurations [18].
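
To make the "Llama2-70B Interactive" thresholds quoted under Benchmark Testing more concrete, here is a minimal Python sketch of checking one request against them. It is only an illustration: the `RequestTiming` type, its field names, and the per-request pass/fail rule are assumptions made for this example, not MLPerf's actual measurement harness or scoring procedure. Note that 25 tokens per second works out to at most 40 milliseconds per generated token.

```python
from dataclasses import dataclass

# Thresholds quoted in the summary above for "Llama2-70B Interactive".
MIN_TOKENS_PER_SECOND = 25.0
MAX_FIRST_RESPONSE_MS = 450.0


@dataclass
class RequestTiming:
    """Hypothetical per-request measurement (not an MLPerf data structure)."""
    first_token_ms: float  # time from request arrival to the first output token
    total_ms: float        # time from request arrival to the last output token
    output_tokens: int     # number of tokens generated


def tokens_per_second(t: RequestTiming) -> float:
    """Generation rate measured after the first token has been produced."""
    decode_ms = t.total_ms - t.first_token_ms
    if t.output_tokens <= 1 or decode_ms <= 0:
        return float("inf")  # nothing left to measure after the first token
    return (t.output_tokens - 1) / (decode_ms / 1000.0)


def meets_interactive_limits(t: RequestTiming) -> bool:
    """Assumed rule: both quoted thresholds must hold for the request."""
    return (t.first_token_ms <= MAX_FIRST_RESPONSE_MS
            and tokens_per_second(t) >= MIN_TOKENS_PER_SECOND)


# Made-up timings: 100 tokens after the first one, in 4.0 s versus 5.0 s.
fast = RequestTiming(first_token_ms=300.0, total_ms=4300.0, output_tokens=101)
slow = RequestTiming(first_token_ms=300.0, total_ms=5300.0, output_tokens=101)
print(meets_interactive_limits(fast))
print(meets_interactive_limits(slow))
```

The timings are invented for the example. The first request generates at exactly 25 tokens per second and passes, while the second drops to 20 tokens per second and fails even though its first token arrives in time.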