How do you deploy Nvidia's GB200?
半导体行业观察·2024-07-18 01:24

Nvidia GB200 Hardware Architecture
- Nvidia's GB200 offers significant performance improvements through advanced hardware architecture, but deployment complexity has increased significantly [2]
- The GB200 rack comes in 4 main form factors, each customizable to meet different deployment needs [6]
- The GB200 NVL72 form factor requires approximately 120kW per rack, necessitating liquid cooling due to its high power density [3]
- The NVL36x2 form factor, with a power and cooling density of 66kW per rack, is expected to be more widely deployed due to its lower per-rack power requirements compared to NVL72 [9]
- The GB200 NVL72 rack consists of 18 1U compute trays and 9 NVSwitch trays, with each compute tray containing 2 Bianca boards, each with 1 Grace CPU and 2 Blackwell GPUs [14]
- The NVL36x2 form factor connects two racks side by side, maintaining non-blocking communication between all 72 GPUs across the two racks [14]
- A custom "Ariel" board variant is expected to be used primarily by Meta for its recommendation system workloads, which require higher CPU core and memory ratios per GPU [15]

Power and Cooling
- The GB200 NVL72 rack has a total power consumption of 123.6kW, including losses from AC-to-DC conversion [13]
- The NVL36x2 form factor consumes approximately 132kW across two racks, about 10kW more than the NVL72, due to additional NVSwitch ASICs and inter-rack cabling [9]
- Liquid cooling is essential for GB200 due to its high power density: the NVL72 form factor requires 120kW per rack, far exceeding the roughly 40kW limit of traditional air-cooled racks [3] (a back-of-envelope check of these figures appears below)
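The composition and power figures above lend themselves to a quick sanity check. The Python sketch below only re-derives the numbers quoted in this article (tray and board counts, 123.6kW, ~132kW, the ~40kW air-cooled ceiling); the constant names are this sketch's own, and nothing here is an official Nvidia specification.

```python
# Back-of-envelope check of the GB200 rack composition and power figures
# quoted above. All numbers come from this article's bullets; nothing here
# is an official Nvidia specification.

COMPUTE_TRAYS_PER_NVL72_RACK = 18   # 1U compute trays in an NVL72 rack
BIANCA_BOARDS_PER_TRAY = 2          # each compute tray holds 2 Bianca boards
GPUS_PER_BIANCA_BOARD = 2           # each board: 1 Grace CPU + 2 Blackwell GPUs
CPUS_PER_BIANCA_BOARD = 1

NVL72_RACK_POWER_KW = 123.6         # total per rack, incl. AC-to-DC conversion losses
NVL36X2_SYSTEM_POWER_KW = 132.0     # total across the two NVL36x2 racks
AIR_COOLED_RACK_LIMIT_KW = 40.0     # rough ceiling for a traditional air-cooled rack

gpus_per_rack = (COMPUTE_TRAYS_PER_NVL72_RACK
                 * BIANCA_BOARDS_PER_TRAY
                 * GPUS_PER_BIANCA_BOARD)
cpus_per_rack = (COMPUTE_TRAYS_PER_NVL72_RACK
                 * BIANCA_BOARDS_PER_TRAY
                 * CPUS_PER_BIANCA_BOARD)
print(f"GPUs per NVL72 rack:  {gpus_per_rack}")   # 18 * 2 * 2 = 72
print(f"Grace CPUs per rack:  {cpus_per_rack}")   # 18 * 2 * 1 = 36

nvl36x2_rack_power_kw = NVL36X2_SYSTEM_POWER_KW / 2           # ~66 kW per rack
extra_system_power_kw = NVL36X2_SYSTEM_POWER_KW - NVL72_RACK_POWER_KW

print(f"NVL36x2 power per rack:        {nvl36x2_rack_power_kw:.1f} kW")
print(f"NVL36x2 extra power vs NVL72:  {extra_system_power_kw:.1f} kW per system")

for name, kw in [("NVL72", NVL72_RACK_POWER_KW), ("NVL36x2", nvl36x2_rack_power_kw)]:
    ratio = kw / AIR_COOLED_RACK_LIMIT_KW
    print(f"{name}: {kw:.1f} kW per rack = {ratio:.1f}x the ~40 kW air-cooled limit")
```

Run as-is, it reproduces the 72-GPU/36-CPU count per rack and shows why both form factors land well above what air cooling can handle, with NVL36x2 roughly halving per-rack density in exchange for slightly higher total system power.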
NVLink Interconnect
- The NVLink interconnect in GB200 provides 900GB/s of bidirectional bandwidth per GPU, connecting 72 GPUs in the NVL72 form factor [33]
- The NVL72 retains a flat 1-hop NVLink topology, enabling direct communication between any two GPUs within the same rack via NVSwitch [34]
- The NVL36x2 form factor requires 2 NVSwitch hops for communication between GPUs in different racks, slightly increasing latency but remaining practical for training workloads [52]
- The cost of NVLink interconnects is driven primarily by connectors rather than cables, with Amphenol's Ultrapass Paladin being the primary supplier for the NVLink backplane interconnect [53]

Supply Chain and Market Impact
- The GB200's deployment complexity has led to a redesign of the supply chain, affecting data center deployers, cloud providers, server OEMs/ODMs, and downstream component suppliers [2]
- The shift to GB200 racks has significantly increased demand for advanced 1.6T copper cables and connectors, benefiting suppliers such as Amphenol [38]
- The NVL36x2 form factor, despite being more expensive due to additional NVSwitch ASICs and cabling, is expected to be the preferred choice for most companies because of power and cooling limitations [37]

Future Developments
- In Q2 2025, Nvidia plans to release B200 NVL72 and NVL36x2 variants that use x86 CPUs instead of Grace CPUs, which will reduce upfront capital costs but may result in lower revenue for Nvidia [16]
- The x86 CPU variant, known as Miranda, will have lower CPU-to-GPU bandwidth than the Grace CPU version, potentially hurting total cost of ownership (TCO) [16]

Networking and Connectivity
- The GB200 system uses four types of networks: front-end, back-end, NVLink, and out-of-band management [25]
- The front-end network typically provides 200-800Gb/s per server, depending on the configuration [50]
- The back-end network, used for GPU-to-GPU communication across racks, can be based on Nvidia's InfiniBand, Spectrum-X Ethernet, or Broadcom Ethernet solutions [92]
- The NVLink interconnect is significantly faster than the back-end network, offering 8-10 times the bandwidth [33]

Cost Analysis
- The cost of NVLink interconnects is lower than expected, with most of the cost coming from connectors rather than cables [53]
- The NVL36x2 form factor requires additional 1.6T twin-port OSFP ACC cables, increasing the cost by over $10,000 per system [107]
- The NVL576 variant, which connects 576 GPUs, would require fiber optics for its long-distance connections, significantly increasing costs to over $5.6 million per system [109]

Customization and Deployment
- Meta is expected to deploy the custom "Ariel" board variant for its largest recommendation system workloads, while using the standard NVL36x2 for GenAI workloads [11]
- Most companies will deploy the NVL36x2 form factor due to its lower per-rack power and cooling requirements, despite the higher cost of its NVSwitch ASICs and cabling [37]; the hop-count and interconnect-cost trade-off is sketched below
- The GB200's reference design includes over-provisioning for worst-case scenarios, which may not be necessary for most customers, leaving room for cost savings [67]
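To tie together the NVLink topology and interconnect-cost points from the sections above, here is a small illustrative sketch. It assumes only the simple "same rack vs. cross-rack" hop model the article describes; the function names, constants, and example GPU indices are this sketch's own, not an Nvidia API.

```python
# Illustrative model of NVLink reach and the interconnect-cost figures
# discussed above. Hop counts and dollar amounts are the ones quoted in
# this article; the function and constant names are this sketch's own.

GPUS_PER_NVL72_RACK = 72
GPUS_PER_NVL36_RACK = 36

NVLINK_BW_PER_GPU_GB_S = 900        # bidirectional NVLink bandwidth per GPU
NVLINK_VS_BACKEND_RATIO = "8-10x"   # NVLink vs. back-end network bandwidth

# Interconnect cost figures quoted above (lower bounds, per system).
NVL36X2_EXTRA_CABLE_COST_USD = 10_000     # additional 1.6T twin-port OSFP ACC copper
NVL576_INTERCONNECT_COST_USD = 5_600_000  # fiber-heavy interconnect for 576 GPUs


def nvswitch_hops_nvl72(gpu_a: int, gpu_b: int) -> int:
    """NVL72: every GPU reaches every other GPU in the rack through one NVSwitch hop."""
    assert 0 <= gpu_a < GPUS_PER_NVL72_RACK and 0 <= gpu_b < GPUS_PER_NVL72_RACK
    return 0 if gpu_a == gpu_b else 1


def nvswitch_hops_nvl36x2(gpu_a: int, gpu_b: int) -> int:
    """NVL36x2: one hop inside a rack, two hops when the GPUs sit in different racks."""
    assert 0 <= gpu_a < 2 * GPUS_PER_NVL36_RACK and 0 <= gpu_b < 2 * GPUS_PER_NVL36_RACK
    if gpu_a == gpu_b:
        return 0
    same_rack = (gpu_a // GPUS_PER_NVL36_RACK) == (gpu_b // GPUS_PER_NVL36_RACK)
    return 1 if same_rack else 2


if __name__ == "__main__":
    print("NVL72   hops, GPU 3 -> GPU 70:", nvswitch_hops_nvl72(3, 70))     # 1
    print("NVL36x2 hops, GPU 3 -> GPU 20:", nvswitch_hops_nvl36x2(3, 20))   # 1 (same rack)
    print("NVL36x2 hops, GPU 3 -> GPU 70:", nvswitch_hops_nvl36x2(3, 70))   # 2 (cross-rack)
    print(f"NVLink per GPU: {NVLINK_BW_PER_GPU_GB_S} GB/s, roughly {NVLINK_VS_BACKEND_RATIO} the back-end network")
    print(f"NVL36x2 extra copper vs NVL72:   >${NVL36X2_EXTRA_CABLE_COST_USD:,} per system")
    print(f"NVL576 fiber-based interconnect: >${NVL576_INTERCONNECT_COST_USD:,} per system")
```

The cross-rack case is the only place where NVL36x2 pays an extra NVSwitch hop and extra copper, which is the trade-off most deployments accept in exchange for the lower per-rack power density.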