Server Network Architecture

The Abandoned NVL72 Optical Interconnect Solution
傅里叶的猫 · 2025-07-17 15:41
Core Viewpoint
- The article discusses the architecture and networking components of the GB200 server, focusing on the mix of copper and optical connections and on the flexibility and cost trade-offs behind different customers' design choices [1][2].

Frontend Networking
- The frontend network in the GB200 architecture is the main channel for external data exchange, connecting to the internet and to cluster management tools [1].
- Each GPU typically receives 25-50 Gb/s of frontend bandwidth; the total per server is 200-400 Gb/s for HGX H100 and 200-800 Gb/s for GB200, depending on configuration [2] (see the bandwidth sketch after this summary).
- Nvidia's reference design for the frontend network may be over-provisioned, raising costs for customers who do not need that much bandwidth [2][4].

Backend Networking
- The backend network carries GPU-to-GPU communication across large-scale clusters, serving internal computational collaboration [5].
- Several switch options are available for the backend network; initial shipments use ConnectX-7 cards, with a planned upgrade path to ConnectX-8 [6][10].
- Long-distance interconnects rely primarily on optical cables, since copper is limited to short reaches [6] (see the media-selection sketch below).

Accelerator Interconnect
- The accelerator interconnect provides high-speed communication between GPUs and strongly influences communication efficiency and system scalability [13].
- GB200's NVLink interconnect evolved from the HGX H100 design: because the NVSwitches and GPUs now sit in separate trays, external connections are required [14].
- The available configurations (NVL72, NVL36x2, NVL576) balance communication efficiency against scalability, with NVL72 optimal for low-latency scenarios [15] (see the configuration sketch below).

Out-of-Band Networking
- The out-of-band network is dedicated to device management and monitoring, serving system maintenance rather than data transmission [20].
- It connects IT devices through their baseboard management controllers (BMCs), enabling remote management and health monitoring [21] (see the Redfish sketch below).

Cost Analysis of MPO Connectors
- The article estimates the value of the MPO connectors in a GB200 server; the cost per GPU varies significantly with network architecture and optical module usage [22][23].
- In a two-layer network architecture the MPO value per GPU is roughly $128; in a three-layer architecture it rises to about $192 [24] (see the cost sketch below).
- As data center transmission rates increase, demand for high-speed optical modules and the corresponding MPO connectors is expected to grow, raising overall costs [25].
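To make the frontend bandwidth figures concrete, here is a minimal back-of-the-envelope sketch. The per-GPU range and the 8-GPU HGX H100 count are from the summary above; the GB200 per-tray GPU count and the per-GPU range used to reproduce the 200-800 Gb/s figure are assumptions.

```python
# Back-of-the-envelope check of the frontend bandwidth figures quoted above.
# The 25-50 Gb/s per-GPU range is from the article; tray/server GPU counts
# and the GB200 per-GPU range are assumptions made for illustration.

def frontend_bandwidth(gpus: int, per_gpu_gbps: tuple[int, int]) -> tuple[int, int]:
    """Total frontend bandwidth range (Gb/s) for one server or tray."""
    lo, hi = per_gpu_gbps
    return gpus * lo, gpus * hi

# HGX H100: 8 GPUs at 25-50 Gb/s each -> 200-400 Gb/s, matching the article.
print(frontend_bandwidth(8, (25, 50)))   # (200, 400)

# GB200 compute tray (assumed 4 GPUs per tray): reproducing the article's
# 200-800 Gb/s range implies a wider per-GPU option, e.g. 50-200 Gb/s.
print(frontend_bandwidth(4, (50, 200)))  # (200, 800)
```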
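The backend section's copper-versus-optics point can be illustrated with a small media-selection sketch. The reach thresholds below are assumptions about typical 800 Gb/s-class links, not figures from the article.

```python
# A minimal sketch of the copper-vs-optics trade-off for backend links.
# Thresholds are assumptions: passive copper (DAC) is usually usable only
# for a few meters at 800 Gb/s-class rates, after which optics take over.

def pick_link_medium(distance_m: float) -> str:
    if distance_m <= 3:   # intra-rack: passive copper is cheapest and coolest
        return "passive copper (DAC)"
    if distance_m <= 7:   # adjacent racks: active copper may still stretch
        return "active copper (ACC/AEC)"
    return "optical (transceiver + fiber)"  # longer runs need optics

for d in (1, 5, 30):
    print(f"{d:>3} m -> {pick_link_medium(d)}")
```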
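A rough comparison of the NVLink configurations named above. The GPU counts follow the public configuration names; the switch-hop counts are illustrative assumptions, not measurements from the article.

```python
# Rough comparison of the NVLink domain options the article names.
# GPU counts follow the configuration names (NVL72 = 72 GPUs, NVL36x2 =
# two bridged 36-GPU racks, NVL576 = 576 GPUs); hop counts are assumed
# worst-case NVSwitch traversals, for illustration only.

configs = {
    # name: (gpus_per_nvlink_domain, assumed_max_switch_hops)
    "NVL72":   (72,  1),  # single rack: one NVSwitch hop between any two GPUs
    "NVL36x2": (72,  2),  # two racks: cross-rack traffic adds a hop
    "NVL576":  (576, 3),  # multi-rack, two switch tiers: more hops and cables
}

for name, (gpus, hops) in configs.items():
    print(f"{name:>8}: {gpus:>3} GPUs per domain, ~{hops} switch hop(s) worst case")

# The single-hop NVL72 topology is why the article calls it optimal for
# low-latency scenarios; the larger domains trade latency for scale.
```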
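For the out-of-band network, here is a minimal sketch of BMC health polling over the standard DMTF Redfish REST API. The endpoint paths are part of the Redfish specification; the BMC address and credentials are placeholders.

```python
# Minimal out-of-band health poll via a BMC's Redfish REST API.
# /redfish/v1/Chassis is a standard Redfish collection; the address and
# credentials below are placeholders for a BMC on the OOB network.

import requests

BMC = "https://10.0.0.42"     # hypothetical BMC address
AUTH = ("admin", "password")  # placeholder credentials

# Fetch the chassis collection, then read each chassis' health status.
# verify=False only because many BMCs ship with self-signed certificates.
r = requests.get(f"{BMC}/redfish/v1/Chassis", auth=AUTH, verify=False, timeout=10)
for member in r.json()["Members"]:
    c = requests.get(f"{BMC}{member['@odata.id']}", auth=AUTH,
                     verify=False, timeout=10).json()
    print(c.get("Id"), c.get("Status", {}).get("Health"))
```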
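Finally, a back-of-the-envelope reconstruction of the per-GPU MPO figures. The link-count model and the $32 per-connector unit cost are assumptions chosen so the output matches the article's $128 and $192 figures; neither is stated in the source.

```python
# Reconstruction of the article's per-GPU MPO values under stated
# assumptions: in a non-blocking fat-tree, each switch layer adds one
# optical link per GPU; each link terminates in two transceivers; each
# transceiver front panel takes one MPO connector. The $32 unit cost is
# inferred so a two-layer network lands on the article's $128/GPU figure.

MPO_UNIT_COST = 32.0  # USD per connector (inferred, not from the source)

def mpo_value_per_gpu(n_layers: int) -> float:
    links_per_gpu = n_layers        # one optical hop per switch layer
    connectors = links_per_gpu * 2  # a connector at each end of the link
    return connectors * MPO_UNIT_COST

print(mpo_value_per_gpu(2))  # 128.0 -> matches the two-layer figure
print(mpo_value_per_gpu(3))  # 192.0 -> matches the three-layer figure
```

Under this model, the jump from $128 to $192 simply reflects the extra optical hop a third switch layer adds per GPU, which is consistent with the 1.5x ratio in the article's numbers.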