Alibaba's Panjiu Super Node and Supply Chain
傅里叶的猫·2025-09-27 10:14

Core Viewpoint
- The article compares Alibaba's super node with NVIDIA's NVL72 and Huawei's CM384 in detail, focusing on GPU count, interconnect technology, power consumption, and ecosystem compatibility.

Group 1: GPU Count
- Alibaba's super node, known as "Panjiu" (磐久), has 128 GPUs: each computing node contains 4 self-developed GPUs, and there are two sets of 16 computing nodes, giving 16 x 4 x 2 = 128 GPUs [4]
- By contrast, Huawei's CM384 contains 384 Ascend 910C chips, and NVIDIA's NVL72 contains 72 GPUs [7]

Group 2: Interconnect Technology
- NVIDIA's NVL72 interconnects through a cable tray and uses the proprietary NVLink protocol [8]
- Huawei's CM384 also uses cables, which run between multiple racks [10]
- Alibaba's super node uses an orthogonal interconnect with no backplane, so computing nodes connect directly to switch nodes, which reduces signal transmission loss [12][14]

Group 3: Power and Optical Connections
- NVIDIA's NVL72 uses copper for scale-up connections, while Huawei's CM384 uses optical interconnects, which raise its cost and power consumption [15]
- Alibaba's super node uses electrical interconnects for scale-up inside the node, with some links carried over PCB traces and others over copper cables; optical interconnects are used only between two ALink switches [18][19]

Group 4: Parameter Comparison
- Per-chip metrics show a significant performance gap: NVIDIA's GB200 delivers 2,500 BF16 dense TFLOPS, versus 780 for the Ascend 910C used in Huawei's CM384 [21]
- HBM capacity is 192 GB for NVIDIA's GB200 versus 128 GB for Huawei's chip, and scale-up bandwidth is 7,200 Gb/s for NVIDIA versus 2,800 Gb/s for Huawei [21]

Group 5: Ecosystem Compatibility
- Alibaba claims compatibility with multiple GPUs and ASICs, provided they support the ALink protocol. This may be an obstacle in practice, because major manufacturers are reluctant to adopt a proprietary protocol [23]
- Alibaba's GPUs are compatible with CUDA, which is a competitive advantage in the current market [24]

Group 6: Supply Chain Insights
- In the AI and general-purpose server integration market, Inspur holds a 33%-35% share and Huawei holds 23% [33]
- In liquid cooling, Haikang and Invec are the key players, each holding 30%-40% of the market [35]
- In the PCB sector, layer counts have risen to 24-30 and low-loss materials make up over 60% of the composition, which significantly increases the value of the PCB in each card [36]
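
The arithmetic behind Groups 1 and 4 can be checked with a short Python sketch. It is only an illustration under stated assumptions: the names `ChipSpec`, `alibaba_gpu_count`, `per_chip_ratios`, and `system_bf16_pflops` are hypothetical, the factor of 2 in 16 x 4 x 2 is read as two sets of 16 computing nodes, and the TFLOPS, HBM, and bandwidth figures are treated as per-chip values. The system-level aggregate is a derived estimate (per-chip figure times chip count), not a number stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipSpec:
    """Figures quoted in the summary; treated here as per-chip values (assumption)."""
    name: str
    bf16_dense_tflops: float
    hbm_gb: float
    scale_up_gbps: float
    chips_per_system: int


GB200 = ChipSpec("NVIDIA GB200 (NVL72)", 2500, 192, 7200, 72)
ASCEND_910C = ChipSpec("Huawei Ascend 910C (CM384)", 780, 128, 2800, 384)


def alibaba_gpu_count(node_sets: int = 2, nodes_per_set: int = 16,
                      gpus_per_node: int = 4) -> int:
    """GPU count of the Alibaba super node: 16 x 4 x 2.

    Assumption: the factor of 2 means two sets of 16 computing nodes.
    """
    return nodes_per_set * gpus_per_node * node_sets


def per_chip_ratios(a: ChipSpec, b: ChipSpec) -> dict[str, float]:
    """Ratio of chip a to chip b for each quoted metric, rounded to 2 decimals."""
    return {
        "bf16_dense_tflops": round(a.bf16_dense_tflops / b.bf16_dense_tflops, 2),
        "hbm_gb": round(a.hbm_gb / b.hbm_gb, 2),
        "scale_up_gbps": round(a.scale_up_gbps / b.scale_up_gbps, 2),
    }


def system_bf16_pflops(spec: ChipSpec) -> float:
    """Derived estimate: per-chip BF16 dense TFLOPS x chip count, in PFLOPS."""
    return spec.bf16_dense_tflops * spec.chips_per_system / 1000


if __name__ == "__main__":
    print("Alibaba super node GPUs:", alibaba_gpu_count())
    print("GB200 vs Ascend 910C (per chip):", per_chip_ratios(GB200, ASCEND_910C))
    for spec in (GB200, ASCEND_910C):
        print(f"{spec.name}: {system_bf16_pflops(spec):.2f} PFLOPS (derived)")
```

Under these assumptions the per-chip gap favors the GB200 by roughly 3x in BF16 dense compute, while the much larger chip count of the CM384 more than offsets that gap in the derived system-level total. This fits the point in Group 3 that the larger optical scale-up domain comes with higher cost and power consumption.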