Huawei's New Technology Challenges Nvidia
半导体芯闻·2025-08-28 09:55

Core Viewpoint
- Huawei has introduced the UB-Mesh technology at the Hot Chips 2025 conference, aiming to unify all interconnections within AI data centers using a single protocol, which will be open-sourced next month [2][25].

Summary by Sections

UB-Mesh Technology
- UB-Mesh is designed to replace multiple existing protocols (PCIe, CXL, NVLink, TCP/IP) to reduce latency, control costs, and enhance reliability in gigawatt-level data centers [2][5].
- The technology allows any port to communicate with others without conversion, simplifying design and reducing conversion delays [5].

SuperNode Architecture
- Huawei defines SuperNode as an AI architecture for data centers that can integrate up to 1,000,000 processors, with bandwidth per chip increased from 100 Gbps to 10 Tbps (1.25 TB/s) [7].
- The architecture aims to lower latency and allows flexible reuse of high-speed SERDES connections, supporting backward compatibility through Ethernet [7].

Challenges and Solutions
- Transitioning from copper cables to pluggable optical links poses challenges, particularly regarding error rates [13].
- Huawei proposes link-level retry mechanisms and cross-design connections to ensure continuous operation even if individual links or modules fail [13].

Network Topology and Reliability
- The UB-Mesh network topology is hybrid, using a CLOS structure to connect racks and a multi-dimensional grid for nodes within each rack, aiming to reduce costs as the system scales [17].
- A system model is outlined where a hot standby rack takes over if another fails, significantly extending the mean time between failures [22].

Cost Efficiency
- Traditional interconnect costs increase linearly with the number of nodes, potentially exceeding the price of AI accelerators, while UB-Mesh's costs increase sub-linearly, making it more scalable [22].
- Huawei has proposed a practical system with 8192 nodes to demonstrate feasibility [22].
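
The difference between linear and sub-linear cost growth described under Cost Efficiency can be illustrated with a minimal sketch. The article gives no cost formula or figures, so everything here is an assumption made for illustration: the power-law form, the exponent of 0.8, the unit cost of 1.0, and the function names `linear_cost` and `sublinear_cost`. Only the 8192-node system size comes from the text.

```python
def linear_cost(nodes: int, cost_per_node: float = 1.0) -> float:
    """Interconnect cost that grows in direct proportion to node count."""
    return cost_per_node * nodes


def sublinear_cost(nodes: int, cost_per_node: float = 1.0,
                   exponent: float = 0.8) -> float:
    """Hypothetical power-law cost model (an assumption, not Huawei's formula).

    Any exponent below 1 makes total cost grow more slowly than node count.
    """
    return cost_per_node * nodes ** exponent


if __name__ == "__main__":
    # 8192 is the system size mentioned in the article; 64 and 512 are
    # arbitrary smaller sizes chosen only to show the trend.
    for nodes in (64, 512, 8192):
        lin = linear_cost(nodes)
        sub = sublinear_cost(nodes)
        print(f"{nodes:>5} nodes: linear={lin:10.1f}  "
              f"sub-linear={sub:10.1f}  ratio={sub / lin:.2f}")
```

Under this assumed model, the sub-linear cost shrinks relative to the linear cost as the node count grows, which is the scaling property the article attributes to UB-Mesh. The actual ratio depends on real pricing that the article does not provide.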

Market Implications
- With UB-Mesh and SuperNode, Huawei aims to support large-scale AI clusters and reduce reliance on Western standards like PCIe and NVLink [25].
- Whether other companies will adopt UB-Mesh remains uncertain, since it is not yet clear how much interest the industry has in data center infrastructure built around a single vendor's technology [26].