Huawei's New Technology Takes On Nvidia
半导体芯闻· 2025-08-28 09:55
Core Viewpoint
- Huawei has introduced the UB-Mesh technology at the Hot Chips 2025 conference, aiming to unify all interconnections within AI data centers using a single protocol, which will be open-sourced next month [2][25].

Summary by Sections

UB-Mesh Technology
- UB-Mesh is designed to replace multiple existing protocols (PCIe, CXL, NVLink, TCP/IP) to reduce latency, control costs, and enhance reliability in gigawatt-level data centers [2][5].
- The technology allows any port to communicate with others without conversion, simplifying design and reducing conversion delays [5].

SuperNode Architecture
- Huawei defines SuperNode as an AI architecture for data centers that can integrate up to 1,000,000 processors, with bandwidth per chip increased from 100 Gbps to 10 Tbps (1.25 TB/s) [7].
- The architecture aims to lower latency and allows flexible reuse of high-speed SERDES connections, supporting backward compatibility through Ethernet [7].

Challenges and Solutions
- Transitioning from copper cables to pluggable optical links poses challenges, particularly regarding error rates [13].
- Huawei proposes link-level retry mechanisms and cross-design connections to ensure continuous operation even if individual links or modules fail [13].

Network Topology and Reliability
- The UB-Mesh network topology is hybrid, using a CLOS structure to connect racks and a multi-dimensional grid for nodes within each rack, aiming to reduce costs as the system scales [17].
- A system model is outlined where a hot standby rack takes over if another fails, significantly extending the mean time between failures [22].

Cost Efficiency
- Traditional interconnect costs increase linearly with the number of nodes, potentially exceeding the price of AI accelerators, while UB-Mesh's costs increase sub-linearly, making it more scalable [22].
- Huawei has proposed a practical system with 8192 nodes to demonstrate feasibility [22].
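
To make the linear versus sub-linear cost claim concrete, here is a minimal Python sketch. The summary gives no cost formula, so the power-law model, the unit cost of 1.0, and the exponent of 0.8 are all hypothetical placeholders chosen only to show the shape of the two curves; they are not Huawei's figures.

```python
# Toy comparison of linear vs. sub-linear interconnect cost growth.
# ASSUMPTION: the power-law form and the 0.8 exponent are illustrative only;
# the source states that UB-Mesh cost grows sub-linearly but gives no formula.

def linear_cost(nodes: int, unit_cost: float = 1.0) -> float:
    """Interconnect cost that grows in direct proportion to node count."""
    return unit_cost * nodes


def sublinear_cost(nodes: int, unit_cost: float = 1.0, exponent: float = 0.8) -> float:
    """Interconnect cost that grows as nodes**exponent, with exponent < 1."""
    return unit_cost * nodes ** exponent


def cost_ratio(nodes: int, exponent: float = 0.8) -> float:
    """Sub-linear cost as a fraction of linear cost at the same node count."""
    return sublinear_cost(nodes, exponent=exponent) / linear_cost(nodes)


if __name__ == "__main__":
    # 8192 is the system size mentioned in the summary; 64 and 512 are arbitrary.
    for n in (64, 512, 8192):
        print(f"{n:>5} nodes: linear={linear_cost(n):8.1f}  "
              f"sub-linear={sublinear_cost(n):8.1f}  ratio={cost_ratio(n):.3f}")
```

Under any exponent below 1, the ratio keeps shrinking as the node count grows, which is the sense in which a sub-linear design becomes relatively cheaper at scale.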

Market Implications
- With UB-Mesh and SuperNode, Huawei aims to support large-scale AI clusters and reduce reliance on Western standards like PCIe and NVLink [25].
- The adoption of UB-Mesh by other companies remains uncertain, as industry interest in a single vendor's data center infrastructure is still to be evaluated [26].
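
As a quick check on the per-chip bandwidth figures quoted in the SuperNode Architecture section, the short Python sketch below converts 10 Tbps to bytes per second and computes the increase over 100 Gbps. It assumes decimal prefixes (1 Tbps = 1000 Gbps) and 8 bits per byte; the function and constant names are illustrative.

```python
# Unit check for the quoted per-chip bandwidth figures.
# ASSUMPTION: decimal (SI) prefixes, i.e. 1 Tbps = 1000 Gbps, and 8 bits per byte.

BITS_PER_BYTE = 8
GBPS_PER_TBPS = 1000


def tbps_to_tbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to terabytes per second."""
    return tbps / BITS_PER_BYTE


def speedup(old_gbps: float, new_tbps: float) -> float:
    """Ratio of the new per-chip bandwidth to the old one."""
    return (new_tbps * GBPS_PER_TBPS) / old_gbps


if __name__ == "__main__":
    print(tbps_to_tbytes_per_s(10))  # 1.25, matching the 1.25 TB/s in the summary
    print(speedup(100, 10))          # 100.0, i.e. a hundredfold increase
```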
Challenging NVLink: Huawei Launches Interconnect Technology, Soon to Be Open-Sourced
半导体行业观察· 2025-08-28 01:14
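Using its talk at the Hot Chips 2025 conference, Huawei introduced UB-Mesh, a technology that aims to unify all interconnects in AI data centers, both inside and outside the nodes, under a single protocol. The company also said that at an event next month it will announce that the protocol is free for all users.

The technology is intended to replace PCIe, CXL, NVLink, and TCP/IP with a single protocol in order to reduce latency, control costs, and improve reliability in gigawatt-scale data centers. To advance this effort, Huawei plans to open-source the specification. But will it gain broad traction?

Liao Heng (name transliterated), chief scientist at HiSilicon, Huawei's processor division, said: "Next month we will hold a conference to announce that the UB-Mesh protocol will be open to everyone, like a free license." "This is a very new technology; we see different camps racing to advance standardization. Depending on how successful we are in real system deployments and on demand from partners and customers, we can discuss turning it into some kind of standard."

Although AI data centers for training and inference should operate like one large parallel processor, they are made up of separate racks, servers, CPUs, GPUs, memory, SSDs, NICs, switches, and other components, which are connected to one another using different buses and protocols, such as UPI, PCIe, CXL, RoCE, NVLink, UALink, TC ...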