Nvidia Faces a Crowd of Formidable Rivals
半导体行业观察 (Semiconductor Industry Observation) · 2025-09-01 01:17

Core Viewpoint - The article discusses the Ultra Ethernet (UE) 1.0 standard, which defines a high-performance Ethernet protocol for artificial intelligence (AI) and high-performance computing (HPC) systems. It emphasizes the new Ultra Ethernet Transport (UET) layer, designed for reliable, high-speed communication in large-scale systems [2][4].

Group 1: Overview of Ultra Ethernet
- Ultra Ethernet (UE) aims to standardize high-performance networking for AI and HPC, addressing limitations of existing protocols such as InfiniBand and RoCE [4][8].
- UE grew out of a collaboration among major technology companies, which formed the Ultra Ethernet Consortium (UEC) in July 2023; the consortium had over 100 member companies by the end of 2024 [9][10].
- UE is designed to be compatible with existing Ethernet infrastructure, so it can be deployed and scaled in data centers without replacing that infrastructure [10][11].

Group 2: Technical Innovations
- The UET layer is designed for hardware-accelerated communication, which the article credits with a 1000-fold improvement in computational efficiency per bit transmitted [2][7].
- UE introduces a connectionless API and supports various topologies, including the traditional fat tree as well as optimized structures, to meet the scalability needs of future AI systems [10][12].
- The protocol supports multiple delivery modes, including reliable unordered delivery and reliable ordered delivery, to match different application requirements [49][50].

Group 3: Addressing Limitations of Existing Protocols
- Earlier protocols such as RoCE suffered from head-of-line blocking and congestion problems, which UE aims to resolve through new congestion management and packet delivery mechanisms [6][10].
- UE's design allows packet spraying, which distributes the packets of a flow across multiple paths to avoid traffic polarization and improve bandwidth utilization [22][21].
- The UET layer is built to operate over existing Ethernet networks, preserving compatibility while improving performance [14][27].

Group 4: Application and Use Cases
- UE applies to several network types: local networks that connect CPUs to accelerators, backend networks that provide high-performance connections, and frontend networks that serve traditional data center operations [12][13].
- The standard defines three configuration profiles (HPC, AI Full, and AI Base) that differ in supported functionality and implementation complexity [24][25].
- The UE architecture is designed for efficient communication in large-scale systems, making it suitable for modern AI workloads and HPC applications [28][29].
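
The packet spraying point in Group 3 can be made concrete with a toy model. The sketch below is not taken from the UE specification: the path count, the modulo "hash", the uniformly random per-packet path choice, and all function names are assumptions chosen for illustration. It compares per-flow hashing, where every packet of a flow follows one path, with per-packet spraying, and reports how far the busiest path sits above the mean load.

```python
import random

NUM_PATHS = 4  # assumed number of equal-cost paths (illustrative)


def flow_hash_loads(flows, num_paths=NUM_PATHS):
    """Per-flow hashing: all packets of a flow take the same path."""
    loads = {p: 0 for p in range(num_paths)}
    for flow_id, packets in flows:
        # Toy stand-in for a header hash; real switches hash packet fields.
        loads[flow_id % num_paths] += packets
    return loads


def spray_loads(flows, num_paths=NUM_PATHS, seed=0):
    """Per-packet spraying: each packet picks a path independently."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    loads = {p: 0 for p in range(num_paths)}
    for _flow_id, packets in flows:
        for _ in range(packets):
            loads[rng.randrange(num_paths)] += 1
    return loads


def imbalance(loads):
    """Busiest path's load divided by the mean load (1.0 is perfectly even)."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean


# Two large flows whose IDs collide on the same path under flow hashing,
# a small-scale picture of traffic polarization.
flows = [(0, 10_000), (4, 10_000)]
hashed = flow_hash_loads(flows)
sprayed = spray_loads(flows)

print("flow hashing imbalance:   ", imbalance(hashed))
print("packet spraying imbalance:", round(imbalance(sprayed), 3))
```

In this model the two colliding flows put all traffic on one of four paths under flow hashing, while spraying keeps every path close to the mean. Spraying also means packets of one flow can arrive out of order, which is one reason a reliable unordered delivery mode is useful alongside the ordered one.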