RDMA
Some Metrics Surpass Nvidia! China's First Domestic 400G Native RDMA Makes Its Debut
Shanghai Securities News · 2026-03-12 14:24
Group 1
- The core of the article is a breakthrough in domestic RDMA technology: Zhongke Shuguang has launched scaleFabric, a native lossless RDMA high-speed network that competes with Nvidia's NDR technology [2][4]
- The scaleFabric400 series network products have been validated in a nearly 10,000-card environment and have run stably for over 10 months, filling a gap in domestic high-speed cluster interconnects [2][8]
- The scaleFabric400's headline specifications include a port bandwidth of 400 Gbps, end-to-end communication latency as low as 0.9 microseconds, and a switch with a single-port bandwidth of 800 Gbps and a total switching capacity of 64 Tbps [6][8]
Group 2
- The product features a credit-based lossless flow control mechanism that mitigates congestion and packet-loss risk, with a link failure recovery time of under 1 millisecond, supporting clusters of nearly 10,000 cards [8][11] (a rough in-flight-data calculation based on the quoted figures appears after this summary)
- Compared with Nvidia's NDR, the scaleFabric400 offers 25% higher switch port density, a 100% increase in the maximum number of queue pairs (QPs) supported by the network cards, and a maximum interconnect scale 2.33 times that of traditional InfiniBand [8][10]
- A scaleFabric network deployment is already operational in Zhengzhou, supporting a national-level AI computing network base with a total scale of 30,000 cards [9][10]
Group 3
- Through long-term accumulation in high-performance computing, storage, and networking, Zhongke Shuguang has built a complete computing power foundation, enabling coordinated "computing-storage-network" development [13]
- The successful implementation of the native RDMA network marks the formation of an independent technological path for intelligent-computing interconnects in China, filling in a critical piece of the country's computing infrastructure [13]
- As the product is applied in ultra-large-scale intelligent computing clusters, a high-performance network industry ecosystem around the native RDMA technology is taking shape at an accelerating pace [13]
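To put the quoted figures in perspective, here is a rough, illustrative calculation that is not from the article: at 400 Gbps per port and an end-to-end latency of 0.9 microseconds, roughly 45 KB of data is "in flight" per port over one such delay, which is the order of magnitude a credit-based lossless scheme must be able to absorb in receive buffers. The small C sketch below only performs that arithmetic; the bandwidth and latency numbers are the article's, while reading them as a buffering budget is an assumption of this illustration.

```c
/* Rough back-of-the-envelope sketch (figures from the article; treating the
 * result as a credit/buffer budget is an assumption): data "in flight" on a
 * 400 Gbps port over a 0.9 microsecond end-to-end delay. */
#include <stdio.h>

int main(void)
{
    const double port_bandwidth_gbps = 400.0; /* quoted per-port bandwidth  */
    const double e2e_latency_us      = 0.9;   /* quoted end-to-end latency  */

    /* bytes per microsecond = Gbps * 1e9 bits/s / 8 bits per byte / 1e6 us/s */
    double bytes_per_us    = port_bandwidth_gbps * 1e9 / 8.0 / 1e6;
    double in_flight_bytes = bytes_per_us * e2e_latency_us;

    printf("%.0f Gbps over %.1f us keeps %.0f bytes (about %.0f KB) in flight\n",
           port_bandwidth_gbps, e2e_latency_us,
           in_flight_bytes, in_flight_bytes / 1024.0);
    return 0;
}
```

Per-hop credit loops only need to cover the shorter per-link delay rather than the full end-to-end path, so real switch buffer budgets would differ; the point is only the scale involved.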
Ethernet vs InfiniBand: The Battle for AI Networking
傅里叶的猫 · 2025-08-13 12:46
Core Viewpoint
- The article discusses the competition between InfiniBand and Ethernet in AI networking, highlighting Ethernet's advantages in cost, scalability, and compatibility with existing infrastructure [6][8][22].
Group 1: AI Networking Overview
- AI networks are today built primarily on InfiniBand because of NVIDIA's dominance in the AI server market, but Ethernet is gaining traction thanks to its cost-effectiveness and its established deployment in large-scale data centers [8][20].
- The "Ultra Ethernet Consortium" (UEC) was established to extend Ethernet's capabilities for high-performance computing and AI, competing directly with InfiniBand [8][9].
Group 2: Deployment Considerations
- Teams face four key questions when deploying AI networks: whether to reuse existing TCP/IP networks or build a dedicated high-performance network, whether to choose InfiniBand or Ethernet-based RoCE, how to manage and maintain the network, and whether it can support multi-tenant isolation [9][10].
- AI models now routinely reach hundreds of billions of parameters, which makes distributed training necessary, and distributed training in turn depends heavily on network performance for communication efficiency [10][20].
Group 3: Technical Comparison
- InfiniBand leads on bandwidth and latency, with high-speed data transfer and low end-to-end communication delays that suit high-performance computing [20][21].
- Ethernet, particularly RoCE v2, offers flexibility and cost advantages, carrying traditional Ethernet traffic alongside high-performance RDMA on the same fabric [18][22] (see the verbs-API sketch after this summary).
Group 4: Future Trends
- In AI inference scenarios, Ethernet is expected to show broader applicability and stronger advantages thanks to its compatibility with existing infrastructure and its cost-effectiveness, so more high-performance clusters are likely to be deployed on Ethernet [22][23].
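The RoCE v2 point in Group 3 rests on the fact that applications program against the same RDMA verbs interface whether the fabric underneath is native InfiniBand or Ethernet. The sketch below, written against the standard libibverbs API from rdma-core, opens the first RDMA device, reports whether its port uses an InfiniBand or Ethernet (RoCE) link layer, and registers a buffer for zero-copy transfers. It is a minimal illustration assuming libibverbs is installed, not code from either article.

```c
/* Minimal sketch (not from the article): the same libibverbs calls drive
 * both native InfiniBand and RoCE (RDMA over Converged Ethernet).
 * Build with rdma-core installed:  gcc verbs_probe.c -o verbs_probe -libverbs */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    /* Open the first RDMA device, whatever fabric it is attached to. */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) {
        fprintf(stderr, "cannot open %s\n", ibv_get_device_name(devs[0]));
        return 1;
    }

    /* The port's link layer tells us whether this is native IB or RoCE. */
    struct ibv_port_attr port;
    if (ibv_query_port(ctx, 1, &port) == 0) {
        printf("%s port 1: %s\n", ibv_get_device_name(devs[0]),
               port.link_layer == IBV_LINK_LAYER_ETHERNET
                   ? "RoCE (Ethernet link layer)" : "InfiniBand link layer");
    }

    /* Register a buffer for zero-copy RDMA reads/writes; this step is
     * identical on either fabric. */
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    void *buf = calloc(1, 4096);
    struct ibv_mr *mr = (pd && buf)
        ? ibv_reg_mr(pd, buf, 4096,
                     IBV_ACCESS_LOCAL_WRITE |
                     IBV_ACCESS_REMOTE_READ |
                     IBV_ACCESS_REMOTE_WRITE)
        : NULL;
    if (mr)
        printf("registered 4 KB buffer: lkey=0x%x rkey=0x%x\n",
               (unsigned)mr->lkey, (unsigned)mr->rkey);

    if (mr) ibv_dereg_mr(mr);
    if (pd) ibv_dealloc_pd(pd);
    free(buf);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```

Because the same verbs calls drive both fabrics, the InfiniBand-versus-Ethernet choice is largely a matter of cost, operations, and congestion behavior rather than application code, which is the premise behind both the UEC effort and RoCE's growing traction.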