Core Insights
- Developing large AI models requires significant resources, including many technical experts and substantial financial investment, and above all powerful computing capabilities [1]
- Demand for computing power is expected to grow exponentially across industries; IDC predicts that China's intelligent computing power demand will reach 2781 EFLOPS by 2028, an annual growth rate of 46.2% [1]
- Traditional computing clusters hit bottlenecks when scaling beyond thousands of cards, necessitating new approaches such as the "ten-thousand card super cluster" [2]

Group 1: ScaleX Ten-Thousand Card Super Cluster
- Sugon unveiled the ScaleX ten-thousand card super cluster system at the HAIC2025 conference; it is designed to meet the extreme demands of AI infrastructure [3]
- The system consists of 16 super nodes connected by a proprietary high-speed network and supports 10,240 AI accelerator cards, marking a significant advance in domestic large-scale computing cluster technology [5]
- The ScaleX system delivers total computing power exceeding 5 EFLOPS, with a power usage effectiveness (PUE) value as low as 1.04 and a 20-fold increase in computing density [5][9]

Group 2: Technical Advantages
- The ScaleX system uses a self-developed RDMA high-speed network that achieves 400 Gb/s bandwidth and communication latency under 1 microsecond, significantly improving communication performance [9]
- The system is deeply optimized across storage, computing, and transmission, improving resource utilization by 55% during large model training [9]
- It uses a digital twin for intelligent scheduling and management, ensuring 99.99% availability and supporting the management of tens of thousands of nodes [9]

Group 3: Open Architecture and Ecosystem Development
- The ScaleX super cluster supports multiple brands of accelerator cards and mainstream computing ecosystems, promoting an open architecture for AI computing [10]
- The initiative aims to lower the barriers for AI companies to build intelligent computing clusters and to foster a collaborative industrial ecosystem [10][12]
- The open model gives users greater choice and compatibility with mainstream AI development frameworks, enabling broader participation in the ecosystem [12][13]
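
The headline figures quoted above can be sanity-checked with some back-of-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the per-node card count assumes the 10,240 cards are spread evenly across the 16 super nodes (the summary does not say so), and the per-card figure is only a lower bound because the source says total compute "exceeds" 5 EFLOPS without naming a numeric precision.

```python
# Back-of-envelope checks on figures quoted in the summary.
# Inputs come from the text; helper names and the even-distribution
# assumption are illustrative, not taken from the source.

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def cards_per_super_node(total_cards: int, super_nodes: int) -> float:
    """Average accelerator cards per super node (assumes an even split)."""
    return total_cards / super_nodes


def min_flops_per_card(total_flops: float, total_cards: int) -> float:
    """Lower bound on per-card compute implied by a 'more than X' total."""
    return total_flops / total_cards


def pue_overhead_fraction(pue: float) -> float:
    """PUE = total facility energy / IT energy, so overhead = PUE - 1."""
    return pue - 1.0


def annual_downtime_minutes(availability: float) -> float:
    """Maximum yearly downtime permitted by an availability target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"cards per super node: {cards_per_super_node(10_240, 16):.0f}")
    print(f"min TFLOPS per card:  {min_flops_per_card(5e18, 10_240) / 1e12:.1f}")
    print(f"PUE 1.04 overhead:    {pue_overhead_fraction(1.04):.0%}")
    print(f"99.99% downtime/yr:   {annual_downtime_minutes(0.9999):.2f} min")
```

Under these assumptions, each super node would hold 640 cards, a PUE of 1.04 corresponds to about 4% of energy going to cooling and other non-IT overhead, and 99.99% availability allows roughly 52.56 minutes of downtime per year.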
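In an Era of Cutthroat Competition for Computing Power, Why Has the "Open Architecture" Ten-Thousand Card Super Cluster Become a Necessity?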