Assessing Growth in the Compute Interconnect Sector from NVIDIA's Perspective: Does a "Scaling Law" Exist for Scale Up Networks? | Research Report
Core Insights
- The report emphasizes the necessity of Scale Up networks due to the "memory wall" issue and the evolution of AI computing paradigms, which necessitate pooling memory through Scale Up solutions [1][3]

Group 1: Scale Up Network Expansion
- NVIDIA is continuously expanding the Scale Up network along two main paths: enhancing single-card bandwidth, with NVLink 5.0 achieving 7200 Gb/s, and increasing supernode sizes from H100 NVL8 to GH200 and GB200 [2]
- The Scale Up network is expected to follow a Scaling Law, with a second layer of Scale Up networking emerging that requires a specific ratio of optical and AEC connections per chip [2][4]

Group 2: Addressing the Memory Wall
- The "memory wall" problem is characterized by the growing gap between the parameter counts of large models and single-card memory capacity, as well as the disparity between single-card compute and single-card memory [3]
- To enhance computational efficiency, various parallel computing methods are employed, including data parallelism, pipeline parallelism, tensor parallelism, and expert parallelism, which significantly increase communication frequency and capacity requirements [3]

Group 3: Need for Larger Scale Up Networks
- Demand for larger Scale Up networks is driven by Total Cost of Ownership (TCO), user experience, and expanding model capabilities, as the Tokens Per Second (TPS) consumed by a single user is expected to rise [3]
- The report suggests that Scale Up size is non-linearly related to expected single-user TPS and actual single-card performance, implying that larger Scale Up networks are needed to maintain performance [3]

Group 4: Building Larger Scale Up Networks
- Constructing larger Scale Up networks requires a second layer of Scale Up switches between cabinets, with optical and AEC connections expected to coexist in the new network structure [4]
- Each GPU requires nine additional equivalent 1.6T connections, 3-4.5 times that of Scale Out networks, and every four GPUs require an additional switch, 7.5-12 times that of Scale Out networks [4]

Group 5: Investment Opportunities
- Sustained demand for Scale Up networks is anticipated to drive exponential growth in network connection requirements, benefiting sectors such as optical interconnects and switches [4]
- Relevant companies in the optical interconnect space include Zhongji Xuchuang, Xinyi Sheng, and Tianfu Tong, while switch manufacturers include Ruijie Networks and Broadcom [5]
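The memory-wall gap described in Group 2 can be made concrete with back-of-envelope arithmetic. A minimal sketch in Python, where the model size, precision, and per-card HBM capacity are illustrative assumptions rather than figures from the report:

```python
# Back-of-envelope sketch of the "memory wall": how many GPUs must pool
# their memory just to hold a large model's weights. All figures below
# are illustrative assumptions, not numbers taken from the report.

def gpus_needed_for_weights(params_billions: float,
                            bytes_per_param: int,
                            hbm_per_gpu_gb: float) -> int:
    """GPUs required to hold the weights alone, ignoring KV cache,
    activations, and optimizer state (which only widen the gap)."""
    weights_gb = params_billions * bytes_per_param  # 1e9 params * bytes / 1e9
    return int(-(-weights_gb // hbm_per_gpu_gb))    # ceiling division

# A hypothetical 1-trillion-parameter model in FP16 (2 bytes/param)
# against a 192 GB HBM card: the weights alone already span 11 GPUs,
# so no single card can hold the model and memory must be pooled.
print(gpus_needed_for_weights(1000, 2, 192))  # 11
```

The point of the sketch is the direction of the trend: as parameter counts grow faster than per-card HBM, the pooled-GPU count rises, which is the motivation the report gives for Scale Up networking.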
AI Compute Tracking Deep Dive (III): Assessing Growth in the Compute Interconnect Sector from NVIDIA's Perspective: Does a "Scaling Law" Exist for Scale Up Networks?
Soochow Securities · 2025-08-20 05:35
Investment Rating
- The industry investment rating is "Overweight," indicating an expected outperformance of the industry index relative to the benchmark by more than 5% over the next six months [110].

Core Insights
- The report argues that a "Scaling Law" exists for Scale Up networks, which will drive increased demand for network connections as AI computing requirements grow [3][6].
- The need for Scale Up networks is driven by the "memory wall" problem and the evolution of AI computing paradigms, which necessitate pooling memory resources [4][32].
- Demand for larger Scale Up networks is linked to Total Cost of Ownership (TCO), user experience, and the expansion of model capabilities [6][52].

Summary by Sections

1. Expansion of Scale Up Networks
- NVIDIA is continuously expanding the Scale Up network along two main paths: enhancing single-card bandwidth and increasing supernode scale [3][19].
- The latest NVLink 5.0 supports a single-card bandwidth of 7200 Gb/s, double that of the previous generation [16][19].
- The Scale Up supernode has evolved from H100 NVL8 to GH200 and GB200, with NVL72 being a key configuration for improving training and inference efficiency [19][22].

2. Necessity of Scale Up Networks
- The "memory wall" issue, in which the gap between model parameter counts and single-card memory capacity keeps widening, necessitates pooling memory through Scale Up networks [35].
- AI training and inference require various parallel computing methods, with tensor parallelism highlighted for its efficiency in optimizing computation [39][43].

3. Demand for Larger Scale Up Networks
- As single-user Tokens Per Second (TPS) consumption rises, the effective performance of existing servers will decline, necessitating larger Scale Up networks [6][52].
- The report indicates a non-linear relationship between Scale Up size and actual performance, suggesting that larger networks yield greater performance benefits [6][57].

4. Building Larger Scale Up Networks
- A second layer of Scale Up switches is needed between cabinets to accommodate the growing demand for network connections [80][85].
- Each GPU in the second layer requires nine additional equivalent 1.6T connections, significantly increasing network complexity compared to Scale Out networks [93].

5. Investment Recommendations
- Potential beneficiaries of expanding Scale Up demand include companies involved in optical interconnects and switches, such as Zhongji Xuchuang and Astera Labs [105].
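The second-layer connection arithmetic discussed above can be sketched as follows. The per-GPU ratios (nine equivalent 1.6T links per GPU, one additional switch per four GPUs) come from the report; the cluster size used in the example is an illustrative assumption:

```python
# Sketch of the report's second-layer Scale Up connection math: nine
# equivalent 1.6T links per GPU and one extra switch per four GPUs
# (ratios from the report); the 576-GPU cluster size is an assumption.

def second_layer_demand(num_gpus: int,
                        links_per_gpu: int = 9,
                        gpus_per_switch: int = 4) -> dict:
    """Incremental 1.6T-equivalent links and switches added by the
    second Scale Up layer for a cluster of num_gpus GPUs."""
    return {
        "links_1p6t": num_gpus * links_per_gpu,
        "switches": -(-num_gpus // gpus_per_switch),  # ceiling division
    }

# Eight NVL72 cabinets (576 GPUs) as a hypothetical second-layer domain:
print(second_layer_demand(576))  # {'links_1p6t': 5184, 'switches': 144}
```

Because both link and switch counts scale linearly with GPU count at multiples of 3-4.5x and 7.5-12x the Scale Out baseline, the incremental demand lands disproportionately on optical/AEC interconnects and switch vendors, which is the basis of the report's investment view.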