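AI Computing Power Tracking In-Depth (3): The Growth Prospects of the Compute Interconnect Sector from NVIDIA's Perspective: Does a "Scaling Law" Exist for Scale Up Networks?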

Investment Rating
- The industry investment rating is "Overweight," meaning the industry index is expected to outperform the benchmark by more than 5% over the next six months [110].

Core Insights
- The report suggests that a "Scaling Law" exists for Scale Up networks, which will increase demand for network connections as AI computing requirements grow [3][6].
- The need for Scale Up networks is driven by the "memory wall" problem and the evolution of AI computing paradigms, both of which require memory resources to be pooled [4][32].
- The report emphasizes that demand for larger Scale Up networks is linked to Total Cost of Ownership (TCO), user experience, and the expansion of model capabilities [6][52].

Summary by Sections

1. Expansion of Scale Up Networks
- NVIDIA is continuously expanding the Scale Up network along two main paths: raising single-card bandwidth and increasing supernode scale [3][19].
- The latest NVLink 5.0 supports a single-card bandwidth of 7200 Gb/s, double that of the previous generation [16][19].
- The Scale Up supernode has evolved from H100 NVL8 to GH200 and GB200, with NVL72 being a key configuration for improving training and inference efficiency [19][22].

2. Necessity of Scale Up Networks
- The "memory wall" issue, in which the gap between model parameter counts and single-card memory capacity keeps widening, makes it necessary to pool memory through Scale Up networks [35].
- AI training and inference rely on several forms of parallel computing, and the report highlights tensor parallelism for its efficiency in optimizing computation [39][43].

3. Demand for Larger Scale Up Networks
- As per-user tokens per second (TPS) consumption increases, the performance of existing servers will decline, so larger Scale Up networks are needed to raise effective performance [6][52].
- The report indicates a non-linear relationship between Scale Up size and actual performance, suggesting that larger networks will yield greater performance benefits [6][57].

4. Building Larger Scale Up Networks
- The report outlines the need for a second layer of Scale Up switches between cabinets to accommodate the growing demand for network connections [80][85].
- It highlights that, with the second layer, each GPU requires nine additional equivalent 1.6T connections, making the network significantly more complex than a Scale Out network [93].

5. Investment Recommendations
- The report identifies potential beneficiaries of expanding Scale Up demand, including companies involved in optical interconnects and switches, such as Zhongji Xuchuang and Astera Labs [105].
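
The bandwidth figures quoted in sections 1 and 4 can be cross-checked with simple arithmetic. The sketch below is illustrative only: the 7200 Gb/s per-GPU figure and the count of nine 1.6T connections come from the summary above, while the helper `equivalent_links` and its `factor` argument are hypothetical. The summary does not state how the report arrives at nine, so the factor of two (for example, counting both ends of each link) is an assumption, not the report's stated method.

```python
# Illustrative arithmetic only. The 7200 Gb/s figure and the "nine 1.6T
# connections" count come from the summary; the factor of two is an assumption.

NVLINK5_GBPS = 7200   # per-GPU NVLink 5.0 bandwidth cited in the report, in Gb/s
LINK_GBPS = 1600      # one "1.6T" connection, in Gb/s


def equivalent_links(gpu_gbps: float, link_gbps: float, factor: int = 1) -> float:
    """Connections of link_gbps needed to carry gpu_gbps, scaled by a
    hypothetical multiplier (e.g. 2 if both ends of each link are counted)."""
    return gpu_gbps / link_gbps * factor


gigabytes_per_second = NVLINK5_GBPS / 8        # convert Gb/s to GB/s
previous_generation_gbps = NVLINK5_GBPS / 2    # "double" implies half before
links_single = equivalent_links(NVLINK5_GBPS, LINK_GBPS)       # no multiplier
links_doubled = equivalent_links(NVLINK5_GBPS, LINK_GBPS, 2)   # assumed factor of 2

print(gigabytes_per_second, previous_generation_gbps, links_single, links_doubled)
```

Under these assumptions, 7200 Gb/s corresponds to 900 GB/s, the previous generation to 3600 Gb/s, and the per-GPU bandwidth to 4.5 connections at 1.6T, which reaches the quoted nine only once the assumed factor of two is applied.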