The Growth of the Compute Interconnect Sector from Nvidia's Perspective: Does a "Scaling Law" Exist for Scale Up Networks? | Research Report

Core Insights
- The report argues that Scale Up networks are necessary because of the "memory wall" and the evolution of AI computing paradigms, both of which call for pooling memory across chips through Scale Up interconnects [1][3]

Group 1: Scale Up Network Expansion
- Nvidia keeps expanding the Scale Up network along two paths: raising per-card bandwidth, with NVLink 5.0 reaching 7200 Gb/s, and growing supernode size from H100 NVL8 to GH200 and GB200 [2]
- The report expects the Scale Up network to follow a Scaling Law of its own: a second layer of Scale Up networking will emerge, and it will require optical and AEC (active electrical cable) connections in a specific ratio to the number of chips [2][4]

Group 2: Addressing the Memory Wall
- The "memory wall" refers to two widening gaps: between the parameter size of large models and single-card memory, and between single-card computing power and single-card memory [3]
- To improve computational efficiency, training and inference rely on several forms of parallelism, including data, pipeline, tensor, and expert parallelism, all of which sharply increase how often chips communicate and how much data they exchange [3]

Group 3: Need for Larger Scale Up Networks
- Demand for larger Scale Up networks is driven by Total Cost of Ownership (TCO), user experience, and the expansion of model capabilities, as the Tokens Per Second (TPS) consumed by a single user is expected to rise [3]
- The report suggests that Scale Up size is non-linearly related to the expected single-user TPS and to actual single-card performance, so larger Scale Up networks are needed to maintain performance [3]

Group 4: Building Larger Scale Up Networks
- Building larger Scale Up networks requires a second layer of Scale Up switches between cabinets, with optical and AEC connections expected to coexist in the new network structure [4]
- The report estimates that each GPU requires nine additional equivalent 1.6T connections, 3 to 4.5 times the number needed in Scale Out networks, and that every four GPUs require one additional switch, 7.5 to 12 times the number needed in Scale Out networks [4]

Group 5: Investment Opportunities
- Continued demand for Scale Up networks is expected to drive exponential growth in network connection requirements, benefiting segments such as optical interconnects and switches [4]
- Relevant companies in optical interconnects include Zhongji Innolight (Zhongji Xuchuang), Eoptolink (Xinyi Sheng), and TFC Optical Communication (Tianfu Tongxin); switch makers include Ruijie Networks and Broadcom [5]
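
The per-GPU ratios quoted under Group 4 can be checked with a little arithmetic. The sketch below is illustrative only: it takes the report's figures at face value (nine equivalent 1.6T connections per GPU, one switch per four GPUs, and the quoted multiples over Scale Out) and backs out the Scale Out baselines those multiples imply. The function names and the 1,024-GPU cluster size are hypothetical and do not come from the report.

```python
from fractions import Fraction

# Figures quoted in the report summary for the second Scale Up layer.
SCALE_UP_LINKS_PER_GPU = 9                             # equivalent 1.6T connections per GPU
LINK_MULTIPLE_RANGE = (Fraction(3), Fraction(9, 2))    # 3x to 4.5x the Scale Out figure
GPUS_PER_SCALE_UP_SWITCH = 4                           # one extra switch per four GPUs
SWITCH_MULTIPLE_RANGE = (Fraction(15, 2), Fraction(12))  # 7.5x to 12x the Scale Out figure


def implied_scale_out_links_per_gpu():
    """Scale Out 1.6T-equivalent links per GPU implied by the quoted multiples."""
    low_multiple, high_multiple = LINK_MULTIPLE_RANGE
    # A higher multiple implies a smaller Scale Out baseline, so it gives the low end.
    return (SCALE_UP_LINKS_PER_GPU / high_multiple,
            SCALE_UP_LINKS_PER_GPU / low_multiple)


def implied_gpus_per_scale_out_switch():
    """GPUs served by one Scale Out switch, implied by the quoted multiples."""
    low_multiple, high_multiple = SWITCH_MULTIPLE_RANGE
    # Switches per GPU shrink by the multiple, so GPUs per switch grow by it.
    return (GPUS_PER_SCALE_UP_SWITCH * low_multiple,
            GPUS_PER_SCALE_UP_SWITCH * high_multiple)


def incremental_demand(num_gpus):
    """Additional (links, switches) for a cluster of num_gpus, per the report's ratios."""
    if num_gpus < 0:
        raise ValueError("num_gpus must be non-negative")
    links = SCALE_UP_LINKS_PER_GPU * num_gpus
    switches = -(-num_gpus // GPUS_PER_SCALE_UP_SWITCH)  # ceiling division
    return links, switches


if __name__ == "__main__":
    print("Implied Scale Out links per GPU:", implied_scale_out_links_per_gpu())
    print("Implied GPUs per Scale Out switch:", implied_gpus_per_scale_out_switch())
    print("Extra (links, switches) for 1024 GPUs:", incremental_demand(1024))
```

Under these assumptions, the quoted multiples imply a Scale Out baseline of roughly two to three equivalent 1.6T connections per GPU and one switch per 30 to 48 GPUs, and a hypothetical 1,024-GPU cluster would need 9,216 additional connections and 256 additional switches for the second Scale Up layer.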