The Road to Million-Card Compute: Multi-DC Distributed Training and Growing DCI Demand
Guolian Securities·2024-10-07 04:03

Industry Investment Rating
- The report maintains a "Stronger than the Market" rating for the industry [4]

Core Viewpoints
- High-energy-consuming computing clusters are driving AI model training from single data center (DC) to multi-DC collaborative training, with long-distance asynchronous collaborative training becoming mainstream [2]
- Meta and Google have already started multi-DC distributed training, with Google's Gemini 1 Ultra being a notable example [2]
- OpenAI and Microsoft plan to interconnect their large-scale campuses nationwide for extensive distributed training [2]
- Multi-DC collaborative training poses challenges to network infrastructure, particularly in terms of packet loss sensitivity and load balancing [6]
- 400G ZR coherent technology is expected to replace traditional WDM systems, with demand for ZR optical modules likely to grow [6]
- The AI computing power demand is spreading to DCI scenarios, potentially driving rapid growth in the DCI market [6]

Summary by Relevant Sections

Multi-DC Collaborative Training
- Meta and Google are actively deploying multi-DC distributed training, with Google's Gemini 1 Ultra being a key example [6]
- Google has two major multi-DC regions in Ohio and Iowa/Nebraska, with plans to expand capacity significantly [9]
- OpenAI and Microsoft are planning nationwide distributed training by interconnecting their large-scale campuses [10]

Challenges in Distributed Training
- AI training is entering the era of 100,000-card clusters, posing challenges for cross-DC collaborative training [11]
- Key challenges include high sensitivity to packet loss, load balancing issues due to elephant flows, and extreme traffic bursts reaching thousands of Tbps [11]
- Current 10km cross-building parallel training can achieve less than 5% efficiency loss, but future 100km and 1000km training will require advanced DCI networks and other technologies to keep losses below 10% [11]

DCI Interconnection Solutions and Market Analysis
- 400G ZR coherent optical technology is expected to replace traditional WDM systems in DCI, offering a more streamlined solution [14]
- LightCounting predicts growth in 400G ZR and ZR+ optical modules from 2024 to 2028, with 400G ZR priced at $3,230 in 2023 and 800G ZR at $4,800 in 2024 [17]
- DCI scenarios will choose different products based on communication distance, with DWDM+ZR modules preferred for cross-campus connections [15]

Investment Recommendations
- The report recommends focusing on the DCI industry chain and 400G/800G ZR suppliers, including domestic OTN manufacturers like ZTE, FiberHome, and Accelink [21]
- Companies with 400G/800G ZR product layouts, such as Eoptolink, InnoLight, and HG Tech, are also highlighted for attention [21]
- The report suggests prioritizing overseas DCI markets in the short term, with a long-term focus on domestic DCI development [18]
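
To put the 10km, 100km, and 1000km distances from the Challenges in Distributed Training section in perspective, the following minimal Python sketch estimates fiber propagation delay alone. It is an illustration, not the report's model: the group index of 1.47 is an assumed typical value for single-mode fiber, the function names are hypothetical, and equipment, queuing, and retransmission delays are ignored.

```python
# Rough fiber propagation delay for cross-DC links.
# Assumption (not from the report): light travels through single-mode fiber
# at c / 1.47, where 1.47 is a typical group index.

SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_GROUP_INDEX = 1.47  # assumed typical value


def one_way_delay_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds over distance_km of fiber."""
    speed_in_fiber_km_per_s = SPEED_OF_LIGHT_KM_PER_S / FIBER_GROUP_INDEX
    return distance_km / speed_in_fiber_km_per_s * 1000.0


def round_trip_delay_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds (there and back)."""
    return 2.0 * one_way_delay_ms(distance_km)


if __name__ == "__main__":
    # Distances named in the report's discussion of cross-DC training.
    for km in (10, 100, 1000):
        print(
            f"{km:>5} km: one-way {one_way_delay_ms(km):.3f} ms, "
            f"round trip {round_trip_delay_ms(km):.3f} ms"
        )
```

Under these assumptions the one-way delay is roughly 0.05 ms at 10km, 0.5 ms at 100km, and 5 ms at 1000km. Each synchronization step between sites pays at least this cost, which is consistent with the report's point that longer distances push training toward asynchronous collaboration and more capable DCI networks.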