GB200/300 Racks
2025 Shipments Revised Down to 27,300 Units
傅里叶的猫· 2025-12-14 12:37
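The following article is reposted from AI产业链研究 (AI Industry Chain Research), which covers artificial intelligence across infrastructure, algorithms, and applications and also shares insights from its research process.

Each month, Morgan Stanley publishes monthly and quarterly rack-shipment forecasts for each ODM; its latest report was released on December 8. Among the leading GPU AI server ODMs, Morgan Stanley's order of preference is Wistron > Hon Hai > Quanta. J.P. Morgan publishes similar forecasts, which we covered earlier in "J.P. Morgan's 2025 AI Chip Market Forecast: NVIDIA 5.5 Million, AMD 550,000, ASICs 4.17 Million. What About Domestic Chips?"

Morgan Stanley's latest forecast for GB200/300 racks is 27,300, a slight cut from its previous 28,000. The revision mainly reflects an update to its Q4 2025 rack forecast following Quanta's Q3 2025 earnings call.

Quanta's management said its AI business would grow quarter over quarter in Q4 2025, but that the acceleration in AI revenue growth should come in Q1 2026, guidance that was more conservative than Morgan Stanley had expected. Morgan Stanley therefore cut its Q4 2025 estimate for Quanta's rack output from about 3,500 to 2,500 ...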
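The size of the revision is easy to quantify. A minimal Python sketch using only the figures quoted above; the helper name `revision` is hypothetical and not taken from the report:

```python
# Figures quoted in the Morgan Stanley note above (units: racks).
PREV_TOTAL_2025 = 28_000   # previous GB200/300 rack forecast for 2025
NEW_TOTAL_2025 = 27_300    # latest forecast (December 8 report)
QUANTA_Q4_PREV = 3_500     # approximate prior estimate for Quanta, Q4 2025
QUANTA_Q4_NEW = 2_500      # revised estimate for Quanta, Q4 2025


def revision(prev: float, new: float) -> tuple[float, float]:
    """Return (absolute change, percentage change) going from prev to new."""
    delta = new - prev
    return delta, 100.0 * delta / prev


if __name__ == "__main__":
    total_delta, total_pct = revision(PREV_TOTAL_2025, NEW_TOTAL_2025)
    quanta_delta, quanta_pct = revision(QUANTA_Q4_PREV, QUANTA_Q4_NEW)
    print(f"Industry total: {total_delta:+,.0f} racks ({total_pct:+.1f}%)")
    print(f"Quanta Q4 2025: {quanta_delta:+,.0f} racks ({quanta_pct:+.1f}%)")
```

This reports a 700-rack (2.5%) cut to the total and a 1,000-rack (about 28.6%) cut to Quanta's Q4 2025 estimate. Because Quanta's cut is larger than the net change, the rest of the forecast (other ODMs or quarters) would have moved up by roughly 300 racks in aggregate; since the 3,500 baseline is itself approximate, treat that only as indicative.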
GB200 Shipments Revised Up, but NVL72 Not Yet Used for Large-Scale Training
傅里叶的猫· 2025-08-20 11:32
Core Viewpoint
- The article compares the performance and cost of NVIDIA's H100 and GB200 NVL72 GPUs, highlighting the GB200 NVL72's potential advantages and remaining challenges in AI training environments [30][37].

Group 1: Market Predictions and Performance
- After the ODMs reported results, institutions raised their 2025 forecast for GB200/300 rack shipments from 30,000 to 34,000, with 11,600 racks expected in Q3 and 15,700 in Q4 [3].
- Foxconn anticipates a 300% quarter-over-quarter increase in AI rack shipments and projects 19,500 racks for the year, roughly 57% of the market [3].
- In 2026, even if NVIDIA's chip output stays flat, downstream assemblers could build more than 60,000 racks, helped by an estimated 2 million Blackwell chips carried over [3].

Group 2: Cost Analysis
- Total capital expenditure (Capex) is approximately $250,866 per H100 server and around $3,916,824 per GB200 NVL72 rack, making the GB200 NVL72 about 1.6 to 1.7 times as expensive per GPU [12][13].
- Operational expenditure (Opex) for the GB200 NVL72 is slightly higher than for the H100, mainly because of higher per-GPU power draw (1,200 W vs. 700 W) [14][15].
- The total cost of ownership (TCO) of the GB200 NVL72 is about 1.6 times that of the H100, so the GB200 NVL72 needs at least a 1.6x performance advantage to be attractive for AI training [15][30].

Group 3: Reliability and Software Improvements
- As of May 2025, the GB200 NVL72 had not yet been widely adopted for large-scale training because of software maturity and reliability issues; the H100 and Google TPU remained the mainstream options [11].
- Reliability is a significant concern: early operators encountered numerous XID 149 errors, which complicate diagnostics and maintenance [34][36].
- Software optimizations, particularly in the CUDA stack, are expected to improve GB200 NVL72 performance significantly, but reliability remains the bottleneck [37].

Group 4: Future Outlook
- By July 2025, GB200 NVL72 performance/TCO was projected to reach 1.5 times that of the H100, with further improvements expected to make it increasingly favorable [30][32].
- The GB200 NVL72 architecture allows faster execution in certain scenarios, such as MoE (Mixture of Experts) models, which could strengthen its competitive position [33].
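The summary's figures can be cross-checked with simple arithmetic. The sketch below is a rough Python check, not the source's cost model. It assumes an 8-GPU H100 server (`GPUS_PER_H100_SERVER`, not stated in the summary) and treats the 1.6x TCO ratio and the 1.5x performance/TCO ratio as being on the same per-GPU basis; the function names are hypothetical.

```python
"""Rough arithmetic checks on the figures quoted in the summary above."""

# Group 1: shipment forecasts (units: racks)
FY2025_RACKS = 34_000
Q3_RACKS, Q4_RACKS = 11_600, 15_700
FOXCONN_RACKS = 19_500

# Group 2: cost inputs
H100_SERVER_CAPEX_USD = 250_866
NVL72_RACK_CAPEX_USD = 3_916_824
GPUS_PER_H100_SERVER = 8    # assumption: standard 8-GPU HGX H100 server
GPUS_PER_NVL72_RACK = 72    # "NVL72" = 72 GPUs per rack
H100_WATTS, GB200_WATTS = 700, 1_200
TCO_RATIO = 1.6             # GB200 NVL72 TCO relative to H100


def implied_first_half() -> int:
    """Racks left for H1 2025 once the Q3 and Q4 forecasts are subtracted."""
    return FY2025_RACKS - (Q3_RACKS + Q4_RACKS)


def foxconn_share() -> float:
    """Foxconn's projected racks as a fraction of the full-year forecast."""
    return FOXCONN_RACKS / FY2025_RACKS


def capex_per_gpu() -> tuple[float, float]:
    """(H100, GB200 NVL72) capex per GPU, by simple division."""
    return (H100_SERVER_CAPEX_USD / GPUS_PER_H100_SERVER,
            NVL72_RACK_CAPEX_USD / GPUS_PER_NVL72_RACK)


def required_perf_ratio(perf_per_tco_ratio: float) -> float:
    """Performance ratio (GB200/H100) implied by a given perf/TCO ratio.

    perf/TCO ratio = performance ratio / TCO ratio, so
    performance ratio = perf/TCO ratio * TCO ratio.
    """
    return perf_per_tco_ratio * TCO_RATIO


if __name__ == "__main__":
    h100, gb200 = capex_per_gpu()
    print(f"Implied H1 2025 racks: {implied_first_half():,}")
    print(f"Foxconn share: {foxconn_share():.1%}")
    print(f"Capex/GPU: H100 ${h100:,.0f}, GB200 ${gb200:,.0f}, "
          f"ratio {gb200 / h100:.2f}x")
    print(f"Per-GPU power ratio: {GB200_WATTS / H100_WATTS:.2f}x")
    print(f"Break-even performance: {required_perf_ratio(1.0):.1f}x H100")
    print(f"Performance implied by 1.5x perf/TCO: "
          f"{required_perf_ratio(1.5):.1f}x H100")
```

Under these assumptions, Q3 and Q4 account for 27,300 racks, leaving about 6,700 for the first half, and 19,500 racks is about 57% of 34,000. Plain division gives roughly 1.73x capex per GPU, slightly above the quoted 1.6 to 1.7x, which may rest on a different cost basis. A 1.5x perf/TCO ratio on a 1.6x TCO implies about 2.4x H100 performance; that 2.4x is derived here, not stated in the source.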