傅里叶的猫
Top Foreign Investment Bank Research Report Sharing
傅里叶的猫· 2025-05-23 15:46
For readers who want access to foreign-bank research, we recommend a Knowledge Planet community that uploads several hundred original research reports from top foreign investment banks every day: Morgan Stanley, JPMorgan, UBS, Goldman Sachs, Jefferies, HSBC, Citi, Barclays, and others. It also carries the complete set of SemiAnalysis's semiconductor industry reports, plus daily selections of paid articles from Seeking Alpha, Substack, and Stratechery. With a coupon, membership is currently 390 yuan, giving daily access to over a hundred top foreign investment-bank analyses of the technology sector plus each day's selected reports, which is well worth it whether you are investing yourself or researching the industry in depth. ...
SemiAnalysis: Why Does Almost No One Outside the CSPs Use AMD's GPUs?
傅里叶的猫· 2025-05-23 15:46
Core Viewpoint
- The article provides a comprehensive analysis comparing the inference performance, total cost of ownership (TCO), and market dynamics of NVIDIA and AMD GPUs, highlighting why AMD products are little used outside of large-scale cloud service providers [1][2].

Testing Background and Objectives
- The research team conducted a six-month analysis to validate claims that AMD's AI servers outperform NVIDIA's on TCO and inference performance, finding that the results vary considerably across workloads [2][5].

Performance Comparison
- For customers using vLLM/SGLang, single-node H200 deployments sometimes offer a better performance-per-dollar (perf/$) ratio, while the MI325X can come out ahead depending on workload and latency requirements [5].
- In most scenarios the MI300X is not competitive against the H200, but it outperforms the H100 on specific models such as Llama3 405B and DeepSeekV3 670B [5].
- For short-term GPU rentals, NVIDIA consistently offers better cost performance thanks to its larger pool of rental providers; AMD capacity is scarce, which keeps prices high [5][26].

Total Cost of Ownership (TCO) Analysis
- AMD's MI300X and MI325X GPUs generally have lower hourly costs than NVIDIA's H100 and H200, at $1.34 per hour for the MI300X and $1.53 per hour for the MI325X [21].
- Capital cost makes up a large share of the total, accounting for 70.5% of the MI300X's cost [21].

Market Dynamics
- AMD's share of the AI GPU market has been growing steadily, but it is expected to decline in early 2025 as NVIDIA's Blackwell series launches while AMD's answer will not be available until later [7].
- The rental market for AMD GPUs is constrained, with few providers, keeping prices artificially high and reducing competitiveness versus NVIDIA [26][30].
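The hourly TCO figures above can be reproduced with a simple amortization model. A minimal sketch, assuming straight-line depreciation of capital cost over an assumed four-year life; only the $1.34/hr total and the 70.5% capital share come from the summary, and all other inputs are hypothetical values back-solved to match them (the actual SemiAnalysis model is not shown here).

```python
# Minimal TCO-per-GPU-hour sketch. The capital/operating split is an
# assumption for illustration; only the $1.34/hr total and ~70.5%
# capital share for the MI300X come from the summary above.

def hourly_tco(capital_cost_per_gpu, lifetime_years, opex_per_gpu_hour):
    """Amortize up-front capital cost over the useful life and add opex."""
    hours = lifetime_years * 365 * 24
    return capital_cost_per_gpu / hours + opex_per_gpu_hour

lifetime = 4                            # assumed depreciation window, years
capital = 0.945 * lifetime * 365 * 24   # back-solved: ~$0.945/hr of capital
opex = 1.34 - 0.945                     # remaining ~$0.40/hr of operating cost

total = hourly_tco(capital, lifetime, opex)
share = (capital / (lifetime * 365 * 24)) / total
print(f"MI300X TCO = ${total:.2f}/hr, capital share = {share:.1%}")
```

Running this recovers $1.34/hr with a 70.5% capital share, showing the two quoted figures are mutually consistent under a straight-line model.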
Benchmark Testing Methodology
- The benchmark testing focused on real-world inference workloads, measuring throughput and latency under varying user loads, unlike traditional offline benchmarks [10][11].
- Testing covered a range of input/output token lengths to assess performance across different inference scenarios [11][12].

Benchmark Results
- With Llama3 70B FP16, the MI325X and MI300X outperformed all other GPUs in low-latency scenarios, while the H200 was superior under high concurrency [15][16].
- With Llama3 405B FP8, the MI325X consistently beat the H100 and H200 across latency conditions, particularly at high latency [17][24].

Conclusion on AMD's Market Position
- The article concludes that AMD needs to lower rental prices to compete effectively with NVIDIA in the GPU rental market, as current pricing hinders its competitiveness [26][30].
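The perf/$ metric that drives the rental-market conclusion is simply measured throughput divided by rental price. A sketch of that calculation; the throughput and price numbers below are illustrative placeholders, not benchmark results from the report.

```python
# Perf-per-dollar sketch: tokens generated per dollar of GPU rental.
# All numbers below are hypothetical placeholders, not measured results.

def perf_per_dollar(tokens_per_sec, price_per_hour):
    """Tokens generated per rental dollar at a fixed latency target."""
    return tokens_per_sec * 3600 / price_per_hour

fleet = {
    # gpu: (hypothetical tokens/s per GPU, hypothetical rental $/hr)
    "H200":   (2400, 3.00),
    "MI325X": (2600, 3.60),
}
for gpu, (tps, price) in fleet.items():
    print(f"{gpu}: {perf_per_dollar(tps, price):,.0f} tokens per dollar")
```

With these placeholder numbers the MI325X delivers more raw throughput yet loses on perf/$, illustrating the article's point that scarce AMD rental supply, and hence higher prices, can erase a hardware advantage.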
JP Morgan: AI Server Market Analysis
傅里叶的猫· 2025-05-22 13:44
In this article we look at a recent JP Morgan analysis of the AI server market. The report is packed with substance: in a 20-page PDF, JP Morgan forecasts 2025 shipment volumes for each NVIDIA GPU, gives NVL72-equivalent rack demand for Microsoft, Meta, Amazon, and Google, and triangulates CSP capital expenditure against AI server shipments. It also covers the GB200/GB300 allocation split across ODMs and ODM inventory levels, plus forecasts of Huawei 910B-equivalent chip output... The report contains a great deal of data and is well worth reading. That said, these forecasts are JP Morgan's view alone, and readers should judge the data for themselves. Those who want the original can find it in the community.

Main text

Post-DeepSeek demand signals are encouraging, but a gap remains between upstream suppliers and ODMs

We maintain our forecast of 5.5 million high-end NVIDIA GPUs this year, but adjust the product-mix forecast to reflect the rise of GB servers. We believe that, despite recent minor supply-chain issues, NVIDIA remains focused on ARM-based AI servers (i.e., GB/VR) rather than HGX products. We now forecast that GB servers will account for roughly 85% of Blackwell GPUs this year (about 3.8 million). For HGX, we cut our Blackwell HGX GPU forecast to about 900,000, but Hop ...
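The product-mix figures can be cross-checked with simple arithmetic. The 5.5 million total, ~3.8 million GB-server GPUs, and ~900,000 Blackwell HGX GPUs come from the summary above; the derived quantities are my own back-of-envelope check, not numbers from the report.

```python
# Back-of-envelope check of JP Morgan's 2025 NVIDIA product-mix forecast.
# Inputs from the summary: 5.5M high-end GPUs total, ~3.8M in GB servers,
# ~0.9M Blackwell HGX GPUs. Derived values are my own arithmetic.

total_high_end = 5_500_000
gb_gpus = 3_800_000
hgx_blackwell = 900_000

blackwell_total = gb_gpus + hgx_blackwell          # 4.7M Blackwell GPUs
gb_share = gb_gpus / blackwell_total               # GB share of Blackwell
hopper_remainder = total_high_end - blackwell_total

print(f"GB share of Blackwell: {gb_share:.0%}")        # ~81%, near the quoted ~85%
print(f"Implied non-Blackwell remainder: {hopper_remainder:,}")  # 800,000
```

The implied GB share (~81%) is slightly below the quoted ~85%, which is consistent with both figures being rounded; the residual ~800,000 units would be the non-Blackwell (Hopper-generation) portion of the forecast.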
A Few Details on the Cut-Down, China-Specific Blackwell B40
傅里叶的猫· 2025-05-21 12:12
The US may cap GPU memory bandwidth at 1.7-1.8 TB/s. If so, NVIDIA's H20 may drop HBM in favor of GDDR6, and even a downgraded H20 could still outperform GDDR6-based gaming GPUs such as the RTX 5090D.

Jefferies report: the cut-down H20 may drop HBM and switch to GDDR6

Since the H20 was banned a while back, NVIDIA has been planning another cut-down part specifically for China. Strictly speaking it cannot be called a cut-down H20: the H20 is Hopper architecture, while the new cut-down GPU is Blackwell.

We previously wrote about a Jefferies analysis whose core view was: GPU+FPGA

Today we saw another research note, from GF Securities' overseas team, arguing that the GPU may be named 6000D, i.e., the B40, and will very likely launch in early July, with GDDR7 delivering roughly 1.7 TB/s (versus 4 TB/s for the H20). With NVLink unidirectional speed of about 550 GB/s and CUDA support, GF expects shipments to reach roughly 1 million units by the end of 2025.

For figures that have not been published, we need to read multiple firms' reports together, since each firm's analysis differs somewhat. The same applied to the Huawei Ascend series covered in our previous article ...
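The ~1.7 TB/s GDDR7 figure is consistent with standard memory-bandwidth arithmetic: bus width times per-pin data rate. A sketch assuming a 512-bit bus at 28 Gbps per pin; the bus width and pin rate are my assumptions chosen to illustrate the calculation, and only the ~1.7 TB/s and 4 TB/s figures come from the report.

```python
# Memory-bandwidth arithmetic: bus width (bits) * per-pin rate (Gbit/s) / 8
# gives GB/s. The 512-bit / 28 Gbps GDDR7 configuration is an assumption
# that happens to land near the ~1.7 TB/s figure quoted for the B40.

def mem_bandwidth_tbs(bus_width_bits, gbps_per_pin):
    """Peak memory bandwidth in TB/s for a given bus and pin rate."""
    return bus_width_bits * gbps_per_pin / 8 / 1000

b40_estimate = mem_bandwidth_tbs(512, 28)  # assumed GDDR7 configuration
print(f"Estimated B40 bandwidth: {b40_estimate:.2f} TB/s")  # 1.79 TB/s
```

A 1.79 TB/s result sits just inside the rumored 1.7-1.8 TB/s cap, which is presumably the point of such a configuration.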
Huawei Ascend 910 Series: 2025 Shipment Survey
傅里叶的猫· 2025-05-20 13:00
Core Viewpoint
- The report from Mizuho Securities analyzes companies including Broadcom, NVIDIA, AMD, Supermicro, and Huawei, highlighting expected growth and challenges in the AI ASIC and GPU markets.

Group 1: Broadcom and NVIDIA
- Mizuho expects Broadcom's custom ASIC chips (TPUv7p/MTIA2) to accelerate in deployment by 2026, potentially being used in OpenAI's Strawberry and Apple's Baltra projects in the second half of 2026 [1]
- In 2024, Broadcom's custom ASIC chips are projected to account for 70%-80% of usage, establishing it as the leader in AI ASICs, excluding self-built AI ASICs such as Google's TPU [1]
- The UMAIN project in Saudi Arabia plans to deploy 4,000 GB200 NVL72 servers, corresponding to 280,000 NVIDIA GPUs and 350,000 AMD GPUs over the next five years [1]
- The G42 project in the UAE has committed to importing 500,000 NVIDIA GB200 GPUs annually, valued at $15 billion, although the sustainability of this figure is questioned [1]

Group 2: Huawei
- The report anticipates that Huawei's Ascend 910 orders will exceed 700,000 units by 2025, with the next-generation Ascend 920 expected to launch in 2026 [2]
- However, the current yield rate for the Ascend 910 is low, at only 30%, a figure corroborated by previous reports [2][3]
- Other estimates also put this year's shipment volume for the Ascend 910 series above 700,000 units [5]
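A 30% yield against a 700,000-unit target implies a large fabrication overhead, which a back-of-envelope calculation makes concrete. Only the 700,000-unit target and 30% yield come from the summary; the dies-per-wafer figure is a hypothetical placeholder for illustration.

```python
# Back-of-envelope: dies fabricated vs. good dies at 30% yield.
# Only the 700,000-unit target and 30% yield come from the summary;
# dies-per-wafer is a hypothetical placeholder.

target_units = 700_000
yield_rate = 0.30

dies_fabricated = target_units / yield_rate
print(f"Dies to fabricate: {dies_fabricated:,.0f}")   # ~2.33 million

dies_per_wafer = 60  # hypothetical for a large AI accelerator die
wafer_starts = dies_fabricated / dies_per_wafer
print(f"Wafer starts needed: {wafer_starts:,.0f}")
```

In other words, at 30% yield every shipped chip costs more than three fabricated dies, which is why the yield figure matters as much as the order volume.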
NVIDIA COMPUTEX Keynote: NVLink Fusion
傅里叶的猫· 2025-05-19 15:11
Core Viewpoint
- Nvidia's introduction of NVLink Fusion aims to add flexibility and customization to AI infrastructure while maintaining its technological advantage in the market [8][17].

Group 1: Nvidia's Development and Products
- Nvidia has evolved from a GPU company into an AI-infrastructure giant, with milestones such as the launch of CUDA in 2006 [1].
- The GB300 chip, set to launch in Q3, offers a 1.5x improvement in inference performance and HBM memory and a 2x increase in network bandwidth, while maintaining physical compatibility with previous generations [6].
- The Project DIGITS personal AI computer, DGX Spark, is now in full production, with availability expected by Christmas [6].

Group 2: NVLink Fusion Technology
- NVLink Fusion extends Nvidia's NVLink technology to third-party CPUs and accelerators, allowing a more open ecosystem while still requiring Nvidia chips in the system [8][10].
- The technology includes two components: a semi-custom CPU connection via NVLink C2C, and integration of an NVLink 5 chiplet into third-party accelerators [9][10].
- NVLink Fusion is designed as an "either/or" technology, allowing either a semi-custom CPU or a third-party accelerator but not both simultaneously, ensuring Nvidia's presence in every configuration [10].

Group 3: Market Implications and Partnerships
- Current partners for NVLink Fusion include Alchip and AsteraLabs, with Fujitsu and Qualcomm developing new CPUs compatible with Nvidia GPUs [11].
- The limited openness of NVLink Fusion may accelerate diversification in AI computing infrastructure and give third-party chips a path into the high-performance computing market [11][17].
- Nvidia's strategy reflects an understanding that a fully closed NVLink could limit market expansion, particularly among cloud service providers and sovereign AI projects [17].
Group 4: NVLink Advantages
- NVLink 5 offers 1.8 TB/s of bidirectional bandwidth per GPU, significantly outperforming PCIe 5.0, which is crucial for scaling AI model training and inference [20].
- The NVLink Switch chip enables rack-level scalability, supporting up to 72 GPUs with a total bandwidth of 130 TB/s, a capability that competitors struggle to match [20].
- The integration of NVLink with Nvidia's SHARP protocol and Mission Control software optimizes AI workload throughput and latency, enhancing overall performance [20].
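The 130 TB/s rack figure follows directly from per-GPU bandwidth times GPU count. A quick check; the 72-GPU count and 1.8 TB/s per GPU come from the text above, while the exact accounting convention Nvidia uses for "total bandwidth" is my assumption.

```python
# Sanity check: 72 GPUs, each with 1.8 TB/s of NVLink bandwidth,
# roughly reproduces the quoted ~130 TB/s aggregate for an NVL72 rack.

gpus = 72
per_gpu_tbs = 1.8  # NVLink 5 bidirectional bandwidth per GPU, from the text

aggregate = gpus * per_gpu_tbs
print(f"Aggregate NVLink bandwidth: {aggregate:.1f} TB/s")  # 129.6, ~130
```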