AWS Trainium3
A 32-image illustrated walkthrough of SemiAnalysis's deep dive on Amazon's Trainium3 AI chip
傅里叶的猫· 2025-12-07 13:13
12/5/2025 AWS Trainium3 Deep Dive | A Potential Challenger Approaching — NL72×2/NL32×2 scale-up rack architecture, step-function software and system improvements, optimized perf per TCO. Highlights from the slides: AWS Trainium3 optimizes performance per TCO and operational flexibility. Core philosophy — maximize perf per TCO through multi-source component suppliers, custom-silicon partners, and supply-chain management, with lowest TCO and fastest time to market as the hardware north-star metrics, avoiding commitment to any single architecture for maximum adaptability. Systems and networking follow an "Amazon Basics" approach: design choices such as bandwidth-scaling switches (12.8T / 25.6T / 51.2T) and cooling method (liquid vs. air) are treated as means of delivering the best TCO for specific customers and data centers, rather than a fixed standard. Scale-up ...
TrendForce: The Rubin platform's cableless architecture and high-layer-count HDI ASIC designs are driving the PCB industry to the core of compute
Zhi Tong Cai Jing· 2025-11-20 09:12
Core Insights
- The AI server design is undergoing a structural transformation, with the transition to cableless architecture and high-density interconnect (HDI) designs becoming central to the PCB industry's evolution [1][2]
- The introduction of the Rubin platform marks a significant shift in PCB's role, emphasizing signal integrity and transmission stability as core design metrics [1][2]

Group 1: PCB Design and Technology
- The Rubin platform utilizes a cableless interconnect design, enhancing the PCB industry's status by shifting from traditional cable-based connections to multi-layer PCBs [1]
- The new design materials include M8U grade for Switch Tray and M9 for Midplane, with PCB value per server increasing by over two times compared to previous generations [2]
- The design logic of Rubin has become a common language in the industry, influencing other ASIC AI servers like Google TPU V7 and AWS Trainium3 [2]

Group 2: Material Innovations
- The demand for PCB performance in AI servers is driving significant changes in upstream materials, focusing on dielectric and thermal stability [2]
- Nittobo is investing 15 billion yen to expand production of T-glass, which is expected to triple its capacity by the end of 2026, becoming a core material for ABF and BT substrates [2]
- Low-roughness HVLP4 copper foil is becoming mainstream due to the increasing impact of skin effect, leading to long-term supply tightness and a shift in bargaining power back to upstream material suppliers [3]
The supply-chain logic behind Nvidia's Rubin CPX
傅里叶的猫· 2025-09-11 15:50
Core Viewpoint
- The article discusses the significance of Nvidia's Rubin CPX, highlighting its tailored design for AI model inference, particularly addressing the inefficiencies in hardware utilization during the prefill and decode stages of AI processing [1][2][3]

Group 1: AI Inference Dilemma
- The key contradiction in AI large-model inference lies between the prefill and decode stages, which have opposing hardware requirements [2]
- Prefill requires high computational power but low memory bandwidth, while decode relies on high memory bandwidth with lower computational needs [3]

Group 2: Rubin CPX Configuration
- Rubin CPX is designed specifically for the prefill stage, optimizing cost and performance by using GDDR7 instead of HBM, cutting BOM cost to 25% of the R200's while providing 60% of its computational power [4][6]
- Memory bandwidth utilization during prefill tasks is drastically improved, with Rubin CPX achieving 4.2% utilization compared to R200's 0.7% [7]

Group 3: Oberon Rack Innovations
- Nvidia introduced the third-generation Oberon architecture, featuring a cable-free design that enhances reliability and space efficiency [9]
- The new rack employs a 100% liquid-cooling solution to manage the increased power demands, with a power budget of 370 kW [10]

Group 4: Competitive Landscape
- Nvidia's advancements have intensified competition, particularly affecting AMD, Google, and AWS, as they must adapt their strategies to keep pace with Nvidia's innovations [13][14]
- The introduction of specialized chips for prefill, and potential future development of decode-specific chips, could further solidify Nvidia's market position [14]
- Companies developing custom ASIC chips may face challenges in keeping up with Nvidia's rapid advancements in specialized hardware [14]

Group 5: Future Implications
- The demand for GDDR7 is expected to surge due to its use in Rubin CPX, with Samsung poised to benefit from increased orders [15][16]
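The Rubin CPX value proposition above reduces to simple arithmetic. A minimal sketch, using only the two figures quoted in the article (25% of R200's BOM cost, 60% of its compute); the normalization to R200 = 1.0 is an illustrative assumption, not a figure from the source:

```python
# Sanity check of the Rubin CPX cost/performance claim described above.
# Article figures: ~25% of R200's BOM cost, ~60% of R200's compute.
r200_bom_cost = 1.0   # normalized: R200 BOM cost = 1.0 (assumption)
r200_compute = 1.0    # normalized: R200 compute = 1.0 (assumption)

cpx_bom_cost = 0.25 * r200_bom_cost
cpx_compute = 0.60 * r200_compute

# Compute delivered per BOM dollar, relative to R200.
cpx_perf_per_dollar = cpx_compute / cpx_bom_cost
print(f"Rubin CPX compute per BOM dollar vs R200: {cpx_perf_per_dollar:.1f}x")
# → Rubin CPX compute per BOM dollar vs R200: 2.4x
```

In other words, if the article's two ratios hold, a prefill-only Rubin CPX delivers roughly 2.4× the compute per BOM dollar of an R200, which is the economic core of the prefill/decode disaggregation argument.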
Morgan Stanley: AI ASIC — reconciling Trainium2 chip shipment volumes
摩根· 2025-07-11 01:13
Investment Rating
- The industry investment rating is classified as In-Line [8]

Core Insights
- The report addresses the mismatch in AWS Trainium2/2.5 chip shipments attributed to unstable PCB yield rates, with an expectation of approximately 1.1 million chip shipments in 2025 [1][3]
- Supply-chain checks estimate total shipments for the Trainium2/2.5 life cycle (2H24 to 1H26) at 1.9 million units, with a focus on production and consumption in 2025 [2][11]
- The report highlights a significant gap between upstream chip production and downstream consumption, suggesting improvements in yield rates may reduce this gap by 2H25 [6][11]

Upstream - Chip Output Perspective
- As of late 2024, 0.3 million units of Trainium2 chips were produced, with a projected total of 1.1 million shipments in 2025, primarily packaged by TSMC (70%) and ASE (30%) [3][11]
- An additional 0.5 million Trainium2.5 chips are expected to be produced in 1H26, bringing the total life-cycle shipments to 1.9 million units [3]

Midstream - PCB Perspective
- Downstream checks indicate potential shipments exceeding 1.8 million units of Trainium chips, averaging around 200K per month since April [4][11]
- Key suppliers for PCB boards include Gold Circuit and King Slide, which provide essential components for Trainium computing trays [4]

Downstream - Server Rack System Perspective
- Wiwynn is identified as a key supplier for server rack assembly, with revenue from AWS Trainium2 servers increasing in 1Q25, aligning with the upstream chip production estimates [5][11]
- Each server rack can accommodate 32 chips, supporting the projected consumption figures [5]

Component Suppliers
- Major suppliers for Trainium2 AI ASIC servers include AVC for thermal solutions, Lite-On Tech for power supply, and Samsung for memory components [10][18]
- Other notable suppliers include King Slide for rail kits and Bizlink for interconnect solutions [10][18]

Future Projections
- For Trainium3, shipments are estimated at 650K for 2026, with production managed by Alchip [12][13]
- The report anticipates that Trainium4 will enter small production by late 2027, with a rapid ramp-up expected in 2028 [14]
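The report's shipment reconciliation can be checked with the unit counts it quotes. A minimal sketch using only figures from the summary above (0.3M by late 2024, 1.1M in 2025, 0.5M in 1H26, 1.8M downstream, 32 chips per rack); no other data is assumed:

```python
# Reconcile the Trainium2/2.5 life-cycle shipment figures quoted above.
chips_by_late_2024 = 0.3e6  # Trainium2 produced as of late 2024
chips_2025 = 1.1e6          # projected 2025 shipments
chips_1h26 = 0.5e6          # additional Trainium2.5 expected in 1H26

lifecycle_total = chips_by_late_2024 + chips_2025 + chips_1h26
print(f"Life-cycle total: {lifecycle_total / 1e6:.1f}M chips")
# → Life-cycle total: 1.9M chips

# Downstream consumption check: 32 chips per server rack.
downstream_chips = 1.8e6
racks_implied = downstream_chips / 32
print(f"Implied rack count: {racks_implied:,.0f}")
# → Implied rack count: 56,250
```

The three upstream tranches do sum to the 1.9 million life-cycle figure, and the 1.8 million downstream units imply on the order of 56K racks at 32 chips each, which is the cross-check the report uses to compare upstream output against downstream consumption.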