AWS Launches 3nm Chip: 144 GB HBM3e, 4.9 TB/s Bandwidth
Semiconductor Industry Observation · 2025-12-03 00:44

Core Insights

- AWS has officially launched its next-generation Trainium AI accelerator, Trainium3, at the AWS re:Invent conference, marking a significant advancement in AI computing capabilities [1][2]
- The Trainium3 chip, manufactured on TSMC's 3nm process, offers 2.52 PFLOPs of FP8 compute and integrates 144 GB of HBM3e memory with 4.9 TB/s of memory bandwidth [1][2]
- AWS says Trainium3's architectural improvements are designed to better handle modern AI workloads, including support for multiple floating-point formats and enhanced hardware support for structured sparsity and collective communication [1][2]

Chip Features

- Trainium3 introduces NeuronSwitch-v1, a new fully connected architecture that links up to 144 chips within a single UltraServer, doubling inter-chip bandwidth over the previous generation [3]
- The upgraded Neuron Fabric cuts inter-chip communication latency to just under 10 microseconds, enabling large-scale distributed training jobs across thousands of Trainium chips [3]

System-Level Enhancements

- A fully configured Trainium3 UltraServer connects 144 chips, aggregating 362 PFLOPs of FP8 compute, 20.7 TB of HBM3e memory, and 706 TB/s of memory bandwidth, delivering up to 4.4 times the computing performance and 4 times the energy efficiency of the previous generation [2][4]
- Internal tests on OpenAI's GPT-OSS model showed Trainium3 achieving a threefold increase in per-chip throughput and a fourfold improvement in inference response time compared to the previous-generation UltraServer [4]

Cost Efficiency and Adoption

- Customers report up to a 50% reduction in training costs with Trainium3 compared to alternative solutions, and early adopters are exploring new applications such as real-time video generation [5]
- AWS has already deployed Amazon Bedrock on Trainium3, indicating readiness for enterprise-level applications [5]

Future Developments

- AWS is developing Trainium4, which aims to significantly boost compute, memory, and interconnect performance, targeting at least 6 times the FP4 throughput and 3 times the FP8 performance [5][6]
- Trainium4 will integrate NVIDIA's NVLink Fusion interconnect technology, allowing interoperability with other AWS systems and enabling a flexible rack-level design [6][7]

Strategic Partnerships

- AWS and NVIDIA have announced a multi-generational partnership to integrate NVLink Fusion technology into future AWS AI rack and chip designs, a significant move for both companies [7][8]
- This collaboration lets AWS leverage NVIDIA's NVLink architecture in its custom chip projects, potentially reshaping the competitive landscape in AI infrastructure [10]
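The UltraServer aggregate figures follow directly from multiplying the per-chip specs by the 144-chip count. A quick sanity check of that arithmetic is sketched below; the per-chip numbers are taken from the article, and the use of decimal GB-to-TB conversion is an assumption about how AWS rounds its marketing figures:

```python
# Per-chip Trainium3 specs as quoted in the article
chips = 144        # chips per fully configured UltraServer
fp8_pflops = 2.52  # FP8 compute per chip, in PFLOPs
hbm_gb = 144       # HBM3e capacity per chip, in GB
bw_tbps = 4.9      # memory bandwidth per chip, in TB/s

total_pflops = chips * fp8_pflops     # 362.88 -> quoted as "362 PFLOPs"
total_hbm_tb = chips * hbm_gb / 1000  # 20.736 -> quoted as "20.7 TB" (decimal TB assumed)
total_bw_tbps = chips * bw_tbps       # 705.6  -> quoted as "706 TB/s"

print(f"{total_pflops:.2f} PFLOPs, {total_hbm_tb:.1f} TB HBM3e, {total_bw_tbps:.1f} TB/s")
```

All three headline numbers reproduce to within rounding, which suggests the "362 PFLOPs / 20.7 TB / 706 TB/s" figures are straight per-chip multiples rather than independently measured system numbers.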