Google Ironwood TPU: Taking Aim at Reasoning-Model Leadership at Hot Chips 2025
2025-09-04 14:38
Summary of Google Ironwood TPU Presentation at Hot Chips 2025

**Company and Industry**
- **Company**: Google
- **Industry**: Artificial Intelligence (AI) and Machine Learning (ML) Hardware

**Key Points and Arguments**
1. **Introduction of Ironwood TPU**: Google introduced the Ironwood TPU, designed specifically for large-scale AI inference, marking a shift in emphasis from AI training to inference [1][2][4]
2. **Performance Metrics**: Ironwood scales up to 42.5 exaflops with up to 9,216 chips in a node, and delivers 2x performance-per-watt over the previous-generation TPU, Trillium [1][2]
3. **Innovations in Architecture**: Ironwood uses optical circuit switches (OCS) for memory sharing, allowing a larger number of chips per node and improved reliability [2][3]
4. **Memory Capacity**: The system offers 1.77 PB of directly addressable high-bandwidth memory (HBM), a new record for shared memory in a single system (a quick arithmetic check follows this summary) [2][3]
5. **Focus on Reliability**: RAS (Reliability, Availability, and Serviceability) features are emphasized to ensure long-term error-free operation of cloud TPU instances [2][4]
6. **Power Efficiency**: Ironwood claims a nearly 6x improvement in performance-per-watt over TPU v4, underscoring a strong focus on power efficiency [2][3]
7. **Liquid Cooling Infrastructure**: Ironwood integrates Google's third generation of liquid cooling technology for thermal management [2][3]
8. **AI-Driven Design**: AI was used to design the ALU circuits and optimize the chip layout, closing the loop in AI chip development [2][3]
9. **Scalability**: Ironwood supports both scale-up and scale-out, with the potential to connect multiple SuperPods for even larger computations [3][4]

**Other Important but Possibly Overlooked Content**
1. **Checkpointing and Node Management**: OCS allows nodes to be reconfigured and workloads to recover from checkpoints, improving system resilience [2]
2. **Integration of Security Features**: Ironwood includes confidential-computing features such as secure boot and an integrated root of trust [3]
3. **Market Positioning**: Google is positioning Ironwood as a leading solution in the AI hardware market, focused on high-end compute capability and infrastructure innovation [5]

This summary captures the key insights from Google's Ironwood TPU presentation at Hot Chips 2025: advances in AI inference hardware and overall system performance.
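As a quick sanity check on those headline numbers, here is a minimal arithmetic sketch in Python. The 192 GB-per-chip HBM figure is taken from the Hot Chips digests later on this page; the per-chip FLOPS value is derived from the cluster totals rather than an officially quoted spec, and PB is read as 10^15 bytes.

```python
# Sanity-check Ironwood's headline cluster numbers.
# Assumptions: 192 GB of HBM per chip (from the Hot Chips digests below);
# GB = 1e9 bytes, PB = 1e15 bytes.

CHIPS_PER_NODE = 9_216
HBM_PER_CHIP_GB = 192            # assumed per-chip HBM3E capacity
CLUSTER_FP8_EXAFLOPS = 42.5      # quoted FP8 compute for the full node

total_hbm_pb = CHIPS_PER_NODE * HBM_PER_CHIP_GB * 1e9 / 1e15
per_chip_pflops = CLUSTER_FP8_EXAFLOPS * 1e18 / CHIPS_PER_NODE / 1e15

print(f"Total shared HBM:   {total_hbm_pb:.2f} PB")         # ~1.77 PB
print(f"Implied FP8 / chip: {per_chip_pflops:.2f} PFLOPS")  # ~4.61 (derived)
```

The memory total lands exactly on the quoted 1.77 PB, so the 9,216-chip and 192 GB-per-chip figures are self-consistent.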
Google Ironwood TPU: Best-in-Class Performance, Performance-per-Cost, and Performance-per-Watt for Reasoning-Model Training and Inference Serving
2025-09-04 14:38
Summary of Ironwood Conference Call

**Company and Industry**
- **Company**: Google (Ironwood TPU)
- **Industry**: Machine Learning and Data Center Technology

**Key Points and Arguments**
1. **Performance Metrics**: Ironwood's 9,216 chips use optical circuit switches (OCS) to share memory, reaching 1.77 PB of directly addressable shared HBM and 42.5 exaflops of FP8 ML compute, a new record for shared-memory multiprocessors [7][73]
2. **Efficiency Improvements**: Google reports industry-leading compute power efficiency, with a 2x performance-per-watt (perf/W) improvement over the previous generation [7][73]
3. **Cooling Infrastructure**: Ironwood runs on a third-generation liquid-cooling infrastructure, crucial for sustaining performance at high density [26][75]
4. **SparseCore Technology**: The fourth-generation SparseCore accelerates embeddings and offloads collective operations, delivering a 2.4x FLOPS increase over the third generation [30][75]
5. **Deployment at Hyperscale**: Hyperscale deployment of Ironwood is already underway, indicating strong market demand and operational scaling capability [35][73]
6. **Reliability and Serviceability**: RAS (Reliability, Availability, and Serviceability) is highlighted as the key enabler of productive scaling to extreme system sizes [20][74]

**Additional Important Content**
1. **Power Management**: Ironwood supports a full-stack approach to proactive power shaping, essential for managing the unprecedented load swings of large-scale pretraining (see the power-budget sketch after this summary) [34][67]
2. **Security Features**: An integrated root-of-trust (iROT) controller provides hardware support for secure boot and secure test/debug, hardening the computing environment [60]
3. **Market Position**: Ironwood leads in both scale-up and scale-out capability, with a focus on maximizing ML throughput under dynamically varying power budgets [73][72]
4. **Future Outlook**: Google is targeting 30% additional throughput per data center within the same power budget, underscoring its focus on efficiency [72]

This summary captures the key insights from the Ironwood conference call: performance, efficiency, technology advances, and strategic positioning within the industry.
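To make the fixed-power framing concrete, the sketch below walks through the arithmetic implied by two claims in this summary: the 2x perf/W generational gain and the 30% additional throughput per data center at the same power budget. Treating the two as independent multiplicative factors is an assumption made for illustration, and the absolute facility size is a hypothetical placeholder.

```python
# Illustrative arithmetic for throughput under a fixed data-center power
# budget. Only the 2x perf/W and +30% figures come from the call summary;
# the facility size and the multiplicative stacking are assumptions.

DC_POWER_BUDGET_MW = 100.0      # hypothetical facility budget, held fixed
baseline = 1.0                  # previous-generation throughput, normalized

gen_gain = baseline * 2.0       # 2x perf/W at fixed power -> 2x throughput
shaped = gen_gain * 1.30        # proactive power shaping trims provisioned
                                # headroom, fitting ~30% more work in budget

print(f"Fixed budget: {DC_POWER_BUDGET_MW:.0f} MW")
print(f"Previous generation:      {baseline:.2f}x")
print(f"After 2x perf/W:          {gen_gain:.2f}x")
print(f"After +30% power shaping: {shaped:.2f}x")
```

The point of the framing is that once a facility's power envelope is fixed, perf/W and power shaping are the only levers left for raising deliverable throughput.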
Google's TPU Performance Surge, Meta's Compute Investment, Optical Modules, Ethernet Driving Scale Up, and More: Key Takeaways from Hot Chips 2025 in One Article
硬AI · 2025-09-04 08:42
**Core Insights**
- Demand for AI infrastructure is growing strongly, driven by advances in computing, memory, and networking technologies [2][5][6]
- Key trends include major performance gains in Google's Ironwood TPU, Meta's expansion of GPU clusters, and the rise of networking technologies as critical growth points for AI infrastructure [2][4][8]

**Group 1: Google Ironwood TPU**
- Google's Ironwood TPU (TPU v7) delivers a remarkable performance leap: roughly 10x the peak FLOPS of TPU v5p, with a 5.6x efficiency improvement [5]
- Ironwood carries 192 GB of HBM3E at 7.3 TB/s of bandwidth, up sharply from the prior 96 GB of HBM2 at 2.8 TB/s (see the roofline sketch after this digest) [5]
- The Ironwood supercluster scales up to 9,216 chips, providing 1.77 PB of directly addressable HBM and 42.5 exaflops of FP8 compute [5][6]

**Group 2: Meta's Custom Deployment**
- Meta's custom NVL72 system, Catalina, uses a distinctive architecture that doubles the Grace CPU count to 72, expanding memory capacity and cache coherence [7]
- The design is tailored to large language models and other compute-intensive workloads while respecting physical infrastructure constraints [7]

**Group 3: Networking Technology**
- Networking emerged as a focal point, with significant growth opportunities in both Scale Up and Scale Out domains [10]
- Broadcom introduced the 51.2 Tb/s Tomahawk Ultra switch for low-latency HPC and AI workloads, an important opportunity to expand its Total Addressable Market (TAM) [10][11]

**Group 4: Optical Technology Integration**
- Optical technology is growing in importance, with discussion of integrating optical solutions to address power and cost challenges in AI infrastructure [14]
- Lightmatter showcased its Passage M1000 AI 3D photonic interconnect, aimed at improving connectivity and performance in AI systems [14]

**Group 5: AMD Product Line Expansion**
- AMD detailed its MI350 GPU series: the MI355X targets liquid-cooled data centers, while the MI350X serves traditional air-cooled deployments [16][17]
- The MI400 series is expected to launch in 2026, positioned strongly for the inference market, which is growing faster than the training market [18]
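A brief roofline reading of those memory numbers: the ratio of peak compute to HBM bandwidth sets the arithmetic intensity (FLOPs per byte) a kernel needs to be compute-bound rather than memory-bound. The sketch below uses the per-chip FP8 value implied by the cluster totals (roughly 4.61 PFLOPS, a derived figure, not an official per-chip spec) together with the quoted bandwidths; applying Ironwood's compute to the prior generation's bandwidth is a counterfactual, shown only to illustrate why bandwidth had to grow alongside FLOPS.

```python
# Roofline ridge point: the arithmetic intensity (FLOPs per byte of HBM
# traffic) above which a kernel is compute-bound rather than memory-bound.
# The per-chip FP8 figure is derived from the cluster totals
# (42.5 EFLOPS / 9,216 chips); it is not an officially quoted per-chip spec.

peak_fp8_flops = 42.5e18 / 9_216   # ~4.61e15 FLOP/s per chip (derived)

for label, bw in [("7.3 TB/s (Ironwood HBM3E)", 7.3e12),
                  ("2.8 TB/s (prior-gen HBM2)", 2.8e12)]:
    ridge = peak_fp8_flops / bw
    print(f"Ridge point at {label}: ~{ridge:.0f} FLOPs/byte")
# ~632 vs ~1647 FLOPs/byte: the higher the ridge, the harder it is for
# low-intensity kernels to keep the compute units busy.
```

LLM decode is typically low in arithmetic intensity, which is why the jump from 2.8 TB/s to 7.3 TB/s matters at least as much as raw FLOPS for an inference-oriented part.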
J.P. Morgan: Google's TPU Performance Surge, Meta's Compute Investment, Optical Modules, Ethernet Driving Scale Up, and More: Key Takeaways from Hot Chips in One Article
美股IPO · 2025-09-04 04:24
**Core Insights**
- Demand for AI infrastructure is growing strongly, driven by advances in computing, memory, and networking technologies [3]
- Key trends include major performance gains in Google's Ironwood TPU, Meta's expansion of GPU clusters, and the rise of networking technologies as critical growth points [3][4][6]

**Group 1: AI Infrastructure Demand**
- AI is the primary driver of technological advancement and product demand, and momentum in AI infrastructure remains strong [3]
- Competition is expanding from raw compute power to comprehensive upgrades across networking and optical technologies [3]

**Group 2: Google's Ironwood TPU**
- Google's Ironwood TPU (TPU v7) delivers roughly 10x the peak FLOPS of TPU v5p, with a 5.6x efficiency improvement [4]
- Ironwood carries 192 GB of HBM3E at 7.3 TB/s of bandwidth, a major step up in both capacity and bandwidth [4]
- The Ironwood supercluster scales up to 9,216 chips, providing 1.77 PB of directly addressable HBM and 42.5 exaflops of FP8 compute [4]

**Group 3: Meta's Custom Deployment**
- Meta's NVL72 system, Catalina, uses a distinctive architecture that doubles the Grace CPU count to 72, expanding memory capacity and cache coherence [6]
- The custom design follows from model requirements and physical infrastructure considerations, serving both large language models and recommendation engines [6]

**Group 4: Networking Technologies**
- Networking is a focal point, with significant growth opportunities in both Scale Up and Scale Out domains [8]
- Broadcom introduced the 51.2 Tb/s Tomahawk Ultra switch for low-latency HPC and AI workloads (see the unit-conversion sketch at the end of this digest) [9]
- Nvidia's Spectrum-XGS Ethernet targets distributed clusters spanning multiple data centers, claiming advantages over existing Ethernet solutions [11]

**Group 5: Optical Technology Integration**
- Optical technology is a key theme, with emphasis on deep integration into AI infrastructure to address power and cost challenges [12]
- Lightmatter's Passage M1000 targets connectivity bottlenecks with a large active photonic interconnect [12]
- Ayar Labs presented its TeraPHY optical I/O chip, supporting up to 8.192 Tb/s of bidirectional bandwidth with markedly better power efficiency [13]

**Group 6: AMD Product Line Expansion**
- AMD detailed its MI350 GPU series: the MI355X is designed for liquid-cooled data centers, the MI350X for traditional air-cooled infrastructure [14][15]
- The MI355X offers about 9% more performance than the MI350X at the same memory capacity and bandwidth [16]
- AMD's MI400 series is expected to launch in 2026, positioned strongly for the inference market, which is growing faster than the training market [16]
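A note on units that helps when comparing these figures: switch and optical-I/O line rates are conventionally quoted in terabits per second (Tb/s), while memory bandwidth is quoted in terabytes per second (TB/s). The sketch below puts the numbers above on one byte scale; the juxtaposition with Ironwood's HBM is for scale only, since switch aggregate bandwidth and per-chip memory bandwidth serve different roles.

```python
# Convert switch / optical-I/O line rates (Tb/s) into TB/s so they can be
# compared at a glance with memory-bandwidth figures, quoted in bytes.

def terabits_to_terabytes(tbps: float) -> float:
    """Terabits per second -> terabytes per second (8 bits per byte)."""
    return tbps / 8.0

links = [
    ("Broadcom Tomahawk Ultra (aggregate)", 51.2),
    ("Ayar Labs TeraPHY (bidirectional)", 8.192),
]
for name, tbps in links:
    print(f"{name}: {tbps} Tb/s = {terabits_to_terabytes(tbps):.3f} TB/s")

# Output: 6.400 TB/s and 1.024 TB/s respectively; for scale, a single
# Ironwood chip's HBM runs at 7.3 TB/s.
```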