Summary of Google Ironwood TPU Presentation at Hot Chips 2025

Company and Industry
- Company: Google
- Industry: Artificial Intelligence (AI) and Machine Learning (ML) Hardware

Key Points and Arguments
1. Introduction of Ironwood TPU: Google introduced the Ironwood TPU, designed specifically for large-scale AI inference, marking a shift in focus from AI training to inference [1][2][4]
2. Performance Metrics: Ironwood scales up to 42.5 Exaflops with up to 9,216 chips in a node and delivers 2x the performance-per-watt of the previous-generation TPU, Trillium [1][2]
3. Innovations in Architecture: Ironwood uses optical circuit switches (OCS) to share memory across chips, which allows a larger number of chips per system and improves reliability [2][3]
4. Memory Capacity: The system offers 1.77 PB of directly addressable high-bandwidth memory (HBM), setting a new record for shared memory capacity [2][3]
5. Focus on Reliability: The design emphasizes RAS (Reliability, Availability, and Serviceability) features to keep cloud TPU instances running error-free over long periods [2][4]
6. Power Efficiency: Google claims a nearly 6x improvement in performance-per-watt over TPUv4, underscoring the focus on power efficiency [2][3]
7. Liquid Cooling Infrastructure: Ironwood integrates the third generation of Google's liquid cooling technology for thermal management [2][3]
8. AI-Driven Design: AI was used to design the ALU circuits and to optimize the chip layout, so AI tools now help build the AI chips they run on [2][3]
9. Scalability: Ironwood supports both scale-up and scale-out, with the potential to connect multiple SuperPods for additional computational power [3][4]

Other Important but Possibly Overlooked Content
1. Checkpointing and Node Management: OCS allows nodes to be reconfigured and workloads to be recovered from checkpoints, which improves system resilience [2]
2. Integration of Security Features: Ironwood includes confidential computing features such as secure boot and an integrated root of trust [3]
3. Market Positioning: Google is positioning Ironwood as a leading solution in the AI hardware market, focusing on high-end compute capabilities and infrastructure innovation [5]

This summary covers the key points of Google's presentation on the Ironwood TPU at Hot Chips 2025, highlighting its advances in AI inference technology and overall system performance.
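The system-level figures quoted above can be broken down per chip with simple arithmetic. The sketch below is only an illustration, not part of the presentation: it assumes decimal units (1 Exaflop = 1,000 Petaflops, 1 PB = 1,000,000 GB), and that the 6x (vs. TPUv4) and 2x (vs. Trillium) performance-per-watt claims use the same metric, which the summary does not confirm. All function names are hypothetical.

```python
# Figures quoted in the summary (system-level, as presented).
SYSTEM_CHIPS = 9_216        # maximum chips per system
SYSTEM_EXAFLOPS = 42.5      # peak compute at full scale
SYSTEM_HBM_PB = 1.77        # directly addressable HBM at full scale


def per_chip_petaflops(system_exaflops: float, chips: int) -> float:
    """Peak compute per chip, assuming 1 Exaflop = 1,000 Petaflops."""
    return system_exaflops * 1_000 / chips


def per_chip_hbm_gb(system_hbm_pb: float, chips: int) -> float:
    """HBM per chip, assuming decimal units (1 PB = 1,000,000 GB)."""
    return system_hbm_pb * 1_000_000 / chips


def implied_gain(gain_vs_older: float, gain_vs_previous: float) -> float:
    """Implied performance-per-watt gain of the previous generation over
    the older one. Assumption: both claims use a comparable metric."""
    return gain_vs_older / gain_vs_previous


if __name__ == "__main__":
    print(f"Compute per chip: {per_chip_petaflops(SYSTEM_EXAFLOPS, SYSTEM_CHIPS):.2f} PFLOPS")
    print(f"HBM per chip:     {per_chip_hbm_gb(SYSTEM_HBM_PB, SYSTEM_CHIPS):.1f} GB")
    print(f"Implied Trillium vs. TPUv4 perf-per-watt: {implied_gain(6.0, 2.0):.1f}x")
```

Under these assumptions the quoted totals work out to roughly 4.6 Petaflops and about 192 GB of HBM per chip, and the two efficiency claims together would imply that Trillium is about 3x more efficient than TPUv4.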
Google Ironwood TPU: Aiming for Leadership in Inference Models at Hot Chips 2025