Microsoft unveils its self-developed 3nm AI chip with over 10 PFLOPS of compute, outgunning AWS and Google
36Kr · 2026-01-27 05:29

Core Insights
- Microsoft has launched its self-developed AI inference chip, Maia 200, which it claims is the highest-performing self-developed chip across all large-scale data centers, aimed at significantly improving the economics of AI token generation [1]

Group 1: Chip Specifications
- Maia 200 is manufactured on TSMC's 3nm process and packs over 140 billion transistors, with a redesigned memory subsystem that includes 216GB of HBM3e (read/write bandwidth of up to 7TB/s) and 272MB of on-chip SRAM [1][2]
- The chip is designed for low-precision computing, delivering over 10 PFLOPS at FP4 precision and over 5 PFLOPS at FP8 precision, all within a 750W SoC TDP [1]
- Its FP4 throughput is more than three times that of Amazon's AWS Trainium3, and its FP8 throughput surpasses Google's TPU v7 [1][2] (the cited ratios are worked through in the sketch after this summary)

Group 2: Memory and Interconnect
- Maia 200's memory subsystem is optimized for narrow-precision data types, featuring a dedicated DMA engine and a specialized on-chip network architecture to raise token throughput [2]
- The chip offers 2.8TB/s of bidirectional interconnect bandwidth, ahead of AWS Trainium3's 2.56TB/s and Google TPU v7's 1.2TB/s [3]

Group 3: Performance and Efficiency
- Maia 200 is Microsoft's most efficient inference system to date, delivering a 30% improvement in performance per dollar over the latest generation of hardware Microsoft currently deploys [3]
- The chip can run the largest models available today and is designed to support future models, including OpenAI's latest GPT-5.2, improving cost-effectiveness for Microsoft Foundry and Microsoft 365 Copilot [4]

Group 4: Integration and Deployment
- Maia 200 integrates seamlessly with Microsoft Azure, and a software development kit (SDK) is in preview, providing tools for building and optimizing models on Maia 200 [6]
- Deployment time is cut by more than half compared with similar AI infrastructure projects, yielding higher resource utilization and faster delivery to production [10]
- The architecture scales performance while reducing power consumption and total cost of ownership across Azure's global clusters [9][12]

Group 5: Future Outlook
- Microsoft is positioning Maia 200 as a foundation for future generations of AI systems and is inviting developers and researchers to explore early model and workload optimization with the new SDK [13]
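
For readers who want the cited comparisons in one place, the minimal Python sketch below simply tabulates the figures quoted above and works out the implied ratios. It uses only the numbers stated in the article; the competitors' FP4/FP8 throughput figures are not given, so they appear only as the bounds implied by the stated "more than three times" claim.

```python
# Illustrative only: tabulates the figures cited in the article and derives
# the implied ratios. No competitor spec beyond interconnect bandwidth is
# stated in the article, so those values are treated as implied bounds.

maia_200 = {
    "process": "TSMC 3nm",
    "transistors": 140e9,          # "over 140 billion"
    "hbm3e_gb": 216,
    "hbm_bandwidth_tb_s": 7.0,
    "sram_mb": 272,
    "fp4_pflops": 10.0,            # "over 10 PFLOPS"
    "fp8_pflops": 5.0,             # "over 5 PFLOPS"
    "tdp_w": 750,
    "interconnect_tb_s": 2.8,      # bidirectional
}

# Interconnect bandwidth figures cited for the competing parts.
competitors_interconnect_tb_s = {
    "AWS Trainium3": 2.56,
    "Google TPU v7": 1.2,
}

for name, bw in competitors_interconnect_tb_s.items():
    ratio = maia_200["interconnect_tb_s"] / bw
    print(f"Maia 200 vs {name}: {ratio:.2f}x bidirectional bandwidth")

# "FP4 performance exceeds Trainium3 by more than three times" bounds
# Trainium3's FP4 throughput below roughly 10 / 3 PFLOPS.
print(f"Implied Trainium3 FP4 ceiling: < {maia_200['fp4_pflops'] / 3:.1f} PFLOPS")

# A 30% gain in performance per dollar means the same spend buys 1.3x the
# throughput, i.e. cost per token falls to about 1 / 1.3 of the prior fleet.
print(f"Relative cost per token vs. current fleet: {1 / 1.3:.0%}")
```

Running the sketch shows the interconnect gap is modest against Trainium3 (about 1.09x) but larger against TPU v7 (about 2.33x), and that the 30% performance-per-dollar claim translates to roughly a quarter off the cost per token relative to Microsoft's current fleet.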
