Microsoft drops a 3nm in-house AI chip! Over 10 PFLOPS of compute, outgunning AWS and Google

Core Viewpoint
- Microsoft has announced its self-developed AI inference chip, Maia 200, claiming it is the highest-performing self-developed chip deployed in any large-scale data center and that it significantly improves the economics of AI token generation [5].

Technical Specifications
- Maia 200 is manufactured on TSMC's 3nm process and contains over 140 billion transistors; its memory subsystem includes 216GB of HBM3e with 7TB/s of read/write bandwidth [5].
- The chip is designed for low-precision compute, delivering over 10 PFLOPS at FP4 and over 5 PFLOPS at FP8 within a 750W SoC TDP envelope (a rough efficiency calculation is sketched after this summary) [5].
- Its FP4 performance is more than three times that of Amazon's AWS Trainium3, and its FP8 performance surpasses Google's TPU v7 [6].

Memory and Interconnect
- The redesigned memory subsystem is optimized for narrow-precision data types and adds a dedicated DMA engine and on-chip SRAM, raising token throughput (see the bandwidth-bound decode sketch below) [8].
- Maia 200 provides 2.8TB/s of bidirectional interconnect bandwidth, above AWS Trainium3's 2.56TB/s and Google TPU v7's 1.2TB/s [9].

Performance and Efficiency
- Maia 200 is the most efficient inference system Microsoft has deployed to date, delivering 30% more performance per dollar than the latest-generation hardware currently in its fleet (see the cost-per-token arithmetic below) [10].
- The chip can run the largest models available today and is designed to support future models, including OpenAI's latest GPT-5.2 [11][12].

Integration and Development
- Maia 200 integrates natively with Microsoft Azure, and a software development kit (SDK) is in preview, providing tools for building and optimizing models [13].
- The architecture simplifies programming and increases workload flexibility while reducing idle capacity, maintaining consistent performance and cost-effectiveness at cloud scale [21][22].

Deployment and Scalability
- Deployment time for Maia 200 is half that of comparable AI infrastructure projects, allowing AI models to run shortly after the first chips arrive [23].
- The architecture targets scalable performance in dense inference clusters while lowering power consumption and total cost of ownership across Azure's global clusters [22].

Future Outlook
- Microsoft positions Maia 200 as a foundation for the next generation of AI systems, aiming to set new benchmarks for performance and efficiency on critical AI workloads [28].
- The company invites developers, AI startups, and academia to explore early model and workload optimization using the new Maia 200 SDK [29].
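To put the headline numbers in perspective, here is a back-of-the-envelope efficiency calculation using only the figures quoted above (over 10 PFLOPS FP4, over 5 PFLOPS FP8, 750W TDP, and the "more than three times Trainium3" comparison). Since the article quotes floors ("over"), the ratios below are lower bounds, and the Trainium3 figure is only what the stated ratio implies, not a published spec.

```python
# Rough compute-efficiency figures derived from the article's headline numbers.
# All inputs come from the piece itself; the arithmetic is only illustrative.

FP4_PFLOPS = 10.0   # >10 PFLOPS at FP4 (article)
FP8_PFLOPS = 5.0    # >5 PFLOPS at FP8 (article)
TDP_W = 750.0       # SoC TDP (article)

# PFLOPS -> TFLOPS, then divide by watts to get TFLOPS per watt.
fp4_tflops_per_w = FP4_PFLOPS * 1000 / TDP_W   # ~13.3 TFLOPS/W
fp8_tflops_per_w = FP8_PFLOPS * 1000 / TDP_W   # ~6.7 TFLOPS/W

# "More than three times" Trainium3 at FP4 implies Trainium3 delivers
# under 10/3 ≈ 3.3 PFLOPS FP4, per the article's comparison.
implied_trainium3_fp4 = FP4_PFLOPS / 3

print(f"Maia 200 FP4: ~{fp4_tflops_per_w:.1f} TFLOPS/W")
print(f"Maia 200 FP8: ~{fp8_tflops_per_w:.1f} TFLOPS/W")
print(f"Implied Trainium3 FP4 ceiling: <{implied_trainium3_fp4:.1f} PFLOPS")
```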
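The 7TB/s HBM3e bandwidth is the figure that most directly governs token throughput: batch-1 LLM decoding is typically memory-bound, since each generated token streams the full set of weights from HBM. Below is a minimal sketch of that reasoning, assuming a hypothetical 200B-parameter model quantized to FP4; the model size is an illustrative assumption, not something the article specifies, and the estimate ignores KV-cache traffic.

```python
# Bandwidth-bound decode estimate: tokens/s ≈ memory bandwidth / bytes read per token.
# The 7 TB/s and 216 GB figures are from the article; the model size is hypothetical.

HBM_BANDWIDTH_TBS = 7.0          # HBM3e read/write speed (article)
PARAMS = 200e9                   # hypothetical 200B-parameter model
BYTES_PER_PARAM_FP4 = 0.5        # FP4 = 4 bits = 0.5 bytes per weight

weight_bytes = PARAMS * BYTES_PER_PARAM_FP4    # 100 GB of weights
bandwidth_bytes = HBM_BANDWIDTH_TBS * 1e12     # 7e12 bytes/s

# Upper bound for batch-1 decode: every token must stream all weights once
# (real throughput is lower once KV-cache reads are counted).
tokens_per_s = bandwidth_bytes / weight_bytes  # ~70 tokens/s

fits_in_hbm = weight_bytes <= 216e9            # 216 GB HBM3e capacity (article)
print(f"Weights: {weight_bytes/1e9:.0f} GB (fits in 216 GB HBM: {fits_in_hbm})")
print(f"Batch-1 decode ceiling: ~{tokens_per_s:.0f} tokens/s")
```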
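Finally, the claimed 30% performance-per-dollar gain can be restated as cost per token: if a dollar buys 1.3x the throughput, a fixed volume of tokens costs 1/1.3 ≈ 77% of what it did, i.e. roughly 23% less.

```python
# "30% performance per dollar" improvement, expressed as cost per token.
improvement = 1.30                 # article's claimed factor vs. the current fleet
relative_cost = 1 / improvement    # ≈ 0.77: the same tokens cost ~23% less
print(f"Cost per token falls to ~{relative_cost:.0%} of the previous hardware's")
```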
