Core Insights
- The article highlights a significant shift in the AI industry toward competition in AI inference chips: the global AI inference market is projected to reach $150 billion by 2028, a compound annual growth rate (CAGR) of over 40% [3][4].

Group 1: Huawei's Ascend 950PR
- Huawei announced its Ascend 950 series, including the Ascend 950PR and 950DT chips, designed for AI inference and optimized for cost through the use of low-cost HBM (High Bandwidth Memory) [3][4].
- The Ascend 950PR targets the prefill stage of inference and recommendation services, significantly reducing investment costs, since memory accounts for over 40% of total AI inference expenses [4] (a toy prefill/decode sketch follows this digest).
- Huawei plans to roughly double Ascend computing power every year to keep pace with the growing demand for AI compute [3].

Group 2: NVIDIA's Rubin CPX
- NVIDIA launched the Rubin CPX, a GPU designed for large-scale context processing, marking its shift from training leader to inference specialist [5][8].
- In its full rack configuration, the Rubin CPX platform delivers 8 exaflops of compute, a 7.5-fold improvement over its predecessor, along with 100 TB of fast memory and 1.7 PB/s of memory bandwidth [5][8].
- The chip supports low-precision data formats, improving training efficiency and inference throughput, and is expected to solidify NVIDIA's dominance of the AI ecosystem [9] (a minimal quantization sketch follows this digest).

Group 3: Google's Ironwood TPU
- Google introduced the Ironwood TPU amid geometric growth in inference traffic: token usage grew 50-fold between April 2024 and April 2025 [10][13].
- Ironwood delivers a single-chip peak of 4,614 TFLOPS (about 4.6 petaflops) and 7.4 TB/s of memory bandwidth, significantly improving efficiency and scalability [17][20].
- Google aims to cut inference latency by up to 96% and raise throughput by 40% through optimizations across its software stack [24].

Group 4: Groq's Rise
- Groq, an AI startup specializing in inference chips, recently raised $750 million, lifting its valuation from $2.8 billion to $6.9 billion within a year [25][26].
- The company plans to deploy over 108,000 LPUs (Language Processing Units) by Q1 2025 to meet demand, underscoring the growing interest in AI inference solutions [26][27].
- Groq's chips use a tensor-streaming architecture that the company claims delivers roughly ten times lower latency than leading GPU competitors, making them well suited to real-time AI inference [27].

Group 5: Industry Implications
- Competition in AI inference chips is intensifying, and it turns not only on raw computing power but also on cost, energy efficiency, software ecosystems, and application scenarios [28].
- As AI moves from the experimental phase into everyday applications, the ability to provide efficient, economical, and flexible inference solutions will be crucial for companies to succeed in the AI era [28].
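For readers unfamiliar with the "prefill stage" referenced under Group 1, the toy Python sketch below illustrates the standard two-phase structure of LLM inference: a parallel, compute-heavy prefill pass over the prompt, followed by a memory-bandwidth-bound, token-by-token decode loop. This is a schematic under simplified assumptions, not any vendor's implementation; the list-based "KV cache" and the dummy model step are stand-ins for illustration only.

```python
# Toy sketch of the two phases of LLM inference. All model behavior
# here is faked with integers; only the control flow is the point.

def prefill(prompt_tokens: list[int], kv_cache: list[int]) -> int:
    """Process the whole prompt in one pass.

    Keys/values for every prompt token are computed and cached together,
    so the work scales with prompt length and keeps compute units busy:
    this is the stage the Ascend 950PR reportedly targets.
    """
    for token in prompt_tokens:
        kv_cache.append(token)      # stand-in for storing K/V tensors
    return kv_cache[-1]             # stand-in for the first output token


def decode(first_token: int, kv_cache: list[int], max_new: int) -> list[int]:
    """Generate output one token at a time.

    Each step re-reads the whole KV cache to emit a single token, so
    this phase is limited by memory bandwidth rather than raw FLOPs.
    """
    output = [first_token]
    for _ in range(max_new - 1):
        nxt = (output[-1] + len(kv_cache)) % 50_000   # dummy "model" step
        kv_cache.append(nxt)
        output.append(nxt)
    return output


prompt = list(range(8))             # pretend tokenized prompt
cache: list[int] = []
print(decode(prefill(prompt, cache), cache, max_new=5))
```

Because the two phases have different bottlenecks, splitting them across different silicon is the logic behind chips like the 950PR and Rubin CPX that target the context/prefill side specifically.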
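The "low-precision data formats" credited under Group 2 with raising throughput work by spending fewer bits per value, which cuts memory traffic and lets more operands move through the same bandwidth. The sketch below shows the core idea with simple symmetric int8 quantization; the FP8/FP4 formats on Rubin-class hardware are considerably more sophisticated, and the function names here are illustrative, not NVIDIA's API.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values to int8 with one shared scale factor."""
    scale = float(np.abs(x).max()) / 127.0 or 1.0   # avoid a zero scale
    q = np.round(x / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)         # pretend weight tile
q, s = quantize_int8(w)
print(f"bytes: fp32={w.nbytes} -> int8={q.nbytes}")  # 4x less memory traffic
print("max abs error:", np.abs(w - dequantize_int8(q, s)).max())
```

The 4x size reduction is the whole trick: when a kernel is bandwidth-bound, moving a quarter of the bytes translates almost directly into throughput, at the cost of the small rounding error printed above.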
A New War Over a Single Chip