Core Insights
- In February, China's AI model API call volume surged past that of the United States for the first time: 41.2 trillion tokens versus 29.4 trillion for the U.S. during the week of February 9-15 [2][9]
- The following week, China's call volume rose to 51.6 trillion tokens, a 127% increase over three weeks, while U.S. model calls fell to 27 trillion tokens [2][9]
- Four of the top five models by global API call volume come from Chinese vendors, indicating a collective rise rather than reliance on a single product [2][12]

Token Call Volume Growth
- OpenRouter, a platform that aggregates AI models, reported that global model token call volume rose from 12.4 trillion tokens in early March 2025 to 139.5 trillion tokens by mid-February 2026, more than tenfold growth in under a year [8]
- In early February 2026, Chinese models drove a significant share of the increase in call volume, signaling a shift in market dynamics [8][9]

Competitive Landscape
- Four of the top five models by call volume during the week of February 16-22, 2026 came from Chinese companies, together contributing 85.7% of the total call volume [12]
- MiniMax's M2.5 became the top model within a week of its launch, contributing 14.4 trillion tokens to the total [12][15]

Cost Advantages
- Chinese models such as MiniMax's M2.5 and Zhipu's GLM-5 offer significant cost advantages, with input costs of $0.30 per million tokens versus $5 for U.S. counterparts like Claude Opus 4.6, roughly 16.7 times cheaper [18][19]
- Output costs are also far lower: $1.10 per million tokens for MiniMax's M2.5 versus $25 for Claude Opus 4.6, a disparity that strongly influences developer choices [18][19]

Technological Innovations
- The Mixture-of-Experts (MoE) architecture is a key factor in reducing inference costs for Chinese models, activating only the parts of the model relevant to a given task for efficient resource utilization [20]
- This architecture can reduce memory usage by 60% and increase throughput by up to 19 times, underpinning the overall cost advantage [20]

Market Trends
- Demand for AI tokens is expected to grow exponentially, with a projected compound annual growth rate of 330% in China from 2025 to 2030, indicating a significant market opportunity [21]
- AI's evolution from a simple Q&A tool into a productivity tool is driving token consumption upward as users take on more complex tasks [22][23]

Future Pricing Models
- AI service pricing is expected to shift toward customized, flexible models tied to task complexity and resource consumption, moving away from a one-size-fits-all approach [24]
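The cited per-million-token prices translate directly into per-request costs. The sketch below works through the arithmetic using the article's figures; the workload of 5,000 input and 2,000 output tokens per request is a hypothetical example, not a number from the article.

```python
# Per-request cost comparison based on the prices cited above.
# USD per million tokens, as (input_price, output_price).
PRICES = {
    "MiniMax M2.5": (0.30, 1.10),
    "Claude Opus 4.6": (5.00, 25.00),
}

def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request for the given model."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# Hypothetical workload: 5,000 input tokens, 2,000 output tokens.
cn = request_cost("MiniMax M2.5", 5_000, 2_000)
us = request_cost("Claude Opus 4.6", 5_000, 2_000)
print(f"MiniMax M2.5:    ${cn:.5f} per request")
print(f"Claude Opus 4.6: ${us:.5f} per request")
print(f"cost ratio: {us / cn:.1f}x")
```

On this mix the blended gap lands between the 16.7x input ratio and the ~22.7x output ratio, which is why the effective savings a developer sees depends on how output-heavy their workload is.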
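The MoE cost mechanism described above can be illustrated with a minimal top-k routing sketch. This is a generic gated-MoE pattern, not any specific vendor's architecture; the expert count, top-k value, and dimensions are illustrative.

```python
# Minimal Mixture-of-Experts routing sketch: a router scores all experts,
# but only the top-k are actually computed for each token.
import numpy as np

rng = np.random.default_rng(0)
N_EXPERTS, TOP_K, D = 8, 2, 16          # illustrative sizes, not real model dims
experts = [rng.standard_normal((D, D)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS))

def moe_forward(x):
    """Route one token vector x through only TOP_K of the N_EXPERTS experts."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]    # indices of the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()             # softmax over the selected experts only
    # Only TOP_K expert matrices are multiplied; the rest stay idle,
    # which is where the inference-cost savings come from.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_forward(rng.standard_normal(D))
print(f"active parameter fraction: {TOP_K / N_EXPERTS:.0%}")  # 25%
```

Here only 2 of 8 expert weight matrices are touched per token, so compute and memory traffic scale with the active fraction rather than the full parameter count.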
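The projected 330% CAGR is worth unpacking, since the compounding implication is easy to underestimate: it means volume multiplies by 4.3x each year. A quick check of the cumulative multiple over 2025-2030:

```python
# A 330% CAGR means a yearly multiplier of 1 + 3.30 = 4.3x.
# Compounded over the five years from 2025 to 2030:
cagr = 3.30
years = 5
multiple = (1 + cagr) ** years
print(f"cumulative growth 2025->2030: {multiple:,.0f}x")  # ~1,470x
```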
February surge: China's AI call volume surpasses the U.S. for the first time, four domestic large models dominate the global top five, and demand for domestic compute is growing exponentially