Core Insights
- Nvidia is shifting the focus of AI computing competition from training to inference, integrating LPU technology and collaborating with OpenAI on dedicated inference capabilities [1][2]
- Demand for inference computing is surging, driven by the monetization of large models and the accelerating deployment of agents in real-world applications [3][6]

Group 1: Inference Computing Trends
- The report identifies four major trends in inference computing: growing deployment of pure-CPU inference scenarios, the rise of specialized architectures such as the LPU challenging GPU dominance, accelerated breakthroughs in domestic computing chips, and a shift in demand structure from training alone to mass token consumption [2][10]
- Companies providing high-performance, cost-effective inference chips stand to benefit the most as breakthroughs in CPUs, LPUs, and domestic chips reshape the computing landscape [2][10]

Group 2: Demand and Usage Statistics
- Inference demand has exploded, with sharp increases in token consumption during the Chinese New Year period, including 63.3 billion tokens processed in a single day by a leading model [3][10]
- Data from OpenRouter indicates that Chinese models have surpassed U.S. models in token calls, with a 127% increase over three weeks, highlighting the growing prominence of Chinese AI models [3][10]

Group 3: Technological Developments
- Nvidia's $20 billion acquisition of Groq's core technology signals that top industry players recognize the importance of pure inference chips [6][10]
- The LPU architecture differs from that of traditional GPUs, offering efficiency advantages in inference scenarios, particularly in addressing latency and memory bandwidth constraints [6][10]

Group 4: System-Level Innovations
- The evolution from single chips to system-level innovation is crucial to upgrading inference computing, with a three-layer network architecture emerging to meet demands for low latency and high throughput [8][10]
- Nvidia is expanding its collaboration with Meta Platforms to support large-scale pure-CPU deployments, signaling a move away from a GPU-only sales model [8][10]

Group 5: Domestic Chip Advancements
- Domestic inference chips are undergoing significant technological upgrades, including support for low-precision data formats and increased interconnect bandwidth, with a new version expected to launch in Q1 2026 [10]
- The growth of domestic packaging companies reflects the increasing supply capability of domestic computing chips, with revenues from high-performance computing chip packaging services projected to rise significantly [10]
Source article: Behind Nvidia's "Mystery Chip": The Inference Era Ushers In "Four New Computing Power Trends"