Down 90%: Nvidia Blackwell Cuts AI Inference Costs to One-Tenth
Nvidia (US:NVDA) · 是说芯语 · 2026-02-15 01:30

Core Insights
- Nvidia has made significant progress in AI inference with its Blackwell architecture, achieving a milestone in "token economics" [1]
- The company has pursued an "extreme hardware-software co-design" strategy, optimizing hardware efficiency for complex AI inference workloads and reducing the cost of token generation to one-tenth of that under the previous Hopper architecture [1]

Industry Applications
- Several inference service providers, including Baseten, DeepInfra, Fireworks AI, and Together AI, are using the Blackwell platform to host open-source models [2]
- These companies have achieved cost reductions across industries by combining cutting-edge open-source models, Blackwell's hardware advantages, and their own optimized inference stacks [2]
- For instance, Sentient Labs, which focuses on multi-agent workflows, reported a cost-efficiency improvement of 25% to 50% over the Hopper era, while companies in the gaming sector, such as Latitude, have achieved lower latency and more reliable responses [2]

Technical Specifications
- The core of Blackwell's efficiency is its flagship system, the GB200 NVL72, which interconnects 72 GPUs with up to 30TB of high-speed shared memory [6][7]
- This design is well-suited to the currently dominant "Mixture of Experts (MoE)" architecture, allowing token batches to be split efficiently and processed in parallel across multiple GPUs [6][7]
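The MoE splitting described above can be illustrated with a minimal sketch. This is a hypothetical toy, not Nvidia's implementation: a gating function assigns each token to one expert (standing in for a GPU shard), so a batch naturally splits into sub-batches that can run in parallel. The names `gate` and `route_batch` and the modulo-based router are illustrative assumptions; real MoE layers use a learned router.

```python
# Toy sketch of Mixture-of-Experts token routing (hypothetical, not
# Nvidia's implementation): each token goes to the expert chosen by a
# gating score, splitting the batch into per-expert sub-batches that
# could be processed in parallel on separate GPUs.
from collections import defaultdict

NUM_EXPERTS = 4  # stand-in for experts sharded across GPUs

def gate(token: int) -> int:
    """Toy gating function: pick an expert index per token.
    Real MoE layers use a learned router (softmax over logits)."""
    return token % NUM_EXPERTS

def route_batch(tokens: list[int]) -> dict[int, list[int]]:
    """Split a token batch into per-expert sub-batches."""
    shards: dict[int, list[int]] = defaultdict(list)
    for t in tokens:
        shards[gate(t)].append(t)
    return dict(shards)

batch = list(range(8))
shards = route_batch(batch)
print(shards)  # {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```

Because only the routed expert's weights are touched per token, MoE inference rewards exactly what the article attributes to the GB200 NVL72: large shared memory to hold all experts and fast chip-to-chip interconnect to shuffle tokens between them.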
