One Year On, DeepSeek-R1's Cost per Token Has Fallen to 1/32 of Its Original Level

Core Insights
- DeepSeek recently updated its R1 paper, expanding it from 22 pages to 86 pages and adding more detail on its training pipeline and data validation methods [1]

Group 1: Model Specifications and Performance
- DeepSeek-R1, released on January 20, 2025, has 671 billion parameters and uses a mixture-of-experts (MoE) architecture, which significantly improves training efficiency [4]
- Within a year of launch, the cost per token for the R1 model has fallen to 1/32 of its original level [6][18]
- NVIDIA's collaboration with DeepSeek has delivered a 36-fold increase in throughput since January 2025, further reducing inference costs [18]

Group 2: Technological Innovations
- NVIDIA's GB200 NVL72 system, designed for high-density workloads, connects 72 Blackwell GPUs and provides up to 1800 GB/s of bidirectional bandwidth [11]
- The Blackwell architecture includes hardware acceleration for the NVFP4 data format, enhancing precision and performance during token generation [12]
- The latest NVIDIA TensorRT-LLM software significantly boosts inference performance across a range of input/output sequence lengths [10][14]

Group 3: Performance Metrics and Enhancements
- DeepSeek-R1 throughput has improved sharply, with Blackwell GPUs achieving up to 2.8 times higher throughput over the last three months [17]
- Using multi-token prediction (MTP) and NVFP4 on the NVIDIA HGX B200 platform has produced substantial performance gains while maintaining accuracy [21][24]
- NVIDIA continues to optimize the entire technology stack to make large language models more efficient and to increase token throughput on existing hardware [30]
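
To make the link between throughput and cost per token concrete, here is a minimal sketch under a simplifying assumption: serving cost is a fixed hourly price per GPU divided by the tokens that GPU generates in an hour. The function name, the hourly price and the baseline throughput are hypothetical placeholders, not figures from the article; only the 36-fold multiplier comes from the summary above.

```python
def cost_per_million_tokens(gpu_hour_cost: float, tokens_per_second: float) -> float:
    """Cost of generating one million tokens on a single GPU.

    Assumes a flat hourly price and steady throughput (illustrative model only).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost / tokens_per_hour * 1_000_000


# Hypothetical placeholder inputs, chosen only to illustrate the arithmetic.
GPU_HOUR_COST = 3.0          # assumed price per GPU-hour
BASELINE_TOKENS_PER_S = 100.0  # assumed starting throughput
THROUGHPUT_GAIN = 36         # the 36-fold throughput increase cited in the summary

baseline = cost_per_million_tokens(GPU_HOUR_COST, BASELINE_TOKENS_PER_S)
improved = cost_per_million_tokens(GPU_HOUR_COST, BASELINE_TOKENS_PER_S * THROUGHPUT_GAIN)
ratio = improved / baseline

print(f"baseline cost per 1M tokens: {baseline:.4f}")
print(f"improved cost per 1M tokens: {improved:.4f}")
print(f"cost ratio (improved / baseline): {ratio:.4f}")
```

At a constant hourly price, cost per token scales as the inverse of throughput, so a 36-fold gain would cut cost to 1/36 in this model. The summary reports the 1/32 cost figure and the 36-fold throughput figure separately and does not state how they relate, so this sketch should not be read as the source's own calculation.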