刚刚！DeepSeek，硬核发布！

Core Viewpoint - DeepSeek has made significant advancements in optimizing parallel computing strategies and has introduced new models that enhance performance and reduce costs in AI applications [2][3][5][7]. Group 1: Optimized Parallelism Strategies - DeepSeek announced the release of Optimized Parallelism Strategies aimed at improving computational efficiency, reducing resource waste, and maximizing system performance through effective task allocation and resource coordination [3][5]. - The strategies are designed for high-performance parallel execution in multi-core, distributed, or heterogeneous systems, balancing computation, communication, and storage overhead [5]. Group 2: New Model Releases - NVIDIA has open-sourced the first optimized DeepSeek-R1 model on the Blackwell architecture, achieving a 25-fold increase in inference speed and a 20-fold reduction in cost per token [3][7]. - The DeepSeek-R1 model's local deployment has garnered significant attention, with its inference throughput reaching 21,088 tokens per second, compared to 844 tokens per second for the H100, marking a substantial performance improvement [7]. Group 3: Cost Reduction Initiatives - DeepSeek announced a significant reduction in API call prices during nighttime hours, with DeepSeek-V3 prices cut to 50% and DeepSeek-R1 to as low as 25%, with reductions up to 75% [6]. Group 4: Additional Open Source Contributions - DeepSeek has continued its open-source initiatives by releasing FlashMLA, DeepEP, and DeepGEMM, which are optimized for NVIDIA GPUs and designed to support various AI model training and inference tasks [9].