Large-Scale Cross-Node Expert Parallelism

Securities Times · 2025-03-01 08:38

Core Viewpoint
- DeepSeek has unveiled its V3/R1 inference system and disclosed its theoretical cost and revenue figures, which imply a theoretical cost-profit margin of 545% under its pricing model [1][5].

Group 1: Inference System Optimization
- The optimization goals of the DeepSeek V3/R1 inference system are higher throughput and lower latency, both pursued through Expert Parallelism (EP) [2].
- Large-scale cross-node expert parallelism significantly increases the batch size, which improves GPU matrix multiplication efficiency and overall throughput [3].
- To reduce latency, the system distributes experts across different GPUs, so each GPU computes only a small number of experts and needs fewer memory accesses [3].

Group 2: Cost and Revenue Insights
- DeepSeek's inference services run on NVIDIA H800 GPUs, with a peak of 278 nodes and an average of 226.75 nodes in use, giving a theoretical daily cost of $87,072 [4].
- Theoretical daily revenue from token processing is estimated at $562,027, which corresponds to a cost-profit margin of 545% [5].
- Actual revenue is lower, because V3 is priced below R1 and discounts apply during off-peak hours [6].

Group 3: Market Position and Future Developments
- DeepSeek's open-source initiatives have drawn attention, with industry analysts noting how comprehensive the technical components released during its open-source week were [7].
- The company is seen as a disruptive force in the AI industry, particularly when compared with OpenAI, whose prices are significantly higher [10].
- The upcoming DeepSeek-R2 model is expected to offer improved capabilities and may be released ahead of schedule [11].
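The cost and revenue figures in Group 2 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the daily cost, daily revenue, and average node count come from the summary, while the 8 GPUs per node and the $2 per GPU-hour rental rate are assumptions not stated in it, used here only to show one way the cost figure could be reproduced. The function names are hypothetical.

```python
# Figures quoted in the summary.
DAILY_COST_USD = 87_072
DAILY_REVENUE_USD = 562_027
AVG_NODES = 226.75

# Assumptions (not stated in the summary): node size and rental rate.
GPUS_PER_NODE = 8
USD_PER_GPU_HOUR = 2.0


def cost_profit_margin(revenue: float, cost: float) -> float:
    """Profit expressed as a fraction of cost: (revenue - cost) / cost."""
    return (revenue - cost) / cost


def daily_cost(avg_nodes: float, gpus_per_node: int, usd_per_gpu_hour: float) -> float:
    """Rental cost for one day at a constant average node count."""
    return avg_nodes * gpus_per_node * 24 * usd_per_gpu_hour


margin = cost_profit_margin(DAILY_REVENUE_USD, DAILY_COST_USD)
estimated_cost = daily_cost(AVG_NODES, GPUS_PER_NODE, USD_PER_GPU_HOUR)

print(f"cost-profit margin: {margin:.0%}")
print(f"estimated daily cost: ${estimated_cost:,.0f}")
```

Note that the 545% figure is profit divided by cost, not profit divided by revenue; measured against revenue, the same numbers give a margin of roughly 85%.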

AGI (Artificial General Intelligence)

Artificial Intelligence

DeepSeek-V3/R1 Inference System

GPT-4.5