Cross-Node Expert Parallelism (EP)


36氪· 2025-03-03 09:03

Core Insights
- During its open-source week, DeepSeek disclosed the operating cost and theoretical revenue of its inference service: a total daily cost of $87,072 against a theoretical daily revenue of $562,027, which implies a theoretical cost-profit margin (profit divided by cost) of 545% [4][11][12]
- Actual revenue is significantly lower, because DeepSeek-V3 is priced below R1, the web and app services are free, and off-peak hours are discounted [12]

Cost and Revenue Analysis
- The daily total cost of $87,072 assumes a rental price of $2 per hour for each H800 GPU [5][11]
- If every token were billed at DeepSeek-R1 rates, the theoretical daily revenue would be $562,027, for a theoretical net profit of $474,955 [11][12]
- Actual revenue falls short of this figure for the reasons noted above: lower DeepSeek-V3 pricing and only partial monetization of the services [12]

System Architecture and Performance
- DeepSeek uses cross-node expert parallelism (EP) to raise throughput and lower latency; EP also adds system complexity, which the inference system is designed to handle [2][15]
- Over the 24-hour period analyzed, the service occupied 278 nodes at peak and 226.75 nodes on average [5]
- The system processed 608 billion input tokens in total, 56.3% of which hit the KVCache [7]

Technical Specifications
- Each H800 node delivers an average throughput of about 73.7k input tokens per second during prefill and about 14.8k output tokens per second during decoding [9]
- Matrix computation and dispatch transmission use a combination of FP8 and BF16 formats to maintain service quality [5]

Load Balancing Strategies
- DeepSeek balances load across GPUs so that no single GPU becomes a performance bottleneck, spreading both computation and communication as evenly as possible [22][23]
- The optimization goals include balancing the core-attention computation load and the dispatch send volume across GPUs [23][24]
- The expert-parallel load balancer aims to minimize the maximum dispatch receive load across all GPUs [26]
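
The cost and profit figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only a sanity check: the node count, GPU price, revenue, token total, and cache-hit rate are taken from the summary, while `GPUS_PER_NODE = 8` is an assumption that the summary does not state (it is the value that makes the quoted $87,072 come out exactly).

```python
# Figures quoted in the summary.
AVG_NODES = 226.75                  # average nodes occupied over 24 hours
GPU_HOURLY_RATE_USD = 2.0           # assumed rental price per H800 GPU-hour
THEORETICAL_REVENUE_USD = 562_027   # if all tokens were billed at R1 rates
INPUT_TOKENS = 608e9                # total input tokens in the 24-hour window
KV_CACHE_HIT_RATE = 0.563           # share of input tokens served from KVCache

# Assumption (not stated in the summary): each node holds 8 H800 GPUs.
GPUS_PER_NODE = 8


def daily_cost(avg_nodes, gpus_per_node, hourly_rate, hours=24):
    """GPU rental cost for one day at a flat hourly rate per GPU."""
    return avg_nodes * gpus_per_node * hourly_rate * hours


cost = daily_cost(AVG_NODES, GPUS_PER_NODE, GPU_HOURLY_RATE_USD)
profit = THEORETICAL_REVENUE_USD - cost
margin = profit / cost              # cost-profit margin: profit relative to cost
cached_tokens = INPUT_TOKENS * KV_CACHE_HIT_RATE

print(f"daily cost:         ${cost:,.0f}")
print(f"theoretical profit: ${profit:,.0f}")
print(f"cost-profit margin: {margin:.1%}")
print(f"cache-hit tokens:   {cached_tokens / 1e9:.1f} billion")
```

Note that the 545% figure is profit relative to cost, not profit relative to revenue; measured against revenue, the same numbers give a margin of roughly 85%.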
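
The "minimize the maximum load" objective of the expert-parallel load balancer can be illustrated with a generic greedy heuristic. This is a minimal sketch, not DeepSeek's actual balancer: it uses the classic longest-processing-time-first rule, and the function name `balance_experts`, the expert names, and the load values are all hypothetical.

```python
import heapq


def balance_experts(expert_loads, num_gpus):
    """Greedy min-max placement (longest-processing-time-first).

    Experts are visited from heaviest to lightest, and each one is placed
    on the GPU that currently has the smallest total load.
    Returns (placement, gpu_loads).
    """
    # Min-heap of (current total load, gpu id).
    heap = [(0, gpu) for gpu in range(num_gpus)]
    heapq.heapify(heap)
    placement = {gpu: [] for gpu in range(num_gpus)}

    by_load_desc = sorted(expert_loads.items(), key=lambda kv: kv[1], reverse=True)
    for expert, load in by_load_desc:
        total, gpu = heapq.heappop(heap)   # least-loaded GPU so far
        placement[gpu].append(expert)
        heapq.heappush(heap, (total + load, gpu))

    gpu_loads = {
        gpu: sum(expert_loads[e] for e in experts)
        for gpu, experts in placement.items()
    }
    return placement, gpu_loads


# Hypothetical per-expert dispatch receive loads (arbitrary units).
loads = {"e0": 9, "e1": 7, "e2": 6, "e3": 5, "e4": 4, "e5": 2, "e6": 2, "e7": 1}
placement, gpu_loads = balance_experts(loads, num_gpus=4)
print(placement)
print("max load:", max(gpu_loads.values()))
```

The greedy rule does not guarantee an optimal placement in general, but it is simple and usually lands close to the lower bound of total load divided by the number of GPUs.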

