Core Insights
- DeepSeek's V3.2 technical report argues, on the basis of extensive empirical data, that the performance gap between open-source and closed-source models is widening rather than narrowing [1][2].

Performance Comparison
- On MMLU-Pro, DeepSeek V3.2 scored 85.0, versus 87.5 for GPT-5 and 90.1 for Gemini 3.0 Pro. On GPQA Diamond, the scores were 82.4 for DeepSeek, 85.7 for GPT-5, and 91.9 for Gemini 3.0 Pro [2][3].
- The widest gap appeared on the HLE test, where DeepSeek V3.2 scored 25.1 against GPT-5's 26.3 and Gemini 3.0 Pro's 37.7, a substantial disparity [3][4].

Structural Issues Identified
- The report identifies three structural issues that limit open-source models on complex tasks:
1. Architectural Limitations: Open-source models rely on traditional vanilla attention mechanisms, which are inefficient for long sequences and hinder scalability and effective post-training [6].
2. Resource Investment Gap: DeepSeek V3.2's post-training budget exceeds 10% of its pre-training cost, while most open-source models allocate less than 1%, producing significant performance differences [7].
3. AI Agent Capability Lag: Open-source models show weaker generalization and instruction-following in real-world applications, as reflected in lower scores on key agent evaluation benchmarks [8].

DeepSeek's Strategic Innovations
- DeepSeek has introduced fundamental technical innovations along three core dimensions:
1. Architectural Changes: The DSA (DeepSeek Sparse Attention) mechanism reduces attention complexity from O(L²) to O(L×k), significantly lowering inference costs while maintaining performance [10].
2. Increased Resource Allocation: DeepSeek made the unprecedented decision to invest heavily in post-training, training expert models in six key areas on a total of 943.7 billion tokens during the post-training phase [12].
3. Enhanced Agent Capabilities: A systematic task synthesis process produced over 1,800 diverse environments and 85,000 complex prompts, improving performance on agent-related tests [13].

Conclusion
- DeepSeek V3.2 demonstrates a viable path for open-source AI to compete with closed-source models through architectural innovation and strategic resource allocation, suggesting that technological innovation may be the key to survival in the competitive AI landscape [14].
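To make the O(L²) → O(L×k) claim concrete, here is a minimal, generic top-k sparse attention sketch. This is an illustration of the general technique, not DeepSeek's actual DSA implementation: DSA uses a separate lightweight indexer to pick the k relevant tokens per query, whereas this sketch simply reuses the query-key scores for selection. The point it shows is that the expensive softmax-weighted aggregation runs over only k positions per query instead of all L.

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    """Generic top-k sparse attention sketch (illustrative, not DSA itself).

    For each query, keep only the k highest-scoring key positions and run
    softmax attention over that subset, so the aggregation step costs
    O(L*k) instead of O(L^2).
    """
    L, d = Q.shape
    out = np.zeros_like(V)
    # Selection scores. (DSA computes these with a cheap separate indexer;
    # here we reuse the scaled dot-product scores purely for illustration.)
    scores = Q @ K.T / np.sqrt(d)                 # (L, L)
    for i in range(L):
        idx = np.argpartition(scores[i], -k)[-k:]  # top-k key positions
        s = scores[i, idx]
        w = np.exp(s - s.max())
        w /= w.sum()                               # softmax over k entries only
        out[i] = w @ V[idx]                        # O(k*d) per query
    return out

rng = np.random.default_rng(0)
L, d, k = 16, 8, 4
Q, K, V = rng.normal(size=(3, L, d))
print(topk_sparse_attention(Q, K, V, k).shape)  # (16, 8)
```

With k = L the sketch reduces exactly to dense softmax attention, which is a handy sanity check; the savings come from choosing k much smaller than the sequence length L.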
The gap between open-source and closed-source models is widening: the harsh truth revealed by DeepSeek's paper