Witnessing History! DeepSeek Rises to World's No. 2 AI Lab, R1 Takes the Open-Source Crown, and the Whole Internet Is Clamoring for R2
程序员的那些事·2025-06-01 02:04

Core Viewpoint
- DeepSeek has officially announced the completion of the R1-0528 upgrade, which significantly improves model performance. The upgrade makes R1 a leading open-source AI model and places DeepSeek second among AI laboratories worldwide [1][9][46].
Performance Enhancements
- In benchmark tests, the upgraded DeepSeek-R1-0528 performs comparably to top models such as o3 and Gemini 2.5 Pro, particularly in mathematics, programming, and general logic [2][15].
- Accuracy on complex reasoning tasks has improved significantly: AIME 2025 accuracy rose from 70% to 87.5% [16].
- Reported benchmark scores include 91.4% on AIME 2024 and 87.5% on AIME 2025 [17].
Reduction in Hallucination Rate
- The hallucination rate of DeepSeek-R1-0528 is 45%-50% lower than that of its predecessor, which addresses earlier concerns about high hallucination rates [20][24].
- As a result, the model gives more accurate and reliable results in tasks such as summarization and reading comprehension [25][26].
Enhanced Functionality
- DeepSeek-R1-0528 supports tool calls, which lets it, for example, fetch content from a link and summarize the article. It achieves competitive scores on Tau-Bench [31].
- Front-end code generation has been improved, allowing fully featured applications to be created quickly [33].
Distillation into Qwen3-8B
- Alongside the R1 upgrade, DeepSeek distilled the reasoning chain of R1-0528 into a new model, DeepSeek-R1-0528-Qwen3-8B, which performs strongly on mathematical tests and surpasses Qwen3-8B [6][37].
- Despite having significantly fewer parameters, the distilled 8B model performs competitively, which indicates that the distillation process is effective [38].
Industry Positioning
- Following the R1 upgrade, DeepSeek has been ranked the second AI laboratory worldwide, ahead of competitors such as xAI, Meta, and Anthropic [44][46].
- The model's intelligence index score rose from 60 to 68, a significant advance comparable to OpenAI's improvements [46][47].
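
The before-and-after figures reported above mix percentages and index points, so it can help to separate the absolute change from the relative change. The following is a minimal sketch of that arithmetic, using only the two pairs of numbers the summary gives (AIME 2025 accuracy and the intelligence index score). The function and variable names are illustrative and do not come from the source. The 45%-50% hallucination reduction is left out because the summary gives no baseline rate to compute from.

```python
def gain(before: float, after: float) -> tuple[float, float]:
    """Return (absolute change, relative change in percent of the old value)."""
    absolute = after - before
    relative = absolute / before * 100.0
    return absolute, relative


# Figures exactly as reported in the summary; nothing here is measured independently.
reported = {
    "AIME 2025 accuracy (%)": (70.0, 87.5),
    "Intelligence index score": (60.0, 68.0),
}


def summarize(figures: dict[str, tuple[float, float]]) -> list[str]:
    """Format one line per metric showing both kinds of change."""
    lines = []
    for name, (before, after) in figures.items():
        absolute, relative = gain(before, after)
        lines.append(
            f"{name}: {before:g} -> {after:g} "
            f"(+{absolute:g} absolute, +{relative:.1f}% relative)"
        )
    return lines


if __name__ == "__main__":
    for line in summarize(reported):
        print(line)
```

On these inputs, the AIME 2025 change is 17.5 percentage points, or 25% relative to the earlier 70%. The index change is 8 points, or about 13.3% relative to the earlier 60.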