DeepSeek R1's hallucination rate drops; users call out: we want R2
Yicai (第一财经) · 2025-05-29 15:13

Core Viewpoint - The updated DeepSeek R1 model has significantly improved its capabilities, particularly in reducing hallucination rates and enhancing performance in complex reasoning tasks, positioning itself competitively against leading international models [2][9][12].

Group 1: Model Improvements
- The new R1 model has reduced hallucination rates by approximately 45%-50% compared to the previous version, improving accuracy in tasks such as rewriting, summarization, and reading comprehension [9][12].
- In the AIME 2025 test, the model's accuracy increased from 70% to 87.5%, showcasing its enhanced mathematical reasoning abilities [12].
- The updated model is capable of generating longer and more structured written works, aligning more closely with human writing preferences [12].

Group 2: Benchmark Performance
- The updated R1 model achieved top scores in various benchmark tests, outperforming all domestic models and nearing the performance of international leaders like o3 and Gemini-2.5-Pro [9][12].
- The model's performance in coding tasks has also improved significantly, nearly matching the capabilities of OpenAI's o3-high model [12].

Group 3: Technical Specifications
- The new R1 model has 685 billion parameters and supports a context length of 128K in the open-source version, with 64K available in web, app, and API formats [13].
- The model continues to utilize the DeepSeek V3 Base model as its foundation, with enhanced computational resources applied during the training process to improve reasoning depth [12][13].
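
The figures quoted above mix two kinds of change: the AIME 2025 result is given as two accuracy levels (70% and 87.5%), while the hallucination figure is given as a relative reduction (about 45%-50%). The sketch below is only an arithmetic illustration of how those numbers relate. The function names are hypothetical and not taken from the article, and no baseline hallucination rate is assumed because the source does not state one.

```python
def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute change, in percentage points."""
    return after_pct - before_pct


def relative_gain(before_pct: float, after_pct: float) -> float:
    """Change as a fraction of the starting value."""
    return (after_pct - before_pct) / before_pct


def error_rate_cut(before_pct: float, after_pct: float) -> float:
    """Fraction by which the error rate (100 - accuracy) shrinks."""
    before_err = 100.0 - before_pct
    after_err = 100.0 - after_pct
    return (before_err - after_err) / before_err


def remaining_share(relative_cut: float) -> float:
    """Share of the original rate left after a relative reduction."""
    return 1.0 - relative_cut


if __name__ == "__main__":
    # AIME 2025 accuracy levels as reported in the summary.
    before, after = 70.0, 87.5
    print(f"point gain:     {point_gain(before, after):.1f} pp")
    print(f"relative gain:  {relative_gain(before, after):.1%}")
    print(f"error-rate cut: {error_rate_cut(before, after):.1%}")

    # Reported hallucination reduction of roughly 45%-50% (relative).
    for cut in (0.45, 0.50):
        print(f"a {cut:.0%} cut leaves {remaining_share(cut):.0%} of the old rate")
```

Read this way, the AIME change is a gain of 17.5 percentage points, and a 45%-50% reduction means the new hallucination rate is roughly 50%-55% of the previous version's rate, whatever that baseline was.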