Just In: A Major DeepSeek Release! Goodbye to the Stray "极" Character, Top Gain Exceeds 36%. Are V4/R2 Far Off?
程序员的那些事·2025-09-23 05:43

Core Viewpoint
- DeepSeek-V3.1-Terminus, the latest model, has been released. It improves on several benchmarks and addresses earlier problems with output consistency and agent capabilities [2][4][30].

Summary by Sections

Model Updates
- DeepSeek-V3.1-Terminus has been officially confirmed and is now available on all platforms: the official app, the web, and the API [6].
- The update resolves language-consistency issues and improves the performance of the Code Agent and Search Agent [4][14].

Performance Improvements
- The new model improves on multiple benchmarks and surpasses Gemini 2.5 Pro in several areas, most notably Humanity's Last Exam, where its score rose by 36.48% in relative terms (from 15.9 to 21.7) [10][31].
- Specific benchmark results [10][35]:
  - Humanity's Last Exam: 21.7 (up from 15.9)
  - LiveCodeBench: 74.9 (up from 74.8)
  - SimpleQA: 96.8 (up from 93.4)
  - SWE-bench Verified: 68.4 (up from 66.0)

Bug Fixes
- The earlier bug in which the character "极" appeared at random in the output has been fixed, allowing for better performance on programming-related tasks [11][22].
- The update also tackles language mixing, a common problem in large language models [18][16].

Future Expectations
- Users are speculating about the upcoming DeepSeek-V4 and DeepSeek-R2 models and anticipating further advances [33][39].
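
The 36.48% figure quoted for Humanity's Last Exam is a relative gain over the previous score, not a difference in points. As a minimal sketch, the snippet below recomputes that percentage for each benchmark from the scores listed in this summary. The names `SCORES`, `relative_gain`, and `gains` are illustrative and do not come from DeepSeek.

```python
# Scores as quoted in this summary: (previous score, Terminus score).
SCORES = {
    "Humanity's Last Exam": (15.9, 21.7),
    "LiveCodeBench": (74.8, 74.9),
    "SimpleQA": (93.4, 96.8),
    "SWE-bench Verified": (66.0, 68.4),
}


def relative_gain(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


def gains(scores: dict) -> dict:
    """Map each benchmark name to its relative gain, rounded to 2 decimals."""
    return {
        name: round(relative_gain(old, new), 2)
        for name, (old, new) in scores.items()
    }


if __name__ == "__main__":
    for name, pct in gains(SCORES).items():
        print(f"{name}: +{pct:.2f}%")
```

Computed this way, Humanity's Last Exam comes out at about 36.48%, matching the headline figure. The other three benchmarks gain less than 4% in relative terms, which is why the headline calls 36% the largest improvement.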