DeepSeek Releases New Models
Zhong Guo Zheng Quan Bao · 2025-12-01 14:48
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, aimed at strengthening reasoning capability and managing output length across a range of applications [1][2].

Model Performance
- On public reasoning benchmarks, DeepSeek-V3.2 performed comparably to GPT-5 and slightly below Gemini-3.0-Pro, while producing markedly shorter outputs than Kimi-K2-Thinking, which lowers computational cost and user wait times [1][3].
- DeepSeek-V3.2-Speciale demonstrated exceptional instruction following, rigorous mathematical proof, and logical-validation capabilities, reaching gold-medal level in major competitions such as IMO 2025 and the ICPC World Finals 2025 [2][3].

Benchmark Comparisons
- Across benchmark tests, DeepSeek-V3.2-Speciale outperformed the standard version and other models, with notable scores on AIME 2025 (96.0) and HMMT Feb 2025 (99.2), and high rankings on IMOAnswerBench and LiveCodeBench [3].
- On complex tasks, DeepSeek-V3.2-Speciale performed significantly better than the standard version but consumed more tokens, implying higher operating costs [3].

Model Features
- DeepSeek-V3.2 is DeepSeek's first model to integrate reasoning with tool use, supporting tool invocation in both reasoning and non-reasoning modes, which broadens its applicability (see the sketch below) [4].
- The model's generalization has been improved through a novel method for large-scale synthesis of agent training data, allowing it to perform well in real-world applications [4].
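To make the tool-invocation feature concrete, here is a minimal sketch of calling a tool-capable model through an OpenAI-compatible chat API, which DeepSeek exposes at https://api.deepseek.com. The model id ("deepseek-v3.2") and the get_weather tool are illustrative assumptions; the article does not specify V3.2's API details or tool-calling schema.

```python
# Sketch: invoking a tool-capable reasoning model via an OpenAI-compatible API.
# The model id and tool definition below are hypothetical, for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",  # DeepSeek's OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

# Declare a tool the model may call while reasoning (or directly, in non-reasoning mode).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="deepseek-v3.2",  # assumed model id, not confirmed by the article
    messages=[{"role": "user", "content": "What's the weather in Hangzhou?"}],
    tools=tools,
)

# Print any tool calls the model decided to make.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

In a reasoning mode the model can deliberate before deciding whether to call the tool; in a non-reasoning mode it can dispatch the call directly, trading deliberation for latency.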