DeepSeek Releases New Models

Core Insights
- DeepSeek has released two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, aimed at enhancing reasoning capability and managing output length across applications [1][4]

Model Performance
- DeepSeek-V3.2 matched GPT-5 on public reasoning benchmarks and came in slightly below Gemini-3.0-Pro, while producing significantly shorter outputs than Kimi-K2-Thinking, lowering computational costs and user wait times [1][3]
- DeepSeek-V3.2-Speciale demonstrated exceptional instruction following, rigorous mathematical proof, and logical validation, achieving gold-medal-level results in major competitions such as IMO 2025 and the ICPC World Finals 2025 [2]

Benchmark Comparisons
- Across benchmarks, DeepSeek-V3.2-Speciale outperformed the standard version on complex tasks, though it consumed significantly more tokens, implying higher cost [3]
- Selected scores (Speciale vs. standard):
  - AIME 2025: 96.0 vs. 93.1 [3]
  - HMMT Feb 2025: 99.2 vs. 92.5 [3]
  - IMOAnswerBench: 84.5 vs. 78.3 [3]

Model Features
- DeepSeek-V3.2 is the first model to integrate reasoning with tool usage, supporting tool calls in both reasoning and non-reasoning modes, which broadens its versatility [4]
- Large-scale synthesis of agent training data improved the model's generalization, allowing it to perform well in real-world applications [4]
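To make the tool-calling feature concrete, below is a minimal sketch of how a client might define a tool and route the model's tool call back to local code, assuming an OpenAI-style function-calling request format. The model identifier `deepseek-v3.2`, the `thinking` flag, and the `get_weather` tool are illustrative assumptions, not confirmed API details; no network call is made.

```python
import json

# Hypothetical tool schema in the OpenAI-style function-calling format;
# the tool name and parameters are illustrative, not from DeepSeek docs.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat-completions payload. V3.2 is described as accepting
    tool definitions in both reasoning and non-reasoning modes; the
    'thinking' switch below is an assumed way to toggle between them."""
    return {
        "model": "deepseek-v3.2",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
        "thinking": thinking,  # hypothetical reasoning-mode toggle
    }

def dispatch(tool_call: dict) -> str:
    """Route a model-returned tool call to a local implementation."""
    handlers = {"get_weather": lambda city: f"Sunny in {city}"}
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return handlers[name](**args)

req = build_request("What's the weather in Hangzhou?", thinking=True)
print(req["tools"][0]["function"]["name"])  # get_weather

# Simulate handling a tool call as the model would return it.
result = dispatch(
    {"function": {"name": "get_weather",
                  "arguments": '{"city": "Hangzhou"}'}}
)
print(result)  # Sunny in Hangzhou
```

The point of the sketch is the round trip: the client advertises tools in the request, and when the model responds with a tool call instead of text, the client executes the named function and feeds the result back in a follow-up message.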
