Sparse Attention (DSA)
OpenAI in Danger as DeepSeek Makes a Big Move: Level with Google's Strongest, Tearing Apart GPT-5 High
36Kr · 2025-12-02 00:56
Core Insights
- DeepSeek has officially released V3.2, which significantly outperforms GPT-5 High and is on par with Google's Gemini-3.0 Pro on a range of reasoning benchmarks [1][4][9]
- The new model achieved gold-medal-level results in four international competitions [2][5]
- DeepSeek V3.2 is built on the DSA (DeepSeek Sparse Attention) architecture, which is described as breaking the "impossible triangle" of speed, cost, and intelligence in AI [1][17][22]

Model Performance
- DeepSeek V3.2 outperforms other models, including GPT-5 and Gemini-3.0, on multiple benchmarks [1][21]
- Scores on key math competitions [21]:
  - AIME 2025: 96.0 (DeepSeek-V3.2-Speciale) vs. 95.0 (Gemini-3.0)
  - HMMT Feb 2025: 99.2 (DeepSeek-V3.2-Speciale) vs. 97.5 (Gemini-3.0)

Model Features
- DeepSeek V3.2 is DeepSeek's first model to integrate thinking directly into tool use, and it supports tool use in both thinking and non-thinking modes [6][9]
- The V3.2-Speciale version is designed specifically for reasoning tasks and is currently available only via API [2][4]

Technological Advancements
- The DSA architecture significantly reduces the computational complexity of attention, enabling the model to process long documents efficiently [16][20]
- This raises processing speed and lowers operating costs, making advanced AI capabilities more accessible [17][20]

Training and Development
- DeepSeek V3.2 was trained extensively in virtual environments, using over 1,800 simulated operating systems and 85,000 generated complex instructions to strengthen its problem-solving skills [13][14]
- Compared with the experimental version (V3.2-Exp), the official release improves agent capabilities and context management [8][11]
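
The digest says that DSA reduces the computational complexity of attention but does not describe how. As a rough illustration only, the sketch below shows a generic top-k sparse attention step in plain Python: each query attends to at most k selected keys instead of all of them. The function names, the `selector_scores` input (standing in for some cheap relevance score per key), and the value of k are all assumptions made for this example; none of this is taken from DeepSeek's implementation.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Dense scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def sparse_attend(q, keys, values, selector_scores, k):
    """Attend only to the k keys ranked highest by selector_scores.

    selector_scores is a hypothetical input: one cheap relevance score
    per key, produced by some lightweight mechanism outside this sketch.
    """
    ranked = sorted(range(len(keys)), key=lambda i: selector_scores[i], reverse=True)
    keep = sorted(ranked[:k])  # restore original token order
    return attend(q, [keys[i] for i in keep], [values[i] for i in keep])


def scored_pairs(seq_len, k=None):
    """Count query-key pairs the main attention scores under causal masking.

    Dense: token t looks at all t tokens so far, giving L(L+1)/2 pairs.
    Sparse: token t looks at no more than k tokens, giving roughly L*k pairs.
    """
    if k is None:
        return seq_len * (seq_len + 1) // 2
    return sum(min(t, k) for t in range(1, seq_len + 1))
```

Under these assumptions, `scored_pairs(4)` counts 10 pairs for dense attention and `scored_pairs(4, 2)` counts 7 for the sparse variant. The gap grows with sequence length because the dense count is quadratic in the number of tokens while the sparse count is linear for a fixed k, which is the general reason sparse attention makes long documents cheaper to process.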