DeepSeek Ships Again! New Models Take On Google Head-On, While Admitting the Open-vs-Closed-Source Gap Is Widening
Di Yi Cai Jing· 2025-12-01 23:13
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, positioned to compete with leading proprietary models such as GPT-5 and Gemini 3.0 and showcasing significant advances in reasoning capability [1][4].

Model Overview
- DeepSeek-V3.2 balances reasoning ability against output length, making it suitable for everyday applications such as Q&A and general intelligence tasks. In public reasoning tests it performs on par with GPT-5 and slightly below Google's Gemini 3 Pro [4].
- DeepSeek-V3.2-Speciale is designed to push reasoning capability to the limit, integrating enhanced long-thinking features and the theorem-proving abilities of DeepSeek-Math-V2. It has surpassed Gemini 3 Pro on several reasoning benchmarks, including prestigious math competitions [4][5].

Benchmark Performance
- AIME 2025: DeepSeek-V3.2 scored 93.1, versus 94.6 for GPT-5 and 95.0 for Gemini 3.0 [5].
- Harvard-MIT Math Tournament: DeepSeek-V3.2-Speciale scored 92.5, against Gemini 3 Pro's 97.5 [5].
- International Math Olympiad: DeepSeek-V3.2-Speciale scored 78.3, close to Gemini 3 Pro's 83.3 [5].

Limitations and Future Plans
- Despite these achievements, DeepSeek acknowledges limitations relative to proprietary models, including narrower world knowledge and lower token efficiency. The team plans to scale up pre-training and optimize reasoning chains to improve performance [6][7].
- DeepSeek identifies three areas where open-source models lag proprietary ones: reliance on standard attention mechanisms, insufficient computational resources during post-training, and gaps in generalization and instruction following [7].

Technological Innovations
- DeepSeek has introduced a sparse attention mechanism (DSA) that reduces computational complexity without sacrificing long-context performance. The mechanism is integrated into both new models and contributes significantly to their performance gains [7].

Availability
- The official website, app, and API have been updated to DeepSeek-V3.2, while the enhanced Speciale version is currently available only through a temporary API for community evaluation [8].

Community Reception
- The release has been positively received on social media, with users noting that DeepSeek's models effectively match GPT-5 and Gemini 3 Pro, and highlighting the importance of rigorous engineering design over sheer parameter count [9].
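The article does not describe how DSA works internally, but the general idea behind sparse attention can be sketched: instead of every query scoring every key (quadratic in sequence length), each query attends to only a small selected subset of positions. Below is a minimal top-k sketch in Python; the function name and the score-based selection rule are assumptions for illustration, not DeepSeek's actual DSA design (which reportedly uses a learned indexer to choose the attended tokens).

```python
import math

def topk_sparse_attention(q, keys, values, k=2):
    """Attend only to the k highest-scoring keys instead of all of them.

    Toy illustration of the sparse-attention idea; the top-k selection
    rule here is an assumption for the sketch, not DeepSeek's DSA.
    """
    d = len(q)
    # Scaled dot-product score of the query against every key.
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
              for key in keys]
    # Sparse step: keep only the k best positions, dropping the rest.
    top = sorted(range(len(keys)), key=scores.__getitem__)[-k:]
    # Softmax over the selected scores only.
    m = max(scores[i] for i in top)
    w = [math.exp(scores[i] - m) for i in top]
    z = sum(w)
    w = [wi / z for wi in w]
    # Weighted sum of the selected value vectors.
    return [sum(wi * values[i][j] for wi, i in zip(w, top))
            for j in range(len(values[0]))]

q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
out = topk_sparse_attention(q, keys, values, k=2)
print(out)  # the two query-aligned keys dominate, so out is close to [1.0, 0.0]
```

With k fixed, each query touches k keys rather than all n, cutting the attention cost from O(n²) to roughly O(n·k) once candidates are selected; the engineering challenge, which this toy skips, is choosing the k positions cheaply without first computing every score.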