DeepSeek
The strongest open source yet! "Punching GPT-5" and "kicking Gemini-3.0" — why has DeepSeek V3.2 improved so much?
US Stock IPO· 2025-12-01 22:29
V3.2 reaches the highest tool-calling capability among current open-source models, sharply narrowing the gap between open-source and closed-source models. As DeepSeek's first model to integrate thinking into tool use, V3.2 supports tool calls even in "thinking mode". Through a large-scale agent training data synthesis method, the company constructed reinforcement-learning tasks covering more than 1,800 environments and over 85,000 complex instructions, substantially improving the model's performance on agent evaluations.

As the large-model race shifts from a "parameter contest" to a "capability contest", a notable change is under way: open-source models are beginning to approach, and even challenge, top closed-source models on more and more key capability dimensions.

On December 1, DeepSeek simultaneously released two official models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. The former reaches GPT-5 level on reasoning benchmarks, only slightly below Gemini-3.0-Pro, while the latter won gold medals in four top international competitions including IMO 2025. ...
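The "thinking mode" tool-calling described above can be pictured as an ordinary function-calling chat request that remains enabled while the model reasons. Below is a minimal sketch of such a request payload, assuming an OpenAI-compatible chat API; the model identifier, the `reasoning` flag, and the `get_weather` tool are illustrative assumptions, not DeepSeek's documented parameters:

```python
# Illustrative payload for a tool-enabled chat request in a "thinking" mode.
# The model name and the "reasoning" flag are assumptions for this sketch;
# consult the provider's API reference for the real parameter names.
import json

def build_request(user_query: str) -> dict:
    tool = {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
    return {
        "model": "deepseek-v3.2",          # assumed identifier
        "messages": [{"role": "user", "content": user_query}],
        "tools": [tool],
        "reasoning": True,                 # assumed flag for thinking mode
    }

request = build_request("What's the weather in Hangzhou?")
print(sorted(request))  # → ['messages', 'model', 'reasoning', 'tools']
print(json.dumps(request["tools"][0], indent=2))
```

An actual client would POST this payload to the provider's chat-completions endpoint and loop on any tool calls the model emits, feeding tool results back as new messages.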
Tencent Research Institute AI Daily Digest 20251202
Tencent Research Institute· 2025-12-01 16:03
Group 1: Generative AI Developments
- DeepSeek has officially released versions V3.2 and V3.2-Speciale, with V3.2 achieving reasoning capabilities at GPT-5 level with significantly reduced output length, suitable for daily use and general agent tasks [1]
- V3.2-Speciale is an enhanced long-reasoning version that won gold medals in IMO 2025, CMO 2025, ICPC, and IOI 2025 by integrating theorem-proving capabilities [1]
- The new versions incorporate thinking into tool calls, constructing over 1,800 environments and 85,000 complex instructions through large-scale agent training data synthesis, greatly enhancing generalization [1]

Group 2: Image Generation Technology
- Vidu has launched the Vidu Q2 image generation suite, with upgraded text-to-image and image-editing capabilities, producing results in as little as 5 seconds and ranking in the top four of the global image-editing leaderboard [2]
- The Q2 suite supports location referencing, action replication, instruction following, and scene switching while maintaining high consistency, with 4K output and arbitrary aspect-ratio generation [2]
- Memberships are free until December 31; standard and professional members receive a monthly limit of 300 images, while flagship members enjoy unlimited generation [2]

Group 3: ByteDance's New Assistant
- ByteDance has released a preview version of the Doubao mobile assistant, aimed at smartphone manufacturers and capable of complex cross-application operations such as food-delivery price comparison and auto-replying to messages [3]
- The assistant features a dedicated physical button and voice activation, with screen awareness that automatically reads chat context and generates replies [3]
- ByteDance is in talks with multiple smartphone manufacturers, and a device featuring the Doubao assistant has already launched at 3,499 yuan [3]

Group 4: Advertising in AI Applications
- Developers discovered multiple advertising-related references in the ChatGPT Android app's beta code, including terms like "ads feature" and "search ads carousel" [4]
- OpenAI's stance on advertising has shifted three times in a year, from viewing it as a "last resort" to a more accepting attitude [4]
- HSBC estimates that OpenAI's operational costs for maintaining computational infrastructure could reach several hundred billion dollars annually, predicting continued losses exceeding 100 billion dollars through 2029 [4]

Group 5: AI in Mathematics
- The AI mathematician "Aristotle," developed by HarmonicMath, independently solved a simplified version of Erdős problem 124 in just 6 hours, with verification in the Lean proof system taking only 1 minute [5][6]
- The system combines reinforcement learning, Monte Carlo tree search, and the Lean formal language to explore millions of proof strategies, outputting 100% verifiable theorems and outperforming ChatGPT and Gemini [6]
- Mathematician Terence Tao noted that AI is currently picking the "low-hanging fruit" of mathematics, freeing human mathematicians to focus on more significant challenges [6]

Group 6: Automation and Workforce Impact
- A McKinsey report indicates that existing technology could theoretically automate 57% of work hours in the U.S., with agents taking 44% and robots handling 13% [7]
- The report categorizes jobs into seven archetypes, predicting that 25% to 33% of the most sought-after skills will be automated [7]
- By 2030, redesigning workflows so that agents handle cognitive tasks and robots handle physical tasks could release approximately 2.9 trillion dollars in economic value annually in the U.S. [7]

Group 7: AI Companies' Pricing Strategies
- Stripe's analysis reveals that about 80% of the top 10% fastest-growing AI companies use tiered pricing, and they are nearly twice as likely as other companies to use usage-based pricing [8]
- High-growth companies often offer at least 10 SKUs, actively expand into global markets, and support local-currency transactions to improve conversion rates [8]
- These companies respond quickly to shifts in market demand, offering situational discounts and flexibly adjusting monetization models and pricing strategies based on user preferences [8]

Group 8: Evolution of AI Technology
- Since its launch on December 1, 2022, ChatGPT has evolved from an initial phase of wonder and hallucination to a period of multimodal capabilities and application explosion, significantly altering how humans work [9]
- The release of Google's Gemini 3 has shifted the competitive landscape, with the Gemini mobile app's monthly active users rising from 400 million to 650 million, surpassing ChatGPT in user engagement [9]
- OpenAI's partners are shouldering nearly 100 billion dollars in debt, while OpenAI itself reportedly carries minimal liabilities [9]
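The tiered, usage-based pricing that Group 7 attributes to fast-growing AI companies can be sketched in a few lines: each tier's rate applies only to the units that fall inside that tier. The tier boundaries and rates below are invented for illustration, not figures from the Stripe analysis:

```python
# Sketch of tiered usage-based pricing. Each tier's rate applies only to the
# units inside that tier (tiers and rates are illustrative assumptions).
TIERS = [
    (1_000, 0.00),           # first 1,000 units free
    (10_000, 0.01),          # next 9,000 units at $0.01 each
    (float("inf"), 0.005),   # everything beyond at $0.005 each
]

def price(units: int) -> float:
    total, prev_cap = 0.0, 0
    for cap, rate in TIERS:
        if units <= prev_cap:
            break
        billable = min(units, cap) - prev_cap  # units landing in this tier
        total += billable * rate
        prev_cap = cap
    return round(total, 2)

print(price(500))     # → 0.0 (inside the free tier)
print(price(12_000))  # → 100.0 (0 + 9,000*0.01 + 2,000*0.005)
```

Graduated tiers like this let a vendor keep a generous free tier for trial users while the marginal rate declines for heavy users, which is one way the "situational discounts" mentioned above can be expressed directly in the price schedule.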
X @Bloomberg
Bloomberg· 2025-12-01 15:12
China’s DeepSeek unveiled two new versions of an experimental AI model it released weeks ago, adding fresh capabilities the startup said would help with combining reasoning and executing certain actions autonomously https://t.co/I1LvHWK1NX ...
DeepSeek's Major Release
Zheng Quan Shi Bao· 2025-12-01 15:04
Core Insights
- DeepSeek has released two official model versions: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. The former is available on the official website, app, and API, while the latter is currently accessible only through a temporary API for community evaluation [1][3]

Model Performance
- DeepSeek-V3.2 aims to balance reasoning capability and output length, making it suitable for daily use. In benchmark tests it performed comparably to GPT-5 and slightly below Gemini-3.0-Pro, with a significant reduction in output length compared to Kimi-K2-Thinking, leading to lower computational costs and shorter user wait times [3][4]
- DeepSeek-V3.2-Speciale is designed to push the limits of reasoning capability. An enhanced version of DeepSeek-V3.2, it incorporates theorem-proving abilities from DeepSeek-Math-V2. It performed comparably to Gemini-3.0-Pro on mainstream reasoning benchmarks and won gold medals in several prestigious competitions, including IMO 2025 and ICPC World Finals 2025, placing second and tenth among human competitors, respectively [3][4]

Benchmark Comparisons
- AIME 2025: DeepSeek-V3.2 scored 93.1; DeepSeek-V3.2-Speciale scored 96.0 [4]
- HMMT Feb 2025: DeepSeek-V3.2 scored 92.5; DeepSeek-V3.2-Speciale scored 99.2 [4]
- IMOAnswerBench: DeepSeek-V3.2 scored 78.3; DeepSeek-V3.2-Speciale scored 84.5 [4]
- CodeForces: DeepSeek-V3.2 rated 2386; DeepSeek-V3.2-Speciale rated 2701 [4]

Cost Efficiency
- DeepSeek-V3.2-Exp, built on V3.1-Terminus with a new attention mechanism (DSA), delivered significant improvements in training and inference efficiency and a notable reduction in model costs, enhancing the model's cost-effectiveness and potential for broader application [4]
DeepSeek Launches New Models
Zhong Guo Zheng Quan Bao· 2025-12-01 15:04
Core Insights
- DeepSeek has released two official model versions, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, aimed at balancing reasoning capability and output length across applications [1][4]

Model Performance
- DeepSeek-V3.2 performed comparably to GPT-5 on public reasoning benchmarks, slightly below Gemini-3.0-Pro, while significantly reducing output length compared to Kimi-K2-Thinking, lowering computational costs and user wait times [1][3]
- DeepSeek-V3.2-Speciale demonstrated exceptional instruction-following, rigorous mathematical proof, and logical-validation capabilities, achieving gold-medal-level results in major competitions such as IMO 2025 and ICPC World Finals 2025 [2]

Benchmark Comparisons
- DeepSeek-V3.2-Speciale outperformed the standard version on complex tasks, although it required significantly more tokens, implying higher costs [3]
- AIME 2025: DeepSeek-V3.2-Speciale scored 96.0; DeepSeek-V3.2 scored 93.1 [3]
- HMMT Feb 2025: DeepSeek-V3.2-Speciale scored 99.2, compared to DeepSeek-V3.2's 92.5 [3]
- IMOAnswerBench: DeepSeek-V3.2-Speciale scored 84.5; DeepSeek-V3.2 scored 78.3 [3]

Model Features
- DeepSeek-V3.2 is DeepSeek's first model to integrate reasoning with tool usage, supporting tool calls in both reasoning and non-reasoning modes, which enhances its versatility [4]
- The model's generalization has improved through a large-scale agent training data synthesis method, allowing it to perform well in real-world applications [4]
DeepSeek Releases Its Strongest Open-Source Model Yet, Targeting an All-Capable Agent and Throwing Down the Gauntlet to GPT-5 and Gemini 3
Tai Mei Ti A P P· 2025-12-01 15:03
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, marking a significant advancement in AI capabilities, particularly in reasoning and output efficiency [2][3]
- The V3.2 model is positioned as the strongest open-source large model, outperforming competitors in various benchmarks while significantly reducing output length and computational costs [3][4]
- The V3.2 model integrates a new sparse attention mechanism (DSA) to enhance performance in long-context scenarios, while also improving the model's ability to follow instructions and generalize in complex environments [8][9]

Model Performance
- In benchmark tests, DeepSeek-V3.2 achieved competitive scores against models like GPT-5, Claude 4.5, and Gemini 3 Pro, with notable strengths in specific areas [4][5]
- The V3.2 model demonstrated superior performance in question-and-answer scenarios, providing detailed and accurate travel recommendations through advanced tool usage [5][6]
- The V3.2-Speciale model focuses on maximizing reasoning capabilities, achieving results comparable to Gemini 3.0 Pro in mainstream reasoning benchmarks, although it requires a higher token cost and is not designed for everyday use [9][10]

Development Focus
- DeepSeek emphasizes practical usability and generalization in its models, aiming to overcome common pitfalls in AI interactions, such as basic common-sense errors [6][8]
- The company is committed to enhancing the reasoning abilities of its models, as evidenced by the integration of advanced mathematical reasoning capabilities from the recently released DeepSeek-Math-V2 [9][10]
- The competitive landscape for large models is intensifying, with major players like GPT-5 and Gemini 3 pushing the boundaries of AI capabilities, suggesting a dynamic future for AI development [10]
DeepSeek Releases the Official Version of V3.2
Xin Jing Bao· 2025-12-01 15:01
Core Insights
- DeepSeek announced the release of two official model versions: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale [1]

Model Overview
- DeepSeek-V3.2 aims to balance reasoning capability and output length, making it suitable for everyday use, such as Q&A scenarios and general agent tasks [1]
- In reasoning benchmark tests, DeepSeek-V3.2 achieved performance comparable to GPT-5, slightly below Gemini-3.0-Pro [1]
- Compared to Kimi-K2-Thinking, V3.2 significantly reduced output length, leading to lower computational costs and reduced user wait times [1]

Special Features
- DeepSeek-V3.2-Speciale is designed to push the reasoning capabilities of open-source models to the limit, exploring the boundaries of model performance [1]
- This version is an enhanced long-thinking variant of DeepSeek-V3.2, incorporating theorem-proving capabilities from DeepSeek-Math-V2 [1]
- The model exhibits excellent instruction-following, rigorous mathematical proof, and logical-verification abilities, performing comparably to Gemini-3.0-Pro in mainstream reasoning benchmark tests [1]
DeepSeek Launches New Models
Zhong Guo Zheng Quan Bao· 2025-12-01 14:48
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, aimed at balancing reasoning capability and output length across applications [1][2]

Model Performance
- DeepSeek-V3.2 achieved performance comparable to GPT-5 and slightly below Gemini-3.0-Pro in public reasoning benchmarks, while significantly reducing output length compared to Kimi-K2-Thinking, thus lowering computational costs and user wait times [1][3]
- DeepSeek-V3.2-Speciale demonstrated exceptional instruction-following, rigorous mathematical proof, and logical-validation capabilities, achieving gold-medal-level performance in major competitions such as IMO 2025 and ICPC World Finals 2025 [2][3]

Benchmark Comparisons
- DeepSeek-V3.2-Speciale outperformed the standard version and other models, with notable scores in AIME 2025 (96.0) and HMMT Feb 2025 (99.2), while also achieving high rankings in IMOAnswerBench and LiveCodeBench [3]
- Its performance on complex tasks was significantly better than the standard version's, although it required more tokens, implying higher operational costs [3]

Model Features
- DeepSeek-V3.2 is DeepSeek's first model to integrate reasoning with tool usage, supporting both reasoning and non-reasoning modes for tool invocation, which enhances its versatility [4]
- The model's generalization has improved through a novel large-scale agent training data synthesis method, allowing it to perform well in real-world applications [4]
DeepSeek's Major Release
Zheng Quan Shi Bao· 2025-12-01 14:16
Core Insights
- DeepSeek has released two official model versions: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. The former is available on the official website, app, and API, while the latter is currently accessible only through a temporary API for community evaluation [2][4]

Model Performance
- DeepSeek-V3.2 aims to balance reasoning capability and output length, achieving performance comparable to GPT-5 and slightly below Gemini-3.0-Pro in benchmark tests. It significantly reduces output length compared to Kimi-K2-Thinking, leading to lower computational costs and reduced user wait times [4]
- The Speciale version is designed to push the limits of reasoning capability, combining features from DeepSeek-V3.2 and DeepSeek-Math-V2, and has shown performance on par with Gemini-3.0-Pro in mainstream reasoning benchmarks [4]

Benchmark Results
- In various benchmark tests, DeepSeek-V3.2-Speciale achieved notable results, including:
  - AIME 2025: 96.0 (23k)
  - HMMT Feb 2025: 99.2 (27k)
  - HMMT Nov 2025: 94.4 (25k)
  - IMOAnswerBench: 84.5 (45k)
  - CodeForces: 2701 (77k)
  - HLE: 30.6 (35k) [5]

Cost Efficiency
- The new attention mechanism DSA, introduced in DeepSeek-V3.2-Exp, has led to significant improvements in training efficiency and a reduction in model costs, enhancing the model's cost-effectiveness and potential for broader application [5]
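If the parenthesized figures reported alongside the scores are average output token counts, as the articles' repeated emphasis on Speciale's higher token usage suggests, serving cost scales linearly with them. A back-of-envelope sketch of that relationship follows; the per-million-token output price is an invented placeholder, not DeepSeek's actual pricing:

```python
# Back-of-envelope serving cost per response, assuming the parenthesized
# figures are average output token counts. The $/1M-token price is a
# placeholder assumption, not DeepSeek's published pricing.
PRICE_PER_M_TOKENS = 1.0  # assumed USD per million output tokens

def cost_usd(output_tokens: int) -> float:
    """Cost of one response at the assumed output price."""
    return output_tokens / 1_000_000 * PRICE_PER_M_TOKENS

benchmarks = {  # benchmark: (Speciale score, reported token figure)
    "AIME 2025": (96.0, 23_000),
    "HMMT Feb 2025": (99.2, 27_000),
    "IMOAnswerBench": (84.5, 45_000),
    "CodeForces": (2701, 77_000),
}

for name, (score, tokens) in benchmarks.items():
    print(f"{name}: score {score}, ~${cost_usd(tokens):.3f}/response")
```

The spread between 23k and 77k tokens means the same model can cost over three times as much per response depending on the task, which is why the articles treat the V3.2/Speciale split as a cost-versus-capability trade-off.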
DeepSeek Launches Again, Taking Its Model Head-to-Head with Google
Di Yi Cai Jing· 2025-12-01 14:05
Core Viewpoint
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which are among the global leaders in reasoning capability [3]

Model Overview
- DeepSeek-V3.2 aims to balance reasoning ability and output length, suitable for everyday use such as Q&A and general agent tasks. It has reached the level of GPT-5 in public reasoning tests, slightly below Google's Gemini 3 Pro [5]
- DeepSeek-V3.2-Speciale is designed to push the reasoning capabilities of open-source models to the extreme, combining long-thinking enhancements and theorem-proving abilities from DeepSeek-Math-V2 [5]

Performance Metrics
- Speciale has surpassed Google's Gemini 3 Pro in several reasoning benchmarks, including the American Invitational Mathematics Examination (AIME), the Harvard-MIT Mathematics Tournament (HMMT), and the International Mathematical Olympiad (IMO) [6]
- In the AIME 2025 benchmark, Speciale scored 96.0, while Gemini-3.0 scored 95.0 [7]
- Speciale achieved gold medals in the IMO, ICPC World Finals, and IOI, with its ICPC and IOI scores reaching the levels of the second- and tenth-place human competitors, respectively [6]

Limitations and Future Plans
- DeepSeek acknowledges limitations compared to proprietary models like Gemini 3 Pro, including a narrower breadth of world knowledge and lower token efficiency [8]
- The company plans to increase pre-training computational resources and optimize model reasoning chains to improve efficiency and fill knowledge gaps [8]

Industry Context
- The gap between open-source and closed-source models has been widening, with proprietary systems showing stronger performance on complex tasks [10]
- DeepSeek has introduced a sparse attention mechanism (DSA) to reduce computational complexity without sacrificing long-context performance, which has been effective in improving model performance [11]

Community Reception
- The release of DeepSeek's models has been positively received on overseas social media, with comments highlighting the achievement of matching GPT-5 and Gemini 3 Pro with an open-source model [11]
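The complexity-reduction idea behind sparse attention can be illustrated with a generic top-k variant: each query attends only to its k highest-scoring keys instead of all n, cutting the softmax and value aggregation from O(n²) toward O(n·k). This is an illustrative sketch, not DeepSeek's actual DSA, whose key-selection mechanism is learned during training:

```python
# Generic top-k sparse attention sketch (illustrative only; not DeepSeek's
# DSA). Each query keeps its k highest-scoring keys and masks the rest, so
# the value aggregation touches k entries per query instead of all n.
import numpy as np

def topk_sparse_attention(Q, K, V, k):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # (n_q, n_k) scaled dot products
    # threshold = k-th largest score in each row; mask everything below it
    kth = np.partition(scores, -k, axis=-1)[:, -k:].min(axis=-1, keepdims=True)
    masked = np.where(scores >= kth, scores, -np.inf)
    # numerically stable softmax over the surviving scores
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))    # 4 queries, head dim 8
K = rng.normal(size=(16, 8))   # 16 keys
V = rng.normal(size=(16, 8))
out = topk_sparse_attention(Q, K, V, k=4)
print(out.shape)  # → (4, 8)
```

With k fixed (or growing much slower than n), the per-query cost stops scaling with context length, which is the property the articles credit for DSA's lower long-context serving cost.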