Open-source Models
Jensen Huang on US-China AI competition: China's DeepSeek and Qwen are the best among open-source models
news flash· 2025-05-30 11:47
On May 29, NVIDIA CEO Jensen Huang said on the company's earnings call that DeepSeek and Qwen (Alibaba's Tongyi Qianwen), both from China, are among the best open-source AI models. Since being released for free, they have drawn enormous attention in the US, Europe, and elsewhere. Ultimately, the platform that wins AI developers will win AI; export restrictions should strengthen US platforms rather than push half of the world's AI talent toward competitors. (全天候科技) ...
US court halts most of Trump's import tariffs; Tesla shareholders get their wish: Musk leaves DOGE | Moves at the ten-billion-dollar companies
晚点LatePost· 2025-05-30 11:08
Meanwhile, Musk announced that deliveries of the autonomous-driving Model Y will begin in June. Musk posted yesterday that Tesla has spent the past few days testing autonomous Model Y vehicles on public streets in Austin, Texas, "without a single incident." He said the program will start a month ahead of the original plan, with the first autonomous factory-to-customer delivery expected in June. Autonomous delivery means Tesla uses Full Self-Driving (FSD) technology to have vehicles complete the trip from factory to customer on their own, an important step toward deploying its autonomous-driving technology at scale.

A US court has halted most of Trump's import tariffs. In the US, Congress legislates the boundaries of presidential power, and the courts can rule on whether a president has abused it. Trump bypassed Congress to impose a 10% "baseline tariff" and higher "reciprocal tariffs" by invoking the International Emergency Economic Powers Act, enacted back in 1977, a law mainly concerned with trade embargoes and economic sanctions; no president before Trump had used it to change tariffs. Now a three-judge panel has ruled that Trump exceeded his authority and ordered the executive branch to withdraw the relevant tariffs within 10 days. Tariffs imposed under other statutes, such as the auto tariffs, are unaffected. In the ruling, the judges held that under any line of analysis, "any interpretation of the International Emergency Economic Powers Act that grants the president unlimited tariff power is unconstitutional." Legal experts say the ruling also means the US government will need to refund tariffs already collected. The Trump administration, for its part, ...
1.2 billion model downloads, yet the core team is nearly falling apart: uneven compute allocation and profits crushing innovation?
猿大侠· 2025-05-30 03:59
Core Viewpoint
- Meta is restructuring its AI team to enhance product development speed and flexibility, dividing it into two main teams: AI Products and AGI Foundations [2][3]

Group 1: Organizational Changes
- The AI Products team will focus on consumer-facing applications like Facebook, Instagram, and WhatsApp, as well as a new independent AI application [2]
- The AGI Foundations department will work on broader technologies, including improvements to the Llama model [3]
- The restructuring aims to grant teams more autonomy while minimizing inter-team dependencies [3]

Group 2: Competitive Landscape
- Meta is striving to keep pace with competitors like OpenAI and Google, launching initiatives like "Llama for Startups" to encourage early-stage companies to utilize its generative AI products [3]
- Despite initial success, Meta's reputation in the open-source AI field has declined, with significant talent loss from its foundational AI research team, FAIR [4][7]

Group 3: Talent and Leadership Issues
- A significant number of key researchers from the Llama project have left Meta, raising concerns about the company's ability to retain top AI talent [7][23]
- The departure of Joelle Pineau, a long-time leader at FAIR, has highlighted internal issues regarding performance and leadership [8][13]

Group 4: Financial Commitment and Future Plans
- Meta plans to invest approximately $65 billion in AI projects by 2025, with the aim of enhancing its AI capabilities [22]
- The company is expanding its data center capacity, including a new 2GW facility, to support its AI initiatives [22]
Meta CEO x Microsoft CEO dialogue, decoded: why has the "distillation factory" become the source of open source's appeal?
机器之心· 2025-05-23 15:30
Group 1
- The core discussion at LlamaCon 2025 focused on the transformative impact of AI on the boundaries between documents, applications, and websites, as articulated by Satya Nadella [5][6]
- Nadella emphasized that modern AI acts as a "universal converter," understanding user intent and enabling a shift from "tool-oriented computing" to "intent-oriented computing," enhancing user experience [6][7]
- Nadella identified the current AI wave as a significant technological platform shift, necessitating a complete overhaul of the technology stack to optimize for AI workloads [7]

Group 2
- Nadella noted that approximately 20% to 30% of Microsoft's internal code is now generated by AI, indicating a broad application of AI in software development beyond mere code completion [7][8]
- Zuckerberg projected that by 2026, half of Meta's development work will be completed by AI, showcasing the growing reliance on AI in the tech industry [8]
- The dialogue also highlighted the strategic value of both open-source and closed-source models, with Nadella advocating for a flexible approach that supports both [9][10]

Group 3
- The concept of "distillation factories" was introduced as a key area for future development in the AI ecosystem, with both CEOs agreeing on the importance of infrastructure and toolchains for model distillation [10][11]
- Nadella pointed out the trend towards multi-model applications and the necessity of standardized protocols for seamless collaboration among various AI models [10]
- Zuckerberg acknowledged Microsoft's unique advantages in supporting multi-model collaboration infrastructure, reinforcing the significance of the "distillation factory" concept [10]
The open-source models I'm using now as an AI product manager
36Kr· 2025-05-14 08:34
Core Insights
- The article discusses the importance of AI model selection for product managers, emphasizing the need for private deployment to ensure data security and customization [1][2]
- It highlights the varying hardware requirements for different AI models, with specific mention of DeepSeek needing up to 700GB of GPU memory [1]
- The article also addresses the regulatory challenges in deploying AI models in China, necessitating the use of domestic models [2]

Model Selection and Rankings
- A recommendation is made to refer to the LLM rankings for selecting appropriate models based on specific needs [3]
- The article provides a link to Hugging Face for downloading open-source models, indicating a resource for model acquisition (see the sketch after this list) [5]

Model Performance and Usage
- The article lists various AI models suitable for different applications, including DeepSeek and Alibaba's Qwen3.0, noting their capabilities and hardware requirements [10][11]
- It mentions that DeepSeek V3 is optimized for faster output, while R1 is better for deep reasoning tasks [11]
- Other domestic models are also discussed, with a focus on their applicability in specific industries like healthcare and finance [12]

Open-source Models for Different Platforms
- The article outlines several open-source models suitable for mobile deployment, such as Microsoft's BitNet b1.58, which is designed for low-resource environments [13]
- It also mentions international models like Llama 4, which supports multi-modal data integration [14]

Model Mechanisms and Integration
- Different models are categorized based on their functionalities, such as text generation, image generation, and speech generation, highlighting the need for multiple models to work together in complex applications [20]
- The article emphasizes the increasing complexity and learning curve for AI product managers in understanding and integrating these models [20]
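Since the article points readers to Hugging Face for obtaining open-source weights and stresses private deployment, here is a minimal sketch of that workflow. The model ID, local path, and prompt are illustrative placeholders rather than recommendations from the article, and `device_map="auto"` assumes the `accelerate` package is installed.

```python
# Minimal sketch: pull an open-source checkpoint from Hugging Face and run it
# locally, so inference never leaves the private network.
from huggingface_hub import snapshot_download
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the full model snapshot to a local directory (one-time step).
local_dir = snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",          # placeholder model choice
    local_dir="./models/qwen2.5-7b-instruct",
)

# Load from the local copy only; no network access is needed from here on.
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir,
    device_map="auto",    # spread layers across available GPUs
    torch_dtype="auto",   # use the checkpoint's native precision
)

prompt = "Summarize the key constraints for deploying LLMs on-premises."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Once the snapshot is on disk, the same directory can be served behind an internal API (for example with vLLM), so user data stays inside the deployment boundary.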
Meta and Microsoft chiefs in a summit dialogue: how will large models change the world?
36Kr· 2025-05-07 02:32
Core Insights
- The competition in large models is intensifying, with significant developments from major tech companies like Alibaba and Meta [1]
- Meta's Llama 4 series and the launch of the Meta AI App are pivotal in the ongoing AI landscape [1][4]
- The dialogue between Mark Zuckerberg and Satya Nadella highlights the transformative potential of AI in application development and productivity [3][4]

Group 1: AI Development and Impact
- Nadella emphasizes that we are entering a phase of "deep applications," where AI will significantly enhance productivity across various sectors [8][29]
- By 2026, it is projected that half of application development tasks will be completed by AI, indicating a major shift in the engineering landscape [4][21]
- The integration of AI into workflows is expected to accelerate productivity, with examples from Microsoft's GitHub Copilot showcasing its evolving capabilities [15][16]

Group 2: Open Source and Interoperability
- Nadella discusses the importance of interoperability between open-source and closed-source models, suggesting that both are necessary for meeting customer demands [11][12]
- The open-source ecosystem is seen as crucial for enabling developers to create proprietary models while benefiting from community-driven advancements [11][12]
- The ability to distill large models into smaller, more efficient versions is highlighted as a key advantage of open-source models [32][34]

Group 3: Future of AI and Infrastructure
- The concept of a "distillation factory" is introduced, where large models can be transformed into smaller, more accessible versions for broader use (a minimal sketch of the distillation step follows this list) [32][35]
- Nadella points out that the infrastructure for AI must evolve to support the growing demand for diverse model applications, including smaller models suitable for personal devices [36][37]
- The collaboration between companies like Meta and Microsoft is expected to drive innovation in AI tools and infrastructure, enhancing the overall developer experience [12][36]
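The "distillation factory" idea is discussed at a high level in the conversation; the sketch below illustrates the mechanical core that such a factory would automate: a small student model trained to match the output distribution of a large teacher. The temperature, blend weight `alpha`, and function name are my own illustrative choices, not details from the dialogue.

```python
# Minimal sketch of classic logit distillation for language models.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft KL term (match the teacher) with the usual hard-label loss."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    kl = kl * (temperature ** 2)  # standard rescaling for the temperature

    # Ordinary cross-entropy against the ground-truth next tokens.
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    return alpha * kl + (1 - alpha) * ce
```

The connection to open source is direct: computing `teacher_logits` requires running the large model yourself, which open weights permit and closed APIs generally do not.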
Everything about the model made public, and better than DeepSeek-R1: NVIDIA open-sources the Llama-Nemotron family
机器之心· 2025-05-06 08:04
Core Viewpoint
- The rapid development of large models has made reasoning ability a key indicator of model intelligence, with inference efficiency becoming a critical limiting factor for model deployment and performance [2][3].

Group 1: Model Overview
- NVIDIA has launched the Llama-Nemotron series, an open family of large models designed for efficient reasoning, featuring excellent inference capabilities and an enterprise-friendly open license [3][5].
- The series includes three model sizes: Nano (8B), Super (49B), and Ultra (253B), along with an independent variant UltraLong (8B) that supports long context [4][5].
- The models are the first open-source models to support dynamic inference switching, allowing users to toggle between standard chat mode and reasoning mode, enhancing interaction flexibility [6].

Group 2: Model Training and Optimization
- The Llama-Nemotron models utilize a multi-stage post-training process to enhance performance on reasoning and non-reasoning tasks, employing supervised fine-tuning and reinforcement learning techniques [9].
- The Puzzle framework is used for efficient reasoning optimization, transforming large language models into hardware-efficient variants while maintaining performance [12][15].
- LN-Super and LN-Ultra models achieve significant throughput improvements, with LN-Super showing a 5x increase in inference throughput compared to Llama 3.3-70B-Instruct [19].

Group 3: Performance Metrics
- LN-Ultra demonstrates superior performance in key benchmarks, achieving scores such as 88.1 in MMLU and 80.4 in MATH500, surpassing its predecessors [25][24].
- The models are designed to meet specific deployment constraints, such as supporting up to 3 million cached tokens in FP8 precision for LN-Ultra [21].

Group 4: Reinforcement Learning and Instruction Following
- The models incorporate a "detailed thinking on/off" instruction mechanism to enhance flexibility in reasoning depth and response style, improving user interaction (a usage sketch follows this list) [27].
- LN-Ultra's performance is further enhanced through large-scale reinforcement learning, allowing it to exceed the capabilities of its teacher model [31][39].
- The training process for LN-Ultra involved approximately 140,000 H100 GPU hours, focusing on optimizing reasoning capabilities and instruction-following abilities [32][41].
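Because the series exposes its dynamic reasoning switch through the "detailed thinking on/off" system instruction described above, the sketch below shows how a client might drive that toggle against an OpenAI-compatible serving endpoint. The endpoint URL, model identifier, and sampling settings are assumptions for illustration; the model card is the authoritative reference for exact usage.

```python
# Minimal sketch: toggling Llama-Nemotron's reasoning mode via the system prompt.
from openai import OpenAI

# Assumed local OpenAI-compatible server (e.g., vLLM); not from the article.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

def ask(question: str, reasoning: bool) -> str:
    # The reasoning depth is controlled purely by the system instruction.
    system = "detailed thinking on" if reasoning else "detailed thinking off"
    resp = client.chat.completions.create(
        model="nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",  # assumed model ID
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": question},
        ],
        temperature=0.6 if reasoning else 0.0,  # illustrative settings
    )
    return resp.choices[0].message.content

print(ask("How many primes are there below 100?", reasoning=True))
print(ask("What is the capital of France?", reasoning=False))
```

Because the toggle lives entirely in the system message, the same pattern should carry over to any OpenAI-compatible serving stack.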
Internet giants rushed to open-source new models before May Day; with differing strategies, who will stay at the table?
Nan Fang Du Shi Bao· 2025-05-01 14:12
Core Insights
- Major domestic AI model companies are rapidly open-sourcing their models ahead of the May Day holiday, with Alibaba releasing Qwen3, Xiaomi launching Xiaomi MiMo, and DeepSeek introducing DeepSeek-Prover-V2 [1][2][5]

Alibaba
- Alibaba's Qwen3 features two MoE models with 30B and 235B parameters, and six dense models ranging from 0.6B to 32B, achieving state-of-the-art performance in its category [2]
- Qwen3 is the first "hybrid reasoning model" in China, integrating fast and deep thinking capabilities, significantly reducing computational power consumption (see the sketch after this list) [5]
- Alibaba has consistently open-sourced various models this year, including the 14B video generation model and the 7B multimodal model, aiming to leverage open-source models for AI applications while monetizing its cloud services [6]

Xiaomi
- Xiaomi's MiMo model, with only 7B parameters, outperformed OpenAI's closed-source model o1-mini in public benchmarks for mathematical reasoning and coding competitions [6]
- This marks Xiaomi's first foray into open-sourcing its models, developed by its newly established Core team [6]

DeepSeek
- DeepSeek has released two versions of DeepSeek-Prover-V2, focusing on mathematical theorem proving and achieving significant performance improvements in benchmark tests [8]
- The new models support extensive context inputs and are based on previous versions, showcasing a commitment to enhancing reasoning capabilities [8]

Industry Trends
- The open-sourcing of models by these companies is seen as a strategic move to enhance competitiveness against closed-source models from companies like OpenAI and Anthropic, which still hold a slight performance edge [9][10]
- Industry experts predict a consolidation in the AI model sector, with DeepSeek, Alibaba, and ByteDance emerging as the leading players in China, while the U.S. market remains competitive with companies like xAI and OpenAI [10][11]
- The open-source models are expected to democratize AI technology, making it more accessible and promoting innovation across various industries [9][10]
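Qwen3's "hybrid reasoning" merges fast replies and deep thinking in a single checkpoint; the sketch below shows how that switch is typically exercised through the chat template's `enable_thinking` flag, following Qwen's published release usage. The checkpoint choice and generation settings are illustrative, and the flag name should be verified against the model card for the transformers version in use.

```python
# Minimal sketch: switching Qwen3 between deep-thinking and fast-reply modes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"  # one of the open-sourced dense sizes (illustrative)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto",
                                             torch_dtype="auto")

def generate(prompt: str, thinking: bool) -> str:
    # The chat template renders differently depending on enable_thinking.
    text = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,   # True = deep reasoning, False = fast reply
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    # Strip the prompt tokens and return only the newly generated text.
    return tokenizer.decode(out[0][inputs.input_ids.shape[1]:],
                            skip_special_tokens=True)

print(generate("Prove that the sum of two even numbers is even.", thinking=True))
print(generate("Say hello in French.", thinking=False))
```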
A look at the current state of data center investment
傅里叶的猫· 2025-04-30 12:37
We have recently put a lot of effort into H200/B200 data center servers. Suffice it to say there are plenty of pitfalls and deep tricks of the trade, but good things take time, and the recent payoff has made us feel the work was worth it. In this article we take a brief look at the current state of data center investment, drawing on TD Cowen reports, articles from The Information and Bloomberg, and interviews with several industry experts, to see how the big overseas players view IDC; we will later devote a separate piece to the state of domestic IDC investment.

Microsoft's data center investment is slowing. As many have seen in the news, Microsoft is undergoing a marked slowdown or adjustment in its data center investment demand. Since last year it has walked away from more than 1GW of data center deals and terminated some land contracts. It has slowed its international expansion and paused or postponed multiple projects at home and abroad, including in the US (Atlanta, Wisconsin phase 2, San Antonio, Kansas City, Cedar Rapids) as well as Europe, India, the UK, and Australia, reducing planned leasing demand by nearly 1.98GW (originally to be completed over 4 years, about 500MW per year).

The adjustment has several causes:
1. Digesting capacity: absorbing the large volume of capacity already leased in 2024 to avoid overbuilding.
2. Construction complexity: hyperscale data centers are inherently complex to design and build, causing unavoidable delays.
3. OpenAI's strategic shift: OpenAI no longer relies entirely on Microsoft, turning to third parties such as Oracle and CoreWeave and pushing hard on self-built capacity, which has led Microsoft to ...
Zuckerberg's latest interview: AI will spark a massive revolution in knowledge work and programming
Sou Hu Cai Jing· 2025-04-30 10:02
Core Insights
- Meta's CEO Mark Zuckerberg discussed the competitive landscape of AI development, particularly comparing the Llama 4 model with DeepSeek, asserting that Llama 4 offers higher efficiency and broader functionality despite DeepSeek's advancements in specific areas [1][36].
- Meta AI has reached nearly 1 billion monthly users, indicating significant growth and the importance of personalized AI interactions [2][21].
- The company is focusing on developing coding agents that will automate much of the coding process within the next 12 to 18 months, which is expected to increase the demand for human jobs rather than decrease it [1][16].

Model Development
- The Llama 4 series includes models like Scout and Maverick, which are designed for efficiency and low latency, supporting multi-modal capabilities [4][41].
- The upcoming Behemoth model will exceed 2 trillion parameters, representing a significant leap in model size and capability [4].
- Meta is committed to open-sourcing its models after internal use, allowing others to benefit from their developments [4][41].

Competitive Landscape
- Zuckerberg believes that open-source models are likely to surpass closed-source models in popularity, reflecting a trend towards more accessible AI technologies [5][36].
- The company acknowledges the impressive infrastructure and text processing capabilities of DeepSeek but emphasizes that Llama 4's multi-modal abilities give it a competitive edge [35][36].
- The licensing model for Llama is designed to facilitate collaboration with large companies while ensuring that Meta retains some control over its intellectual property [37][39].

User Interaction and Experience
- Meta is exploring how AI can enhance user interactions, particularly through natural dialogue and personalized experiences [14][28].
- The integration of AI into existing applications like WhatsApp is crucial for user engagement, especially in markets outside the U.S. [21].
- The company is focused on creating AI that can assist users in complex social interactions, enhancing the overall user experience [27][28].

Future Directions
- Zuckerberg envisions a future where AI seamlessly integrates into daily life, potentially through devices like smart glasses that facilitate constant interaction with AI [14][31].
- The development of AI will not only focus on productivity but also on entertainment and social engagement, reflecting the diverse applications of AI technology [25][26].
- The company is aware of the challenges in ensuring that AI interactions remain healthy and beneficial for users, emphasizing the importance of understanding user behavior [26][27].