DeepSeek
Whoa! DeepSeek Drops Two New Models at Once
程序员的那些事· 2025-12-02 13:49
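Reposted from: QbitAI (量子位, WeChat official account: QbitAI). Surprise drop! On ChatGPT's third anniversary, DeepSeek suddenly released two models. The first, DeepSeek-V3.2, focuses on balanced, practical use and suits everyday Q&A, general agent tasks, and tool calling in real-world applications; its reasoning reaches GPT-5 level, slightly below Gemini-3.0-Pro. The second, DeepSeek-V3.2-Speciale, goes all-in on reasoning and matches Gemini-3.0-Pro on reasoning benchmarks. It also swept gold medals at IMO 2025, CMO 2025, the ICPC World Finals 2025, and IOI 2025. Key point: its ICPC result reaches the level of the second-place human contestant, and its IOI result the level of the tenth-place human contestant. Specifically, DeepSeek-V3.2 balances reasoning ability against output length to reduce computational overhead. In a post on its official account, DeepSeek wrote that "DeepSeek-V3.2 has reached the highest level among current open-source models in agent evaluations." (Figure: scores of DeepSeek-V3.2 and other models on various agent tool-calling benchmarks.) Other details of the model: reasoning on par with GPT-5; much shorter output than Kimi-K2-Thinking, reducing user wait time; DeepSeek's first model to "integrate thinking into tool call ...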
Sam Altman Declares Code Red
Seeking Alpha· 2025-12-02 11:57
Listen on the go! A daily podcast of Wall Street Breakfast will be available by 8:00 a.m. on Seeking Alpha, iTunes, and Spotify. Good morning! Here is the latest in trending: Sweetened offer: Warner Bros. Discovery (WBD) received a mostly cash offer from Netflix (NFLX), which is arranging a bridge loan worth tens of billions of dollars for its bid. Tariff refund: Costco (COST) sued the U.S. government to ensure it gets a full refund of tariffs if the Supreme Court rules against President Trump's levie ...
From Strongest Open-Source Model to Challenging the World's Best: DeepSeek's New Models Offer an Answer
Guan Cha Zhe Wang· 2025-12-02 11:38
Core Insights
- DeepSeek has released two official models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale: the former balances reasoning ability against output length for everyday use, while the latter strengthens long-form reasoning and mathematical proof capabilities [1][2][4]
- The open-source large-model ecosystem has grown significantly, and DeepSeek's advances pose a challenge to closed-source models, particularly after the recent release of Google Gemini 3.0 raised the competitive bar [2][15]
- DeepSeek positions its models to bridge the gap between open- and closed-source models through innovative architecture and training strategies, despite having fewer computational resources than industry giants [8][15][16]
Model Performance
- DeepSeek-V3.2 performs comparably to GPT-5 and slightly below Google's Gemini 3 Pro, demonstrating its effectiveness on reasoning tasks [6][7]
- The Speciale version outperforms Gemini 3 Pro on several reasoning benchmarks, including the American Invitational Mathematics Examination (AIME) and the Harvard-MIT Mathematics Tournament (HMMT) [7][8]
- Speciale is designed for rigorous mathematical proof and logical verification, making it a specialized tool for complex reasoning tasks [6][8]
Technological Innovations
- DeepSeek uses a novel DSA (DeepSeek Sparse Attention) mechanism to improve computational efficiency, enabling effective long-context processing without sacrificing performance [8][12]
- "Interleaved Thinking" has been integrated into DeepSeek's models, strengthening the interplay between reasoning and tool use, which is crucial for AI agents [9][12]
- The focus on agent capabilities signals a strategic shift toward actionable AI, moving beyond traditional chat-based interaction to more complex task execution [13][14]
Industry Context
- The competitive landscape is shifting, and DeepSeek acknowledges a widening gap between open- and closed-source models, particularly in complex-task performance [15][16]
- DeepSeek aims to address its limitations by increasing pre-training compute and improving model efficiency, indicating a clear path for future improvements [16][19]
- The release of DeepSeek-V3.2 has been seen as a significant achievement in the open-source community, suggesting that the gap with leading closed-source models is narrowing [16][19]
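The summary names DSA only at a high level. As a rough illustration of the general idea behind sparse attention (not DeepSeek's implementation), the Python sketch below shows a toy top-k scheme: a cheap scoring pass picks the k most relevant keys for a query, and ordinary softmax attention then runs over only those keys. The scoring pass here, which simply uses the first `index_dims` dimensions of each vector, is a hypothetical stand-in for a learned selection module, and all function names and parameters are assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def full_attention(q, keys, values):
    """Scaled dot-product attention for a single query over all given keys."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]


def topk_sparse_attention(q, keys, values, k, index_dims):
    """Toy top-k sparse attention for a single query (illustrative only).

    1. A cheap scoring pass (hypothetical stand-in for a learned indexer)
       scores every key using only the first `index_dims` dimensions.
    2. Full softmax attention then runs over just the k highest-scoring keys.
    """
    index_scores = [dot(q[:index_dims], key[:index_dims]) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)[:k]
    return full_attention(q, [keys[i] for i in top], [values[i] for i in top])
```

With k equal to the number of keys, the sketch reduces to full attention. With a smaller k, the expensive weighted sum touches only k values per query, while the cheap scoring pass still visits every key; that split between a light selection step and heavy attention over a small subset is the trade-off sparse-attention designs aim for.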
Sugon (中科曙光): Shuguang AI Supercluster System and Other Products Deeply Adapted to DeepSeek-V3.2
Zheng Quan Shi Bao Wang· 2025-12-02 10:28
Core Viewpoint
- DeepSeek has officially released V3.2 and V3.2-Speciale, significantly enhancing its agent capabilities and integrating reasoning and thinking [1]
Group 1: Product Development
- Sugon's adaptation of the new DeepSeek versions builds on China's first open AI computing architecture, achieving "cross-layer collaboration" across the hardware, software, and model layers [1]
- Products including the Shuguang AI supercluster system and scaleX640 super nodes have completed deep adaptation and tuning for the new DeepSeek versions [1]
Group 2: Market Application
- The adapted products support full-scale deployment of the new DeepSeek versions for clients across various industries [1]
DeepSeek Unveils Major New Models Rivaling U.S. Industry Giants: "Every Group Chat Is Blowing Up!"
Xin Lang Cai Jing· 2025-12-02 10:24
Core Insights
- DeepSeek, a Chinese AI startup, has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which perform comparably to leading models from OpenAI and Google DeepMind [1][4][7]
- The release coincides with the NeurIPS conference, generating significant interest in the AI research community [2][7]
- V3.2 is designed for practical use, while V3.2-Speciale focuses on enhanced reasoning and has achieved gold-medal-level performance in prestigious competitions [5][6][7]
Model Performance
- DeepSeek-V3.2 matches OpenAI's GPT-5 on mainstream reasoning benchmarks and scores slightly below Google's Gemini-3.0 Pro [4][6]
- V3.2-Speciale excels in reasoning tests, with scores that rival Gemini-3.0 Pro [4][5]
- Both models show significant efficiency improvements, reducing computational costs and user wait times [4][6]
Competitive Landscape
- DeepSeek's results indicate that Chinese open-source AI systems are becoming competitive with top proprietary models from Silicon Valley [7][8]
- China's trend toward open-source AI contrasts with the closed strategies of major U.S. tech companies, which prefer to keep control of their most advanced technologies [9][10]
- Recent data show that, for the first time, open-source AI models from Chinese teams have surpassed those from U.S. teams in share of downloads [8][9]
Industry Implications
- DeepSeek's advances point to a shift in how AI models are released, with Chinese companies frequently launching new models and versions [9][10]
- China's focus on open-source models may lead to broader applications of AI technology, potentially challenging the dominance of U.S. AI labs [10]
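DeepSeek Lands a Heavy Blow on ChatGPT's Third Anniversary: A 23-Page Technical Report Holds All the Secrets of Open Source Reaching the Top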
36Kr· 2025-12-02 09:19
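DeepSeek V3.2 brings new cutting-edge tech. Source | APPSO (ID: appsolution). On ChatGPT's third birthday, DeepSeek delivered a "birthday present." On December 1, DeepSeek released two models at once: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. Not only do the two models come close to GPT-5 and Gemini-3.0-Pro in reasoning; more importantly, they address a problem that has long troubled open-source models. Over the past few months, a clear trend has emerged in the AI world: closed-source models keep pulling ahead, while open-source models have struggled to keep pace. After analysis, the DeepSeek team found that open-source models face three core bottlenecks on complex tasks: architecture, resource allocation, and agent capabilities. To tackle these three problems, DeepSeek brought out three big moves. If you have used AI models on very long documents, you may have noticed them getting slower and slower, or even freezing outright; the traditional attention mechanism is to blame. How can an AI both think deeply and use tools proficiently? The short version of the new models: DeepSeek-V3.2 (standard version): focused on cost-effectiveness and everyday use, with reasoning at GPT-5 level; shorter, faster, and cheaper output than Kimi-K2-Thinking; and the first to achieve "thinking while ...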
Attention Revisited: DeltaNet, Used by Both Alibaba and Kimi, and New Improvements in Linear Attention | LatePost Podcast
晚点LatePost· 2025-12-02 09:13
Core Insights
- The article discusses advances in linear attention mechanisms, particularly DeltaNet, which aim to make large language models (LLMs) more efficient by reducing the computational complexity of traditional attention [5][10][12]
Group 1: Linear Attention Mechanisms
- Linear attention mechanisms such as DeltaNet were introduced to address the computational bottleneck of traditional attention, whose cost grows quadratically with input length [5][12]
- DeltaNet's development has been a collaborative effort since its inception in 2021, with significant contributions focused on improving the update rule and parallelizing linear attention [7][20][21]
- Alibaba's Qwen3-Next and Kimi's Kimi Linear, both recently open-sourced, incorporate linear attention, indicating a shift toward these more efficient mechanisms in flagship models [5][24]
Group 2: DeltaNet and Its Evolution
- DeltaNet was initially overlooked because it lacked key architectural improvements and had suboptimal implementations, but recent advances have led to its increased adoption in industry [20][24]
- The Gated DeltaNet variant improves memory control and retrieval performance and is better suited to modern hardware [7][21][24]
- The relationship between DeltaNet and models such as Kimi Linear highlights a trend of combining linear attention with traditional full attention to balance speed and capacity [24][25]
Group 3: Future Directions and Challenges
- The article stresses the need to further explore update rules for linear attention, suggesting that improvements there could yield better performance and scalability [48][49]
- It also discusses combining sparse attention with linear attention to tackle long-text processing, which remains a significant hurdle for current models [46][49]
- The ongoing industry debate over linear versus full attention reflects the complexity and trade-offs of model design across applications [27][30]
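To make the "update rule" discussion concrete, here is a minimal Python sketch of the gated delta rule in its token-by-token recurrent form, as it is commonly written for Gated DeltaNet: S_t = α_t S_{t-1} (I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ, with output o_t = S_t q_t. Setting α_t = 1 recovers the plain DeltaNet rule. The state is a fixed-size d_v × d_k matrix, so per-token cost does not grow with sequence length, unlike full attention. This is an illustrative sketch only: production kernels use chunked parallel forms, and the function names here are hypothetical.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of the gated delta rule (illustrative sketch).

    S: state matrix of shape d_v x d_k (list of rows)
    k: key vector (length d_k), ideally L2-normalized
    v: value vector (length d_v)
    alpha: decay gate in [0, 1]; beta: write strength in [0, 1]

    Computes S_new = alpha * (S - beta * (S k) k^T) + beta * v k^T,
    which equals alpha * S (I - beta k k^T) + beta * v k^T.
    """
    d_v, d_k = len(S), len(k)
    # S k: the value currently associated with key k in memory.
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(d_k)]
        for i in range(d_v)
    ]


def read(S, q):
    """Output o = S q: query the fixed-size memory."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


def gated_delta_scan(qs, ks, vs, alphas, betas):
    """Process a sequence token by token; memory stays d_v x d_k regardless of length."""
    d_v, d_k = len(vs[0]), len(ks[0])
    S = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v, alpha, beta in zip(qs, ks, vs, alphas, betas):
        S = gated_delta_step(S, k, v, alpha, beta)
        outputs.append(read(S, q))
    return outputs
```

As a quick check: writing value 5 under key [1, 0] and then value 7 under the same key (α = β = 1) leaves a read of 7, because the delta rule subtracts the old association before adding the new one. Plain linear attention, which only accumulates v kᵀ, would read 5 + 7 = 12. The gate α lets the model decay stale memory, which is the "memory control" the summary attributes to Gated DeltaNet.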
Rivaling U.S. Industry Giants: "Every Group Chat Is Blowing Up"
Guan Cha Zhe Wang· 2025-12-02 08:46
Core Insights
- DeepSeek, a Chinese AI startup, has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which perform comparably to leading models from OpenAI and Google DeepMind [1][8]
- The release comes just ahead of the NeurIPS conference, generating significant interest in the AI research community [2][8]
Model Performance
- DeepSeek-V3.2 is designed for practical use and performs on par with OpenAI's GPT-5 on mainstream reasoning benchmarks, while DeepSeek-V3.2-Speciale excels at reasoning, matching Google DeepMind's Gemini 3.0 Pro [1][4]
- V3.2 produces substantially shorter outputs than Kimi-K2-Thinking, lowering computational costs and user wait times [4]
- DeepSeek-V3.2-Speciale achieved gold-medal-level results in international competitions, including IMO 2025 and IOI 2025, a significant milestone for open-source AI models [5][8]
Competitive Landscape
- DeepSeek's advances indicate that Chinese open-source AI systems are becoming competitive with top proprietary models from Silicon Valley [8][10]
- China's trend toward open-source models contrasts with the closed strategies of major U.S. tech companies, which tend to keep their most advanced AI technologies proprietary [10][11]
- Recent data show that, for the first time, open-source AI models developed by Chinese teams have surpassed those from U.S. teams in share of downloads, signaling a shift in the global AI landscape [9][10]
Community and Industry Impact
- The announcement has sparked excitement across the AI research community, with discussion on many platforms [2][8]
- The models are now available on DeepSeek's official website, app, and API, with the Speciale version currently offered through a temporary API for community evaluation [5][7]
Bosera Market Commentary, December 2: Both Markets Fluctuate Lower as Turnover Shrinks
Xin Lang Cai Jing· 2025-12-02 08:23
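Brief comment: The launch of the commercial real estate REITs pilot is highly significant. It will give property developers and local state-owned capital market-based financing and exit channels, effectively easing liquidity pressure. Advancing it in parallel with infrastructure REITs lets it precisely meet the need to revitalize commercial real estate. A streamlined review process should speed up product expansion. Over the medium to long term, it will help revitalize trillions of yuan in existing assets, reduce leverage, and guard against risks, providing financial support for a new model of real estate development and improving the quality and efficiency of the capital market's service to the real economy. This year, as of December 1, a total of 3,004 sci-tech innovation bonds had been formally issued, with combined issuance of 3.18 trillion yuan; the number of issues and the total size rose 85% and 98% year on year, respectively, providing strong funding support for sci-tech innovation companies. Brief comment: Sci-tech innovation bond issuance has clearly accelerated this year, with a marked expansion in both issuers and issuance size. These bonds help companies raise financing, provide medium- and long-term funding for sci-tech companies, and ease financing difficulties. They also add variety to the bond market, meet diverse investment needs, and support capital market innovation, while channeling funds toward sci-tech innovation and improving the efficiency of policy transmission. Daily view: Today the three major Shanghai and Shenzhen indexes fluctuated lower, and turnover across the two markets shrank to 1.6 trillion yuan. Data released yesterday by the Institute for Supply Management (ISM) showed that the U.S. manufacturing PMI fell from 48.7 in October to 48.2 in November, its ninth consecutive month below the 50 boom-bust line, and set a four-month ...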
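China's AI War Will "Intensify Across the Board" in 2026: "Traffic Gateways" Become a Must-Win Battleground for Big Tech, and AI's Overseas Expansion Will Accelerate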
Hua Er Jie Jian Wen· 2025-12-02 06:42
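In 2025, China's internet sector led the world with a striking 36.5% return, but the real story will unfold around artificial intelligence in 2026. According to Zhuifeng Trading Desk (追风交易台), Citi said in its "China Internet 1H 2026 Outlook" report, released on December 1, that competition in China's AI sector in 2026 will revolve around three themes: AI cloud infrastructure, AI chatbots, and AI applications. The report argues that major players such as Alibaba, ByteDance, and Tencent are racing to capture user traffic through their AI chatbots, hoping to lock in the key gateways for monetizing their future ecosystems in the AI era. Analysts including Alicia Yap stressed that this "turf war over user traffic" will be key for China's internet giants in 2026. Alibaba and Tencent are seen as core AI investment picks, while ByteDance, with the rapid global expansion of its AI applications, has become a disruptive force that cannot be ignored: the combined global monthly active users (MAU) of its AI chat apps now rank third worldwide. Meanwhile, AI-driven productivity gains will unlock more demand for leisure and entertainment, benefiting steadily profitable sectors such as gaming and travel. Citi believes that, because of geopolitical risks and AI supply-chain constraints, Chinese internet companies may continue to trade at lower valuations than their global peers. The 2026 AI race: three themes in full swing. The report predicts that competition in China's AI sector in 2026 will be very intense, focusing mainly on three levels. This is not only a contest of technology but also of future business ...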