DeepSeek
From the strongest open-source model to challenging the world's strongest: DeepSeek's new models offer a solution
Guan Cha Zhe Wang · 2025-12-02 11:38
Core Insights
- DeepSeek has released two official models. DeepSeek-V3.2 balances reasoning ability against output length for everyday use, and DeepSeek-V3.2-Speciale strengthens long-form reasoning and mathematical proof capabilities [1][2][4]
- The open-source large-model ecosystem has grown significantly, and DeepSeek's advances challenge closed-source models, particularly after Google's recent release of Gemini 3.0 raised the competitive bar [2][15]
- DeepSeek's models aim to bridge the gap between open-source and closed-source models through innovative architecture and training strategies, despite having fewer computational resources than industry giants [8][15][16]

Model Performance
- DeepSeek-V3.2 performs on par with GPT-5 and slightly below Google's Gemini 3 Pro on reasoning tasks [6][7]
- The Speciale version outperforms Gemini 3 Pro on several reasoning benchmarks, including the American Invitational Mathematics Examination (AIME) and the Harvard-MIT Mathematics Tournament (HMMT) [7][8]
- Speciale is designed for rigorous mathematical proof and logical verification, making it a specialized tool for complex reasoning tasks [6][8]

Technological Innovations
- DeepSeek uses a novel DSA (DeepSeek Sparse Attention) mechanism to improve computational efficiency, enabling long-context processing without sacrificing performance [8][12]
- "Interleaved Thinking" has been integrated into DeepSeek's models, tightening the interplay between reasoning and tool use, which is crucial for AI agents [9][12]
- The focus on agent capabilities signals a strategic shift toward actionable AI, moving beyond chat-based interaction to more complex task execution [13][14]

Industry Context
- The competitive landscape is shifting, and DeepSeek acknowledges a widening gap between open-source and closed-source models, particularly in complex-task performance [15][16]
- DeepSeek plans to address its limitations by increasing pre-training compute and improving model efficiency, a clear path for future improvements [16][19]
- The release of DeepSeek-V3.2 has been seen as a significant achievement for the open-source community, suggesting that the gap with leading closed-source models is narrowing [16][19]
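The sparse-attention idea above can be sketched for a single query: instead of attending to every token, the model attends only to a selected subset. This toy assumes a simple two-stage scheme. A cheap scoring pass ranks all tokens, and ordinary softmax attention then runs over only the `top_k` highest-ranked ones. The names `q_index`, `index_keys`, and `topk_sparse_attention` are hypothetical stand-ins, and this is not DeepSeek's DSA implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    out = [0.0] * len(vectors[0])
    for w, vec in zip(weights, vectors):
        for j, x in enumerate(vec):
            out[j] += w * x
    return out


def dense_attention(q, keys, values):
    """Standard scaled dot-product attention for one query over every key."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    return weighted_sum(weights, values)


def topk_sparse_attention(q, keys, values, q_index, index_keys, top_k):
    """Two-stage sparse attention for one query (illustrative sketch).

    q_index / index_keys are hypothetical cheap, low-dimensional vectors used
    only to rank tokens; the expensive attention then runs on top_k tokens.
    """
    # Stage 1: cheap scoring pass over all tokens, keep the top_k positions.
    index_scores = [dot(q_index, ik) for ik in index_keys]
    ranked = sorted(range(len(keys)), key=lambda i: index_scores[i], reverse=True)
    selected = sorted(ranked[:top_k])  # restore original token order
    # Stage 2: ordinary attention restricted to the selected tokens.
    out = dense_attention(q, [keys[i] for i in selected], [values[i] for i in selected])
    return out, selected


# Toy example: 4 tokens, 2-d keys/values, 1-d indexer vectors.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.9, 0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
q_index = [1.0]
index_keys = [[0.9], [0.1], [-0.5], [0.8]]

sparse_out, chosen = topk_sparse_attention(q, keys, values, q_index, index_keys, top_k=2)
full_out, _ = topk_sparse_attention(q, keys, values, q_index, index_keys, top_k=len(keys))
dense_out = dense_attention(q, keys, values)
print(chosen)  # [0, 3]
```

When `top_k` equals the number of tokens, the sketch reduces to ordinary dense attention. The savings come from keeping `top_k` much smaller than the context length. Stage 1 still touches every token, so a scheme like this only pays off if that scoring pass is far cheaper than full attention.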
Sugon: Sugon AI supercluster system and other products deeply adapted to DeepSeek-V3.2
Zheng Quan Shi Bao Wang · 2025-12-02 10:28
Core Viewpoint
- DeepSeek has officially released V3.2 and V3.2-Speciale, significantly strengthening its agent capabilities and integrating thinking with reasoning [1]

Group 1: Product Development
- The new DeepSeek versions are based on China's first open architecture for AI computing, achieving "cross-layer collaboration" across the hardware, software, and model layers [1]
- Products including the Sugon (Shuguang) AI supercluster system and the scaleX640 super node have completed deep adaptation and tuning for the new DeepSeek versions [1]

Group 2: Market Application
- These adaptations support full-scale deployment of the new DeepSeek versions for customers across industries [1]
DeepSeek's major new release takes on US industry giants: "Every group chat has exploded!"
Xin Lang Cai Jing · 2025-12-02 10:24
Core Insights
- DeepSeek, a Chinese AI startup, has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, that perform comparably to leading models from OpenAI and Google DeepMind [1][4][7]
- The release coincides with the NeurIPS conference and has drawn significant interest from the AI research community [2][7]
- V3.2 is designed for practical use, while V3.2-Speciale focuses on enhanced reasoning and has achieved gold-medal-level performance in prestigious competitions [5][6][7]

Model Performance
- DeepSeek-V3.2 matches OpenAI's GPT-5 on mainstream reasoning benchmarks and scores slightly below Google's Gemini-3.0 Pro [4][6]
- V3.2-Speciale excels in reasoning tests, with scores that rival Gemini-3.0 Pro [4][5]
- Both models show significant efficiency improvements, reducing computational costs and user wait times [4][6]

Competitive Landscape
- DeepSeek's success indicates that Chinese open-source AI systems are becoming competitive with top proprietary models from Silicon Valley [7][8]
- China's move toward open-source AI contrasts with the closed strategies of major US tech companies, which prefer to keep control of their most advanced technologies [9][10]
- Recent data show that the download share of open-source AI models from Chinese teams has surpassed that of US teams for the first time [8][9]

Industry Implications
- DeepSeek's advances suggest a shift in the AI model release paradigm, with Chinese companies launching new models and versions frequently [9][10]
- China's focus on open-source models may lead to broader applications of AI technology, potentially challenging the dominance of US AI labs [10]
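DeepSeek deals ChatGPT a heavy blow on its third anniversary: a 23-page technical report holds all the secrets behind reaching the top of open source
36Kr · 2025-12-02 09:19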
Core Insights
- DeepSeek has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale. They significantly improve reasoning, rival GPT-5 and Gemini-3.0-Pro, and address long-standing weaknesses of open-source models [2][5][48].

Model Features
- DeepSeek-V3.2 targets cost-effectiveness and everyday use. Its reasoning is comparable to GPT-5, its outputs are faster and shorter than Kimi-K2-Thinking's, and it introduces "thinking while using tools" [5][19].
- DeepSeek-V3.2-Speciale pushes the upper limits of AI capability and performs exceptionally in competitions such as IMO and ICPC, but it is resource-intensive and does not support tool calls [5][19][38].

Technical Innovations
- DSA (DeepSeek Sparse Attention) lets the model focus on the most important parts of the input, significantly improving processing speed and efficiency and supporting a 128K context length [9][12][13].
- DeepSeek allocated a post-training compute budget exceeding 10% of its pre-training cost, using a stable and scalable reinforcement learning framework to improve model performance [14][15].

Training Methodology
- Training first uses "expert distillation" to build specialist models for various domains, then "mixed reinforcement learning training" to unify performance across tasks and prevent catastrophic forgetting [16][18].
- A self-training pipeline further improves performance: the AI generates and verifies its own training data across more than 18,000 tasks, promoting self-evolution [30][32].

Performance Metrics
- In benchmark tests, DeepSeek-V3.2 is competitive with GPT-5 and Kimi-K2-Thinking across various metrics, while the Speciale version approaches or exceeds Gemini-3.0-Pro [33][34].
- The model won gold medals at IMO 2025 and CMO 2025, demonstrating advanced reasoning and problem-solving capabilities [38][39].

Future Directions
- Despite these advances, DeepSeek acknowledges that V3.2 still has room to improve in training resource allocation and token efficiency compared with top closed-source models [42][43].
- The company plans to improve the underlying model and post-training methods in future versions, pointing to possible developments in V4 [43].
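The efficiency claim for DSA can be made concrete by counting query-key scores in causal attention. The count is taken once for full attention and once for a scheme where each query keeps at most `k` keys. This is back-of-the-envelope arithmetic only. "128K" is read as 131,072 tokens, and the per-query budget of 2,048 keys is an illustrative assumption, not a figure taken from this article. The count also covers only the main attention. In a DSA-style design, the cheap selection pass still scores every pair.

```python
def dense_causal_pairs(n):
    """Query-key scores in full causal attention: token i attends to i + 1 tokens."""
    return n * (n + 1) // 2


def topk_causal_pairs(n, k):
    """Query-key scores in the main attention if each query keeps at most k keys."""
    if n <= k:
        return dense_causal_pairs(n)
    # The first k tokens have fewer than k predecessors; every later token is capped at k.
    return k * (k + 1) // 2 + (n - k) * k


CONTEXT = 128 * 1024  # assumption: "128K" read as 131,072 tokens
TOP_K = 2048          # assumption: illustrative per-query key budget

dense = dense_causal_pairs(CONTEXT)
sparse = topk_causal_pairs(CONTEXT, TOP_K)
print(f"dense pairs:  {dense:,}")
print(f"sparse pairs: {sparse:,}")
print(f"reduction:    {dense / sparse:.1f}x")
```

Under these assumed numbers, the main attention computes 266,339,328 scores instead of 8,590,000,128, roughly 32 times fewer. The gap widens as the context grows, because the dense count scales with the square of the length and the capped count scales linearly.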
Attention revisited: DeltaNet, used by both Alibaba and Kimi, and new improvements to linear attention | LatePost Podcast
LatePost · 2025-12-02 09:13
Core Insights
- The article discusses advances in linear attention mechanisms, particularly DeltaNet, which aims to make large language models (LLMs) more efficient and effective by reducing the computational complexity of standard attention [5][10][12].

Group 1: Linear Attention Mechanisms
- Linear attention mechanisms such as DeltaNet were introduced to relieve the computational bottleneck of standard attention, whose cost grows quadratically with input length [5][12].
- DeltaNet has been developed collaboratively since its introduction in 2021, with researchers focusing on improving the update rule and the parallelization of linear attention [7][20][21].
- Alibaba's and Kimi's recently open-sourced Qwen3-Next and Kimi Linear models both incorporate linear attention, signaling a shift toward these more efficient designs in flagship models [5][24].

Group 2: DeltaNet and Its Evolution
- DeltaNet was initially overlooked because it lacked key architectural improvements and had suboptimal implementations, but recent advances have driven its adoption in industry [20][24].
- The Gated DeltaNet variant improves memory control and retrieval performance and is described as better suited to modern hardware [7][21][24].
- The relationship between DeltaNet and models such as Kimi Linear reflects a trend toward combining linear attention with standard full attention to balance speed and capacity [24][25].

Group 3: Future Directions and Challenges
- The article argues that update rules for linear attention deserve further exploration, since improvements there could yield better performance and scalability [48][49].
- Combining sparse attention with linear attention is discussed as a way to tackle long-text processing, which remains a major hurdle for current models [46][49].
- The ongoing industry debate over linear versus full attention reflects the complexity and trade-offs of model design for different applications [27][30].
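The update rules discussed above can be written as a recurrence over a fixed-size memory matrix `S`, with output `o_t = S_t q_t`. This is where linear attention's linear cost comes from: each token does a constant amount of work on the state instead of attending to every earlier token. The code below is a minimal single-head sketch in plain Python. It assumes the commonly published forms of the rules, with no feature map, no normalization, no chunkwise-parallel training, and unit-norm keys. It is not the implementation used in Qwen3-Next or Kimi Linear, and the helper names are hypothetical.

```python
def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def matvec(S, x):
    """Read from the memory matrix: returns S @ x."""
    return [sum(s * xj for s, xj in zip(row, x)) for row in S]


def linear_attention_step(S, k, v):
    """Vanilla linear attention state update: S_t = S_{t-1} + v_t k_t^T."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]


def gated_delta_step(S, k, v, beta, alpha=1.0):
    """Gated delta rule: S_t = alpha * (S_{t-1} - beta * (S_{t-1} k_t) k_t^T) + beta * v_t k_t^T.

    alpha = 1.0 gives the plain DeltaNet update; alpha < 1 also decays old memory.
    """
    pred = matvec(S, k)  # value the memory currently returns for key k
    return [
        [alpha * (S[i][j] - beta * pred[i] * k[j]) + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(v))
    ]


q = [1.0, 0.0]
# The same (unit-norm) key is written twice with different values.
writes = [([1.0, 0.0], [1.0, 2.0]), ([1.0, 0.0], [5.0, 7.0])]

S_lin, S_delta = zeros(2, 2), zeros(2, 2)
for k, v in writes:
    S_lin = linear_attention_step(S_lin, k, v)
    S_delta = gated_delta_step(S_delta, k, v, beta=1.0)

lin_out = matvec(S_lin, q)      # [6.0, 9.0]: the two values pile up
delta_out = matvec(S_delta, q)  # [5.0, 7.0]: the old value is overwritten

# Gated step: write a new key while alpha = 0.5 halves what was stored before.
S_gated = gated_delta_step(S_delta, [0.0, 1.0], [1.0, 1.0], beta=1.0, alpha=0.5)
gated_old = matvec(S_gated, q)           # [2.5, 3.5]
gated_new = matvec(S_gated, [0.0, 1.0])  # [1.0, 1.0]
```

The toy run shows the practical difference between the rules. Vanilla linear attention can only add to its memory, so rewriting a key blends the old and new values. The delta rule first subtracts what the memory already predicts for that key, then writes the new value. The gate `alpha` adds a way to forget, which is one form of the memory control credited to Gated DeltaNet above.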
Taking on US industry giants: "Every group chat has exploded"
Guan Cha Zhe Wang · 2025-12-02 08:46
Core Insights
- DeepSeek, a Chinese AI startup, has launched two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which perform comparably to leading models from OpenAI and Google DeepMind [1][8]
- The release coincides with the upcoming NeurIPS conference and has drawn significant interest from the AI research community [2][8]

Model Performance
- DeepSeek-V3.2 is designed for practical use and performs on par with OpenAI's GPT-5 on mainstream reasoning benchmarks, while DeepSeek-V3.2-Speciale excels at reasoning and matches Google DeepMind's Gemini 3.0 Pro [1][4]
- V3.2 produces significantly shorter outputs than Kimi-K2-Thinking, lowering computational costs and user wait times [4]
- DeepSeek-V3.2-Speciale has performed exceptionally in international competitions, winning gold medals at IMO 2025 and IOI 2025, a significant milestone for open-source AI models [5][8]

Competitive Landscape
- DeepSeek's advances indicate that Chinese open-source AI systems are becoming competitive with top proprietary models from Silicon Valley [8][10]
- China's move toward open-source models contrasts with the closed strategies of major US tech companies, which tend to keep their advanced AI technologies proprietary [10][11]
- Recent data show that the download share of open-source AI models developed by Chinese teams has surpassed that of US teams for the first time, signaling a shift in the global AI landscape [9][10]

Community and Industry Impact
- The announcement has sparked excitement in the AI research community, with discussion and engagement across many platforms [2][8]
- The models are available on DeepSeek's official website, app, and API. The Speciale version is currently offered through a temporary API for community evaluation [5][7]
Bosera Market Commentary, December 2: Both markets fluctuate and pull back as turnover shrinks
Xin Lang Cai Jing · 2025-12-02 08:23
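Brief comment: The launch of the commercial real estate REITs pilot is highly significant. It will give property developers and local state-owned capital a market-based financing and exit channel, effectively easing liquidity pressure. Advancing it in parallel with infrastructure REITs lets it precisely meet the need to revitalize commercial real estate. A simplified review process is expected to speed up product expansion. In the medium to long term, it will help revitalize trillions of yuan in existing assets, reduce leverage, and guard against risks. It also provides financial support for a new model of real estate development and improves the quality and efficiency of capital markets' service to the real economy.

So far this year, as of December 1, a total of 3,004 sci-tech innovation bonds have been formally issued, with a combined issuance size of 3.18 trillion yuan. The number of issues and the total size rose 85% and 98%, respectively, from the same period last year, providing strong financial support for sci-tech innovation enterprises.

Brief comment: Issuance of sci-tech innovation bonds has clearly accelerated this year, with a marked expansion in both issuers and issuance size. These bonds help companies raise funds, provide medium- and long-term capital to sci-tech innovation firms, and ease financing difficulties. They also add variety to the bond market, meet diverse investment needs, and support capital market innovation. Finally, they steer capital toward sci-tech innovation and make policy transmission more efficient.

[Bosera Market Commentary, December 2] Both markets fluctuate and pull back as turnover shrinks

Daily View

Today the three major Shanghai and Shenzhen indices fluctuated and pulled back, and turnover in the two markets shrank to 1.6 trillion yuan. Data released yesterday by the US Institute for Supply Management (ISM) showed that the US manufacturing PMI fell from 48.7 in October to 48.2 in November. That was the ninth consecutive month below the 50 boom-bust line, and it hit a four-month ...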
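The growth rates for sci-tech innovation bonds imply the approximate size of the year-ago base. This is a back-of-the-envelope check. It assumes the reported 85% and 98% are year-on-year increases over the same year-to-date window and are rounded, so the results are approximate:

```latex
\begin{aligned}
\text{year-ago issuance count} &\approx \frac{3004}{1 + 0.85} \approx 1624 \\
\text{year-ago issuance size} &\approx \frac{3.18\ \text{trillion yuan}}{1 + 0.98} \approx 1.61\ \text{trillion yuan}
\end{aligned}
```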
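China's AI war will "fully intensify" in 2026: "traffic gateways" become a must-win battleground for big tech, and AI's overseas expansion will accelerate
Hua Er Jie Jian Wen · 2025-12-02 06:42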
Core Insights
- In 2025, China's internet sector is expected to lead the world with a remarkable 36.5% return, but the real story will unfold in 2026 around artificial intelligence (AI) [1]
- Competition in China's AI sector will center on three themes: AI cloud infrastructure, AI chatbots, and AI applications [3]

Group 1: AI Cloud Infrastructure
- Major players such as Alibaba and Baidu are investing heavily in AI cloud infrastructure. Alibaba's capital expenditure reached approximately 120 billion RMB over the past four quarters, and it plans to invest 380 billion RMB over the next three years [3]
- Alibaba's cloud revenue grew 34% year on year in Q3 2025, and Baidu's AI cloud revenue rose 21% year on year to 6.2 billion RMB [3]

Group 2: AI Chatbot Competition
- AI chatbots are seen as the "traffic gateway" of the AI era, and Alibaba, ByteDance, and Tencent are competing fiercely for users [7]
- ByteDance's chatbot Doubao leads the Chinese market with 197 million monthly active users (MAU) as of October 2025 [7]

Group 3: Vertical AI Applications
- Companies in vertical sectors such as Meituan, Ctrip, and Didi are building proprietary AI agents on their own data to increase user engagement and explore new monetization opportunities [11]
- The user base of Ctrip's AI travel assistant, TripGenie, grew by more than 200% year on year in the first half of 2025 [11]

Group 4: Global Expansion of AI Applications
- Chinese AI applications are accelerating their global expansion, and ByteDance's products rank among the world's top. Dola and another Chinese product, DeepSeek, have 47 million and 39 million MAU, respectively [14]
- Combining Dola's overseas users with Doubao's domestic users would give ByteDance's AI chat products approximately 250 million MAU, ranking them third globally [14]

Group 5: Performance Review and Future Outlook
- In Q3 2025, 27 of 44 internet companies reported better-than-expected profits, attributed to ongoing cost optimization and AI-driven productivity gains [18]
- The report expects the spread of AI tools to make consumers more efficient, increasing leisure and entertainment spending and particularly benefiting the gaming and tourism sectors [18]
- The gaming industry is expected to thrive thanks to improved development efficiency, with average revenue per user (ARPU) rebounding to 41 RMB, up 13.3% year on year [18]
- The tourism sector is resilient. Tourism spending reached 4.3% of GDP in 2024, and there is room for growth as international flight passenger volumes return to pre-pandemic levels [18]
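Two figures in the ByteDance and gaming items can be reproduced with simple arithmetic. This assumes the roughly 250 million figure is a rounded sum of Doubao's 197 million and Dola's 47 million MAU; the report's exact method isn't given here. It also assumes the 13.3% ARPU growth is measured against the same metric a year earlier:

```latex
\begin{aligned}
\text{ByteDance chat MAU} &\approx \underbrace{197\,\text{M}}_{\text{Doubao}} + \underbrace{47\,\text{M}}_{\text{Dola}} = 244\,\text{M} \approx 250\,\text{M} \\
\text{year-ago gaming ARPU} &\approx \frac{41\ \text{RMB}}{1 + 0.133} \approx 36.2\ \text{RMB}
\end{aligned}
```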
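Domestic computing-power ecosystem accelerates its growth; STAR Market 50 ETF (588080) and other products help capture tech-innovation opportunities
Mei Ri Jing Ji Xin Wen · 2025-12-02 05:33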
Core Viewpoint
- Key technology indices in China declined, while DeepSeek launched two new models aimed at advancing computing capabilities and innovation in the computing ecosystem [1]

Group 1: Market Performance
- The STAR 50 Index and the STAR Composite Index both fell 1.2%, the STAR Growth Index fell 1.3%, and the STAR 100 Index fell 1.4% [1]

Group 2: DeepSeek Model Launch
- DeepSeek has released two official models: DeepSeek-V3.2 and DeepSeek-V3.2-Speciale [1]
- DeepSeek-V3.2 balances reasoning ability with output length, making it suitable for everyday uses such as Q&A and general agent tasks [1]
- DeepSeek-V3.2-Speciale is designed to push the reasoning capabilities of open-source models to their limit and explore the boundaries of model capability [1]

Group 3: Industry Impact
- Analysts say DeepSeek is driving collaborative innovation and evolution in China's computing ecosystem by linking model and algorithm innovation with compiler languages and lower-level computing chips [1]
- This is expected to promote the growth of China's computing ecosystem [1]
Bulls and bears battle around the 60-day moving average
Chang Sha Wan Bao · 2025-12-02 05:31
Market Overview
- The three major indices opened lower: the Shanghai Composite Index fell 0.14%, the Shenzhen Component Index 0.13%, and the ChiNext Index 0.04% [1]
- Turnover in the Shanghai and Shenzhen markets exceeded 560 billion yuan, over 60 billion yuan less than the previous day, and full-day turnover is projected to exceed 1.7 trillion yuan [1]

Industry Insights
- AI phone concept stocks are fluctuating actively, commercial aerospace stocks are recovering, and the real estate sector is strengthening [1]
- According to CRIC Real Estate Research, new housing supply in 30 key cities is expected to reach 6.69 million square meters in November, up 16% month on month [1]
- Through the first batch of sci-tech innovation and entrepreneurship robotics ETFs, more funds are expected to flow into the robotics sector, supporting its market performance [2]
- The CEO of Google DeepMind said he intends to make Gemini the software foundation for the robotics world, signaling further investment in embodied robotics [2]

Product Development
- ZTE announced a limited release of the Nubia M153, which features the Doubao mobile assistant technology and is aimed at developers and interested users [3]
- The Doubao mobile assistant, developed by ByteDance, integrates AI capabilities into the phone to improve user interaction and experience [3]
- On-device AI is expected to drive hardware upgrades, potentially triggering a new wave of device replacements [3]

Technical Analysis
- The market rebounded, with the Shanghai Composite Index back above 3,900 points and turnover reaching nearly 1.87 trillion yuan [4]
- The current rebound is approaching a previous gap and a key resistance level, so a consolidation phase is likely [4]
- The AI sector is growing significantly, pointing to a potential recovery in technology stocks, while sectors such as innovative pharmaceuticals and non-ferrous metals remain worth monitoring [4]
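The 60-day moving average in the headline is a standard technical indicator. Its most common form is the simple moving average (SMA) of the last 60 daily closes. The article does not say whether a simple or exponential average is meant, so this sketch assumes the simple version. The function names are illustrative and the prices are made up. It shows only the calculation, not a trading signal.

```python
def simple_moving_average(closes, window=60):
    """SMA series: entry m averages closes[m : m + window]."""
    if window <= 0:
        raise ValueError("window must be positive")
    sma = []
    running = 0.0  # running sum keeps the calculation O(n)
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the close that left the window
        if i >= window - 1:
            sma.append(running / window)
    return sma


def position_vs_ma(closes, window=60):
    """Report whether the latest close is above, below, or at its moving average."""
    sma = simple_moving_average(closes, window)
    if not sma:
        raise ValueError("need at least `window` closes")
    last, avg = closes[-1], sma[-1]
    if last > avg:
        return "above"
    if last < avg:
        return "below"
    return "at"


# Synthetic closes for illustration only (not real index data).
rising = [float(p) for p in range(1, 61)]  # steady climb: last close 60, SMA 30.5
falling = [10.0] * 59 + [5.0]              # sharp drop on the final day

print(position_vs_ma(rising))   # above
print(position_vs_ma(falling))  # below
```

A "battle around the 60-day line" describes closes repeatedly crossing between the "above" and "below" states. An exponential average would weight recent closes more heavily and react faster to the final-day drop.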