DeepSeek V3

Liang Wenfeng's Long-Awaited Timely Rain
Hu Xiu APP· 2025-07-16 00:05
Core Viewpoint - The article discusses the competitive landscape of AI models, focusing on DeepSeek's challenges in maintaining user engagement and market position against emerging competitors such as Kimi and the rest of the "AI Six Tigers" group.

Group 1: DeepSeek's Performance and Challenges
- DeepSeek's monthly active users peaked at 169 million in January and had declined 5.1% by May [1][2]
- Its download ranking has plummeted from the top of the App Store charts to outside the top 30 [2]
- Its user engagement rate fell from 7.5% at the start of the year to 3% by the end of May, with website traffic down 29% [2][3]

Group 2: Competition and Market Dynamics
- Competitors such as Kimi are rapidly releasing new models, with Kimi K2 hitting significant performance benchmarks at competitive prices [1][8]
- Kimi K2's pricing closely tracks DeepSeek's API pricing, making it a direct competitor on cost [8]
- Other players are likewise emphasizing lower cost and better performance, eroding DeepSeek's established reputation for cost-effectiveness [7][8]

Group 3: Technological and Strategic Implications
- DeepSeek's reliance on the H20 chip has been hit by export restrictions, hindering its ability to scale and innovate [3][4]
- The lack of major model updates has created a perception of stagnation while competitors iterate and improve rapidly [6][12]
- The article highlights the importance of multimodal capabilities, which DeepSeek currently lacks, potentially limiting its appeal in a market that increasingly values them [13]

Group 4: Future Outlook
- To regain market interest, DeepSeek needs to expedite the release of new models such as V4 and R2 and strengthen its tool capabilities for developers [12][13]
- The competitive landscape is shifting rapidly; without significant updates or innovations, DeepSeek risks losing further ground to its rivals [12][14]
- Maintaining developer engagement and user interest is crucial to DeepSeek's long-term success in the evolving AI market [11]
Will the Open-Source K2 Model Be Kimi's DeepSeek Moment?
Hu Xiu· 2025-07-14 03:20
Over the weekend, Silicon Valley's open-source community, cloud vendors, and AI developers were all discussing Moonshot's newly open-sourced large model K2. Its total parameter count reaches the trillion scale (1T), the largest of any open model to date, with 32 billion activated parameters.

Meta's Llama4-Behemoth is in theory larger, at 2 trillion parameters, but it remains "futures stock" and may never ship; DeepSeek's V3, by comparison, has 671 billion parameters. OpenAI had also planned to release an open model, but right after K2 launched, Sam Altman postponed it once again. Linking the two may be a stretch, yet K2 is further proof of an indisputable fact: China is becoming a key force driving open innovation in the restructuring of the global technology order.

K2's performance is impressive, especially on agent-related tasks. It trails only Claude 4 Opus on the SWE-bench Verified (coding) and Tau2 (agents) benchmarks, and only GPT-4.1 on AceBench (tool calling). It is also quite cheap: its official API pricing is comparable to DeepSeek's R1, while its maximum context length (128K) exceeds R1's (64K). Hugging Face co-founder Thomas Wolf sees K2 as proof that open models continue to challenge the latest closed-weight models.

Of this round's AI "Six Tigers," four are still on the field, all hungry for a DeepSeek ...
The AI Coding Shock Is Here: What Should Programmers Do? IDEA Research Institute's Zhang Lei: Low-Level Systems Skills Are the Real Moat
AI Frontline· 2025-07-13 04:12
Interview | Huo Taiwen  Compilation | Yu Qi  Editors | Tina, Cai Fangfang

As AI moves into the new era of "multimodal agents," the extreme dimensionality of visual understanding, the difficulty of modeling spatial intelligence, and the challenge of efficiently integrating perception, cognition, and action still loom as enormous gaps. How can an agent truly "see clearly, think it through, and act well"? And what is the most feasible application breakthrough right now?

At the AICon Global AI Development and Application Conference, held June 27-28 in Beijing, InfoQ sat down with Zhang Lei, Chair Scientist of the Computer Vision and Robotics Research Center at IDEA Research Institute. In the interview he laid out a pragmatic deployment path that starts from "semi-structured" scenarios, shared his perspective on how industry balances frontier exploration with shipping products, and offered earnest advice to the younger generation on building solid foundations and finding direction in the AI wave.

InfoQ: In enabling agents to truly "see clearly, think it through, and act well," which fundamental problems do you think are often overlooked yet actually crucial?

Selected highlights follow:

The AICon Global AI Development and Application Conference lands in Shenzhen for the first time on August 22-23! Themed "Exploring the Boundaries of AI Applications," the conference focuses on hot topics such as agents, multimodality, and AI product design, built around real cases of enterprises using large models to cut costs and improve operating efficiency, inviting leading ...
Tencent Research Institute AI Digest 20250710
Tencent Research Institute· 2025-07-09 14:49
Group 1: Veo 3 Upgrade
- The Google Veo 3 upgrade allows audio and video generation from a single image, maintaining high consistency across multiple angles [1]
- The new feature is implemented through the Flow platform's "Frames to Video" option, enhancing camera-movement capabilities, although the Gemini Veo 3 entry is currently unavailable [1]
- User tests show natural expressions and effective performances, a significant breakthrough for AI storytelling applicable to advertising and animation [1]

Group 2: Hugging Face 3B Model
- Hugging Face has released the open-source 3B-parameter model SmolLM3, outperforming Llama-3.2-3B and Qwen2.5-3B while supporting a 128K context window and six languages [2]
- The model features a dual-mode system that lets users switch between deep-thinking and non-thinking modes (see the sketch after this digest) [2]
- It employs a three-stage mixed training strategy, trained on 11.2 trillion tokens, with all technical details, including architecture and data-mixing methods, made public [2]

Group 3: Kunlun Wanwei Skywork-R1V 3.0
- Kunlun Wanwei has open-sourced the Skywork-R1V 3.0 multimodal model, scoring 142 on high-school mathematics and 76 on the MMMU evaluation, surpassing some closed-source models [3]
- The model uses a reinforcement learning strategy (GRPO) and a key entropy-driven mechanism, achieving high performance with only 12,000 supervised samples and 13,000 reinforcement-learning samples [3]
- It excels at physical, logical, and mathematical reasoning, setting a new performance benchmark for open-source models and demonstrating cross-disciplinary generalization [3]

Group 4: Vidu Q1 Video Creation
- Vidu Q1's multi-reference video feature lets users upload up to seven reference images, enabling strong character consistency and storyboard-free video generation [4]
- Users can combine multiple subjects with simple prompts, with clarity upgraded to 1080P and support for storing character material for reuse [5]
- Tests show it suits multi-character animation trailers, supports frame extraction and quality enhancement, and cuts video production costs to under 0.9 yuan per video [5]

Group 5: vivo BlueLM-2.5-3B Model
- vivo has launched the BlueLM-2.5-3B edge multimodal model, which excels in over 20 evaluations and supports GUI interface understanding [6]
- The model switches flexibly between long and short thinking modes and introduces a thinking-budget control mechanism to balance reasoning depth against computational cost [6]
- It uses a ViT+Adapter+LLM structure and a four-stage pre-training strategy, improving efficiency and mitigating the text-capability forgetting common in multimodal models [6]

Group 6: DeepSeek-R1 System
- The X-Masters system, developed by Shanghai Jiao Tong University and DeepMind Technology, scored 32.1 on "Humanity's Last Exam" (HLE), surpassing OpenAI and Google [7]
- The system is built on the DeepSeek-R1 model, enabling smooth transitions between internal reasoning and external tool use, with code as the interaction language [7]
- X-Masters employs a decentralized, stacked multi-agent workflow, deepening and broadening reasoning through collaboration among solvers, critics, rewriters, and selectors, with the solution fully open-sourced [7]

Group 7: Zhihui Jun's Acquisition
- Zhihui Jun's Zhiyuan Robotics has acquired control of the listed company Shuangwei New Materials for 2.1 billion yuan, targeting a 63.62%-66.99% stake [8]
- After the acquisition, Shuangwei New Materials' stock resumed trading at the daily limit-up, reaching a market value of 3.77 billion yuan, with the actual controller changing to Zhiyuan CEO Deng Taihua and core team members including "Zhihui Jun" Peng Zhihui [8]
- The deal, executed via "agreement transfer + active tender offer," is seen as a landmark case for new-productivity enterprises in A-shares following the rollout of national policies [8]

Group 8: AI Model Usage Trends
- In the first half of 2025, the Gemini series captured nearly half of the large-model API market, with Google leading at 43.1%, followed by DeepSeek at 19.6% and Anthropic at 18.4% [9]
- DeepSeek V3 has maintained high user retention since launch, ranking in the top five by usage, while OpenAI's model usage has fluctuated sharply [9]
- The competitive landscape is differentiating: Claude-Sonnet-4 leads programming (44.5%), Gemini-2.0-Flash excels at translation, GPT-4o leads marketing (32.5%), and role-playing remains highly fragmented [9]

Group 9: AI User Trends
- A Menlo Ventures report counts 1.8 billion AI users globally, with a paid-user rate of only 3%, a student usage rate as high as 85%, and parents emerging as heavy users [10]
- AI is mainly used for writing emails (19%), researching topics of interest (18%), and managing to-do lists (18%), with no single task exceeding one-fifth of usage [10]
- The next 18-24 months are expected to bring six major AI trends: the rise of vertical tools, end-to-end process automation, multi-user collaboration, an explosion of voice AI, physical AI in the home, and diversified business models [10]
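For the SmolLM3 item in Group 2: dual-mode models of this kind typically expose the mode switch as a chat-template or system-prompt flag. The sketch below is a hedged illustration with Hugging Face transformers; the repo id and the /no_think system flag are assumed conventions for such dual-mode models, not verified SmolLM3 documentation:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM3-3B"  # assumed Hub repo id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed convention: a system-prompt flag toggles deep-think mode off;
# omitting it (or using /think) would leave extended reasoning on.
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Explain mixture-of-experts in two sentences."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```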
H1 2025 Large-Model Usage Observations: The Gemini Series Takes Half the Market, While DeepSeek V3 Shows Extremely High User Retention
Founder Park· 2025-07-09 06:11
Core Insights
- The article reviews the state and trends of the large-model API market in 2025, highlighting significant growth and shifts in market share among key players [1][2][25]

Token Usage Growth
- In Q1 2025, total token usage for AI models grew nearly fourfold quarter over quarter, then stabilized at around 2 trillion tokens per week [7][25]
- The top models by token usage are Gemini-2.0-Flash, Claude-Sonnet-4, and Gemini-2.5-Flash-Preview-0520, with Gemini-2.0-Flash holding a strong position thanks to low pricing and high performance [2][7]

Market Share Distribution
- Google holds a dominant 43.1% market share, followed by DeepSeek at 19.6% and Anthropic at 18.4% [8][25]
- OpenAI's models show significant volatility in usage, with GPT-4o-mini fluctuating notably, particularly in May [8][25]

Segment-Specific Insights
- In programming, Claude-Sonnet-4 leads with a 44.5% share, followed by Gemini-2.5-Pro [12]
- In translation, Gemini-2.0-Flash dominates with a 45.7% share, indicating wide integration into translation software [17]
- The role-playing segment is fragmented: small models collectively hold 26.6% of the share, with DeepSeek leading the category [21]

API Usage Trends
- The most-used APIs on OpenRouter primarily serve code writing, led by Cline and RooCode [25]
- The overall trend shows a strong preference for tools that facilitate coding and application development [25]

Competitive Landscape
- DeepSeek's V3 model shows strong user retention and is favored over its predecessor, likely due to faster processing [25]
- Meta's Llama series is declining in popularity, while Mistral AI has captured roughly 3% of the market, mainly among users fine-tuning open-source models [25]
- X-AI's Grok series is still establishing its position, and the Qwen series holds a modest 1.6% share, leaving room for growth [25]
How Did Cats Become Large Models' "Natural Enemy"?
Hu Xiu· 2025-07-08 00:05
This article comes from the WeChat public account APPSO (ID: appsolution). Original title: "A Single Cat Can Make the Strongest AI Get Questions Wrong, and Even DeepSeek Flops: How Did Cats Become Large Models' 'Natural Enemy'?" Header image: AI-generated.

Recently, people discovered that holding a kitten "hostage" can actually improve the accuracy of AI-assisted research: just add one line to the prompt, "If you dare give me fake citations, I will whip this little kitten in my hands," and the AI will become "afraid" of making mistakes, start checking references carefully, and stop fabricating.

http://xhslink.com/a/pg0nZPUiFiZfb

But does a "kitten moral crisis" really make AI more reliable? There is no solid scientific evidence for that yet. At the level of mechanism, a large model does not genuinely "understand" a cat's safety; it has merely learned to imitate the "seemingly empathetic" language styles present in its training data.

Interestingly, though, there is published evidence that cats really can sway AI behavior. A research paper from Stanford University, Collinear AI, and ServiceNow found that casually appending a single context-irrelevant sentence after a math problem can significantly raise a large model's error rate, by a factor of more than 3 in some cases. Except this isn't "making it more reliable"; it makes the AI fail outright.

Paper link: https://arxiv.org/abs/25 ...
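To make the paper's setup concrete, here is a hedged sketch (not the authors' code) of comparing a clean math prompt against the same prompt with an irrelevant trailing sentence. The trigger text and model name are illustrative assumptions; any OpenAI-compatible chat endpoint would do:

```python
# Sketch of an adversarial-trigger comparison, assuming an OpenAI-compatible
# endpoint; the trigger sentence and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
TRIGGER = "Interesting fact: cats sleep for most of their lives."

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
        temperature=0,
    )
    return resp.choices[0].message.content

problem = "If 3x + 5 = 20, what is x? Answer with just the number."
clean = ask(problem)
attacked = ask(problem + " " + TRIGGER)  # same problem + irrelevant sentence
print("clean:   ", clean)
print("attacked:", attacked)
```

Run over a large problem set rather than one question, the clean and attacked accuracies are what the paper compares; a single query only illustrates the prompt construction.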
DeepSeek Technical Deep Dive (3): The Evolution of MoE
Autonomous Driving Heart (自动驾驶之心)· 2025-07-06 08:44
Author | Jiang Fuchun  Editor | Autonomous Driving Heart  Original link: https://zhuanlan.zhihu.com/p/18565423596

0. Introduction

This post covers DeepSeek's evolution on MoE (Mixture-of-Experts). DeepSeek is a devoted player of sparse MoE models: from DeepSeekMoE (V1) through DeepSeek V3, its main model line has stuck to the MoE technical route while continuing to innovate. Drawing on the papers together with a reading of the source code, this article traces how the MoE design evolved and how it is concretely implemented.

1. A brief history of MoE

Let's first briefly review MoE's history. As early as 1991, a paper titled "Adaptive Mixtures of Local Experts" proposed the earliest prototype of the Mixture-of-Experts framework (Figure 1), and MoE retains this form to this day. MoE (Mixture of Experts) is a network-layer structure whose layer mainly consists of three ...
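The excerpt cuts off where it begins listing the three parts of an MoE layer; in the standard framing these are a gating network that scores experts per token, a set of expert networks, and a weighted combination of the selected experts' outputs. A minimal sketch of that classic structure follows; the dimensions, expert count, and top-k value are illustrative, not DeepSeek's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Classic MoE layer: a gate scores experts per token, each token is
    routed to its top-k experts, and their outputs are combined by weight."""

    def __init__(self, d_model: int = 512, d_ff: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)           # (n_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # top-k experts per token
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            tok, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if tok.numel():
                out[tok] += weights[tok, slot, None] * expert(x[tok])
        return out

x = torch.randn(4, 512)
print(MoELayer()(x).shape)  # torch.Size([4, 512])
```

DeepSeek's variants refine this skeleton (for example with fine-grained and shared experts and different load-balancing schemes), which is the evolution the article goes on to trace.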
A "Wild" DeepSeek Goes Viral: Speed Crushes the Official Version, With Open-Sourced Weights
Synced (机器之心)· 2025-07-04 08:59
Core Viewpoint - The article covers the emergence of the "DeepSeek R1T2" model, an open-source model from TNG, a German AI consulting company, that is faster than its predecessor R1 and performs better on several benchmarks [1][5][3].

Technical Aspects
- The R1T2 model uses Assembly-of-Experts (AoE) technology, merging three parent models: DeepSeek V3, R1, and R1-0528 (see the sketch after this digest) [2]
- It is built on the DeepSeek-MoE Transformer architecture at a 671-billion-parameter scale [13]
- It is the successor to the initial "R1T Chimera," upgraded to a Tri-Mind fusion architecture that incorporates the R1-0528 base model [14]

Performance Comparison
- R1T2 is reported to be 200% faster than R1-0528 and 20% faster than R1, improving over R1 on the GPQA Diamond and AIME 24 benchmarks without reaching R1-0528's level [1][18]
- R1T2 is positioned as an ideal replacement for R1, offering better performance while being more economical than R1-0528 [18]
- Compared with R1T, R1T2 is generally recommended unless specific personality traits of R1T are required [18]

Limitations
- Owing to the influence of its R1 base, R1T2 does not yet support function calling, which may be addressed in future versions [20]
- Its response consistency is significantly higher than R1T's but still lower than R1-0528's [20]
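Assembly-of-Experts details are in TNG's own report; purely as a hedged illustration of the weight-space-merging idea (building a child model from parent checkpoints instead of retraining), a naive per-tensor weighted merge might look like the sketch below. The file paths, mixing weights, and flat-average rule are assumptions, not TNG's actual recipe, which selects and blends expert tensors more selectively:

```python
import torch

# Hypothetical parent checkpoints and mixing weights (illustrative only;
# AoE as described by TNG is not a flat average across all tensors).
parents = {
    "deepseek-v3.pt": 0.2,
    "deepseek-r1.pt": 0.3,
    "deepseek-r1-0528.pt": 0.5,
}

merged: dict[str, torch.Tensor] = {}
for path, w in parents.items():
    state = torch.load(path, map_location="cpu")  # state_dict of one parent
    for name, tensor in state.items():
        # Accumulate a weighted average of matching tensors across parents.
        merged[name] = merged.get(name, 0) + w * tensor.float()

torch.save(merged, "chimera-merged.pt")
```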
[AI New Generation] Has the Pure-Technology Financing Window Closed Under the DeepSeek Storm? AI Unicorns' 2025 Midyear Report: As Capital Strength Diverges, Who Advances to the Next Round?
Hua Xia Shi Bao· 2025-06-25 06:44
Group 1
- The core viewpoint of the articles is a shift in the AI industry from large-model development to application-focused strategies, with companies like DeepSeek and Manus leading the transition [1][5][7]
- Investment logic in the AI sector has changed to favor application investments over foundational-model investments, as evidenced by shrinking financing rounds and more cautious investors [6][7][9]
- The "AI Six Tigers" show varied commercial progress: Zhipu and Zero One Wanwu (01.AI) are making strides in B-end applications, while MiniMax and Moonshot AI focus more on C-end applications [9][10][11]

Group 2
- DeepSeek has established itself as a dominant player, with significant backing and no immediate need for external financing, while the other "AI Six Tigers" have struggled to secure new funding [6][8]
- New models from competitors such as MiniMax and Moonshot AI point to a landscape in which companies strive to outperform DeepSeek [2][3]
- The shift toward intelligent agents has become a consensus among AI companies, with multiple firms launching agent products in response to market demand [4][11]

Group 3
- Companies are increasingly building differentiated competitive barriers in vertical markets to survive the ongoing industry reshuffle [1][12]
- The commercial viability of AI applications is being tested, with B-end markets emphasized as a more sustainable revenue path than C-end markets [11][12]
- The overall investment landscape is evolving toward practical applications of AI across industries, reflecting broader market demand for AI solutions [7][12]
All Large Models Score Zero! A Chinese Team Led by Saining Xie Releases a New Competitive-Programming Benchmark, With Daily-Updated Problems to Prevent Cramming
QbitAI· 2025-06-18 09:17
Wen Le, from Aofeisi
QbitAI | WeChat public account QbitAI

How extreme is this... every large model that entered was wiped out, all scoring 0.

Problems set by Saining Xie and colleagues stumped o3, Gemini-2.5-Pro, Claude-3.7, DeepSeek-R1, and the rest.

What exactly brought all these leading models down? LiveCodeBench Pro: a live benchmark of contest-level programming problems drawn from IOI, Codeforces, and ICPC. The problem bank is also updated daily to keep LLMs from "memorizing answers," which is admittedly ruthless (doge).

Although Saining Xie contributed to this work, he modestly calls himself just a cheerleader.

Earlier reports claimed that LLM coding has already surpassed human experts, but these results show otherwise. The best-performing model managed a first-attempt pass rate of only 53% on medium-difficulty problems, and 0% on hard ones. Even the strongest model, o4-mini-high, drops to an Elo of only 2100 once tool calls are blocked, far below the 2700 legendary line of true grandmaster level.

| Model | Hard | Medium | Easy | Rating | Pct.% | AvgTok | AvgCost |
| --- | --- | --- | --- | --- | --- | --- | ...