DeepSeek V3
DeepSeek blew up markets a year ago. Why hasn't it done so since?
CNBC· 2026-01-06 06:00
Core Insights
- DeepSeek's introduction of a new AI model in January 2025 caused significant market reactions, leading to a decline in the stock prices of major Western tech companies, but the market has since stabilized and companies like Nvidia have seen substantial growth [1][2][3]

Group 1: Market Reactions and Recovery
- Following DeepSeek's initial model release, Nvidia's stock fell 17%, resulting in a loss of nearly $600 billion in market capitalization, while Broadcom and ASML also experienced significant declines [1]
- Eleven months later, Nvidia achieved a $5 trillion valuation, Broadcom's shares increased by 49%, and ASML's stock rose by 36% [2]

Group 2: DeepSeek's Model Releases
- DeepSeek released its V3 model in late 2024, trained on less powerful chips and at a lower cost than models from OpenAI and Google [3][4]
- The subsequent release of the R1 reasoning model in January 2025 surprised the market, as it matched or outperformed leading LLMs [4]

Group 3: Market Dynamics and Spending
- Despite initial concerns that DeepSeek's model would reduce demand for AI infrastructure, spending in the AI sector did not slow in 2025 and is expected to accelerate in 2026 and beyond [6][7]
- The market has perceived DeepSeek's later model updates as incremental improvements rather than groundbreaking innovations [7]

Group 4: Computational Limitations
- DeepSeek has faced challenges in releasing new models due to limited computing power, notably the delay of the R2 model caused by difficulties training on Huawei chips [8][9]
- U.S. restrictions on chip sales have constrained China's access to advanced computing resources, impacting DeepSeek's development capabilities [9][10]

Group 5: Competitive Landscape
- The release of advanced models by Western companies like OpenAI and Google has reassured the market of continued U.S. leadership in AI, easing fears of commoditization [12][13]
- Analysts suggest the competitive environment remains intense, with further significant releases expected from DeepSeek in the near future [13][14]
AI Large Models Diverge: From Technical Fervor Back to Commercial Value | 2025 China Economic Annual Report
Hua Xia Shi Bao· 2025-12-25 08:16
By Shi Feiyue

When DeepSeek went viral overnight at the start of the year and upended the existing large-model market landscape, 2025 was destined to be no ordinary year. China's large-model market underwent a profound "return to value" in 2025: as the marginal payoff of technical breakthroughs diminished, a "survival evolution" centered on real demand, sustainable business models, and industrial depth unfolded across the board. "2025 was the founding year for globalized AI applications," summarized Li Mingshun, founder of Shunfu Capital and chairman of 行行AI.

Against this backdrop, China's "AI Six Little Tigers" diverged further: 01.AI (零一万物) and Baichuan Intelligence (百川智能) gave up training ultra-large models and pushed further down the more pragmatic road of commercial applications; StepFun (阶跃星辰) made on-device intelligent agents the key thrust for landing its large-model technology, achieving breakthroughs in the device-side Agent field; Moonshot AI (月之暗面) began taking commercialization seriously, appointing a former investor as president; and Zhipu (智谱) and MiniMax, the front-runners in commercialization, were the first to successfully break through to the secondary market.

DeepSeek's "Ups and Downs"

In early 2025, an AI wave rising from the East swept global app markets. On January 27, the Chinese artificial intelligence company DeepSeek shot to the top of the free-app download chart on Apple's US App Store, temporarily dethroning the long-dominant ChatGPT, and quickly grew into a global phenomenon: DeepSeek's name flooded social networks around the world and became the most-watched tech story of the new year.

The buzz did not stop at topping the charts early in the year. Throughout the first half of the year, Dee ...
Mamba Authors' Team Proposes SonicMoE: Token Rounding Nearly Doubles MoE Training Speed
机器之心· 2025-12-19 06:38
Core Insights
- The MoE (Mixture of Experts) model has become the standard architecture for scaling language models without significantly increasing computational costs, showing trends of higher expert granularity and sparsity, which enhance model quality per unit of FLOPs [1][2]

MoE Model Trends
- Recent open-source models like DeepSeek V3, Kimi K2, and Qwen3 MoE exhibit finer-grained expert designs and higher sparsity, significantly increasing total parameter count while maintaining the number of active parameters [1][2]
- The table of recent models indicates varying parameter counts, expert activation ratios, and expert granularities, with models like Mixtral 8x22B having 131 billion parameters and a 25% expert activation ratio [2]

Hardware Efficiency Challenges
- The pursuit of extreme granularity and sparsity in MoE designs has led to significant hardware efficiency issues, prompting the development of SonicMoE, a solution tailored for NVIDIA Hopper and Blackwell architecture GPUs [3]
- SonicMoE demonstrates clear performance advantages, achieving a 43% speed increase in forward propagation and up to 115% in backward propagation compared to existing baselines [3]

Memory and IO Bottlenecks
- Fine-grained MoE models face linear growth in activation memory usage with the number of active experts, leading to increased memory pressure during forward and backward propagation [4]
- The reduced arithmetic intensity of smaller, dispersed experts results in more frequent IO access, pushing model training into a memory-constrained zone [4]

Efficient Algorithms
- SonicMoE introduces a method to compute routing gradients without caching activation values, reducing backward-propagation memory usage by 45% for fine-grained models [4]
- The design allows computation and IO operations to overlap, effectively masking the high IO latency associated with fine-grained MoE [4]

Token Rounding Strategy
- The token rounding method optimizes the distribution of tokens to experts, minimizing computational waste due to tile quantization effects and enhancing training efficiency without compromising model quality (a toy sketch of the idea follows this summary) [4][20][26]

Performance Metrics
- SonicMoE achieves a training throughput of 213 billion tokens per day using 64 H100 GPUs, comparable to the efficiency of 96 H100 GPUs running ScatterMoE [6]
- Activation memory usage remains constant even as expert granularity increases, with efficiency improvements ranging from 0.20 to 1.59 times over existing baselines [9][15]

Open Source Contribution
- The team has open-sourced the relevant kernel code, providing a robust tool for the large-model community to accelerate high-performance MoE training [7]
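The summary above turns on one hardware detail: expert GEMMs process tokens in fixed-size tiles, so an expert whose token count is not a tile multiple pays for a partially empty tile. Below is a minimal sketch of the token-rounding arithmetic, assuming a tile size of 128 and a simple round-to-nearest-tile policy; it illustrates the tile-quantization idea only, not SonicMoE's actual kernel or its policy for choosing which overflow tokens to drop or admit.

```python
# Toy illustration of MoE "token rounding" (not SonicMoE's implementation).
# Expert GEMMs run on fixed tiles of TILE tokens: a count of 300 forces
# ceil(300/128) = 3 tiles, wasting compute on 84 padding rows. Rounding
# each expert's count to a tile multiple keeps every tile full.
import numpy as np

TILE = 128  # assumed GEMM tile size along the token dimension

def round_token_counts(counts: np.ndarray, tile: int = TILE) -> np.ndarray:
    """Round each expert's token count to a multiple of `tile`.

    Assumed policy: experts with a remainder of at least half a tile are
    rounded up (admitting a few extra low-priority tokens); the rest are
    rounded down (their lowest-scoring overflow tokens are skipped).
    """
    remainder = counts % tile
    return np.where(remainder >= tile // 2,
                    counts + (tile - remainder),  # round up to next tile
                    counts - remainder)           # round down to tile floor

counts = np.array([300, 130, 70, 512])  # tokens routed to each expert
print(round_token_counts(counts))       # -> [256 128 128 512]
```

After rounding, every expert GEMM runs only on full tiles, which is where the claimed efficiency gain comes from; per the article, the adjustment is mild enough that model quality is preserved.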
China narrows AI gap with US 3 years after initial ChatGPT shock
Yahoo Finance· 2025-12-13 09:30
The report attributed this year's surge in open LLM usage around the world to the growing adoption of Chinese-developed systems, including Alibaba Cloud's Qwen family of models, DeepSeek's V3 and Moonshot AI's Kimi K2. Alibaba Cloud is the AI and cloud computing services unit of Alibaba Group Holding, owner of the Post. ChatGPT was released by OpenAI on November 30, 2022. Fast-forward to the second half of 2 ...
China's open-source models make up 30% of global AI usage, led by Qwen and DeepSeek
Yahoo Finance· 2025-12-08 09:30
China's open-source artificial intelligence models accounted for nearly 30 per cent of total global use of the technology, while Chinese-language prompts ranked second in token volume behind English, according to a report. This year's surge in open-source large language model (LLM) usage around the world had been fuelled by Chinese-developed systems, including Alibaba Group Holding's Qwen family of models, DeepSeek's V3 and Moonshot AI's Kimi K2, according to a recently published report by OpenRouter, a t ...
The Evolution from DeepSeek V3 to V3.2, Explained in One Article
机器之心· 2025-12-08 04:27
Core Insights
- DeepSeek has released two new models, DeepSeek-V3.2 and DeepSeek-V3.2-Speciale, which have generated significant interest and discussion in the AI community [2][5][11]
- The evolution from DeepSeek V3 to V3.2 includes various architectural improvements and the introduction of new mechanisms aimed at enhancing performance and efficiency [10][131]

Release Timeline
- The initial release of DeepSeek V3 in December 2024 did not create immediate buzz, but the subsequent release of the DeepSeek R1 model changed the landscape, making DeepSeek a popular alternative to proprietary models from companies like OpenAI and Google [11][14]
- The release of DeepSeek V3.2-Exp in September 2025 was a preparatory step for the V3.2 model, focused on establishing the infrastructure needed for deployment [17][49]

Model Types
- DeepSeek V3 was initially launched as a base model, while DeepSeek R1 was developed as a specialized reasoning model through additional training [19][20]
- The industry trend has been a shift from hybrid reasoning models to specialized models; DeepSeek appears to be reversing this trend by moving from a specialized model (R1) back to hybrid models (V3.1 and V3.2) [25]

Evolution from V3 to V3.1
- DeepSeek V3 utilized a mixture-of-experts architecture and multi-head latent attention (MLA) to optimize memory usage during inference [29][30]
- DeepSeek R1 focused on Reinforcement Learning with Verifiable Rewards (RLVR) to enhance reasoning capabilities, particularly on tasks that admit symbolic verification [37][38]

Sparse Attention Mechanism
- DeepSeek V3.2-Exp introduced a new sparse attention mechanism that significantly improved efficiency in training and inference, especially in long-context scenarios [49][68]
- The DeepSeek Sparse Attention (DSA) mechanism allows the model to selectively focus on relevant past tokens, reducing attention complexity from quadratic to linear (a toy sketch of the top-k idea follows this summary) [68]

Self-Verification and Self-Correction
- DeepSeekMath V2, released shortly before V3.2, introduced self-verification and self-correction techniques to improve the accuracy of mathematical reasoning tasks [71][72]
- The self-verification process uses a verifier model to assess the quality of generated proofs, while self-correction allows the model to iteratively improve its outputs based on that feedback [78][92]

DeepSeek V3.2 Architecture
- DeepSeek V3.2 maintains the architecture of its predecessor, V3.2-Exp, while incorporating improvements aimed at enhancing overall performance across tasks including mathematics and coding [107][110]
- The training process has been refined to include updates to the RLVR framework, integrating new reward mechanisms for different task types [115][116]

Performance Benchmarks
- DeepSeek V3.2 has shown competitive performance across benchmarks, achieving notable results on mathematical tasks and outperforming several proprietary models [127]
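To make the quadratic-to-linear claim in the sparse-attention section concrete, here is a toy single-head sketch of the general top-k idea: a cheap indexer scores every (query, past-token) pair, each query keeps only its k best past tokens, and dense attention runs over that small subset, so per-query cost scales with k rather than with sequence length. The dot-product indexer, `top_k` value, and shapes below are illustrative assumptions, not DSA's actual design (DSA uses a separately parameterized lightweight indexer and fused kernels).

```python
# Toy top-k sparse attention for one head (illustrative, not DSA's kernels).
import torch

def topk_sparse_attention(q, k, v, index_scores, top_k=64):
    """q, k, v: (T, d); index_scores: (T, T) cheap relevance scores."""
    T, d = q.shape
    # Causal mask: a query may only attend to itself and earlier tokens.
    causal = torch.tril(torch.ones(T, T, dtype=torch.bool))
    scores = index_scores.masked_fill(~causal, float("-inf"))
    # Each query keeps only its top-k admissible positions.
    top = scores.topk(min(top_k, T), dim=-1)        # values/indices: (T, k)
    k_sel, v_sel = k[top.indices], v[top.indices]   # gathers: (T, k, d)
    # Dense attention over the selected subset: O(T*k), not O(T^2).
    attn = torch.einsum("td,tkd->tk", q, k_sel) / d ** 0.5
    # Early rows have fewer than k valid positions; those filler slots
    # carry -inf indexer scores, so exclude them from the softmax as well.
    attn = attn.masked_fill(top.values == float("-inf"), float("-inf"))
    return torch.einsum("tk,tkd->td", torch.softmax(attn, dim=-1), v_sel)

T, d = 256, 64
q, k, v = (torch.randn(T, d) for _ in range(3))
idx = (q @ k.T) / d ** 0.5  # stand-in for a learned lightweight indexer
print(topk_sparse_attention(q, k, v, idx, top_k=32).shape)  # (256, 64)
```

The design point is that the indexer only has to be good enough to shortlist candidates; the expensive softmax attention then runs over k tokens per query no matter how long the context grows.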
Low Valuations, Light Positioning! JPMorgan Upgrades China Equities, Bullish on Accelerating AI Adoption and "Anti-Involution"
Hua Er Jie Jian Wen· 2025-12-03 03:27
Wall Street is upbeat on the recovery momentum of Chinese equities, arguing that multiple positive factors, including the accelerating adoption of artificial intelligence and the industrial "anti-involution" drive, are supporting the market.

On December 3, according to Zhuifeng Trading Desk (追风交易台), JPMorgan's emerging-markets equity strategy outlook for 2026 raised its rating on Chinese equities from "Neutral" to "Overweight", arguing that the market is in the early stages of a recovery and that its acceptable valuations and still-light investor positioning provide a solid foundation for potential upside.

The report states plainly that the pullback in Chinese equities after their surge in the first quarter of 2025 has instead created an "attractive entry point". The team led by strategist Rajiv Batra believes that heading into 2026, multiple incremental supports are building for the Chinese market, and makes a key judgment: "We believe the risk of a sharp rally in 2026 is far greater than the risk of a sharp decline."

The report stresses that "positive sentiment toward Chinese equities also tends to set the tone for overall emerging-market performance and for active funds' flows." On this basis, JPMorgan's base case puts the MSCI China index at 100 by end-2026, implying 19% upside from the time of publication; its bull-case target is 120 and its bear case is 80.

Behind the upgrade is JPMorgan's favorable view of a series of structural changes in the Chinese market ...
Who Is Footing America's Bill?
Guan Cha Zhe Wang· 2025-11-18 01:04
Group 1: Core Insights
- The U.S. is leveraging its retirement funds to fill a projected $1.5 trillion financing gap in AI investments, as tech giants' cash flows can only cover half of the expected $3 trillion in global data center capital expenditure by 2028 [1][3]
- The U.S. private capital market dominates AI investment, with $109.1 billion in private AI investment in 2024, nearly 12 times China's $9.3 billion [3][4]
- The U.S. government and major tech companies are also investing heavily in AI, with a combined planned investment of $36.4 billion in 2025, contributing to GDP growth [4][5]

Group 2: Investment Landscape
- Venture capital and private equity are significant sources of funding, with over 50% of global VC funds directed toward AI and the U.S. accounting for more than 75% of that share [3][4]
- The bond market is a primary financing tool for tech giants, with over $2 trillion in investment-grade corporate bonds issued in the first ten months of 2025, and insurance companies among the key buyers [4][5]
- The U.S. has seen a surge in AI-related stock performance, contributing 75% of the S&P 500's returns since the launch of ChatGPT in 2022 [5][6]

Group 3: Competitive Advantages
- The U.S. has a unique financial and innovation ecosystem that supports AI investment, including a robust VC network and top-tier universities [5][6]
- The U.S. controls 74% of global high-end AI computing capacity, significantly outpacing China and the EU [11][12]
- Early investments in computing and software positioned the U.S. as a leader in AI innovation, with annual investment rising tenfold from 1995 to 2021 [9][11]

Group 4: Challenges and Risks
- The rapid increase in AI investment has shown signs of a bubble, with heavy dependence on optimistic investor expectations [6][7]
- Regulatory compliance costs are rising, with fragmented state-level AI regulations increasing operational costs for companies [7][8]
- A financial crisis is possible if the AI investment bubble bursts, given the concentration of market value among a few tech giants [6][8]

Group 5: China's Position and Strategy
- China trails significantly in private AI investment, with only $39 billion compared to the U.S., but is leveraging a state-led approach to build resilience in AI funding [13][14]
- China's strategy focuses on application-oriented AI, cost reduction through local chip production, and global outreach to developing countries [13][14]
- China's competitive edge lies in its ability to innovate at lower cost, as demonstrated by companies like DeepSeek, which offers AI solutions at a fraction of the cost of U.S. counterparts [14]
Liang Wenfeng Represents DeepSeek, and He Represents Liang Wenfeng
量子位· 2025-11-15 02:08
Core Viewpoint
- The article discusses the appearance of the "Hangzhou Six Little Dragons" at the World Internet Conference in Wuzhen, highlighting key figures in AI and technology, with a particular focus on DeepSeek and its representative, Chen Deli, who expressed both optimism and concern about AI's future impact on society [1][3][41]

Group 1: DeepSeek and Its Representation
- DeepSeek founder Liang Wenfeng did not attend the conference; instead, researcher Chen Deli represented the company, marking a significant public appearance for DeepSeek [3][6][41]
- Chen Deli, who joined DeepSeek in 2023, has worked on critical research areas such as language models and alignment mechanisms, contributing to several important publications [18][22][20]
- Chen Deli's presence at the conference makes him the second public representative of DeepSeek after Liang Wenfeng, underscoring his role as a spokesperson for the company's views on AI [41][42]

Group 2: AI Perspectives
- Chen Deli offered a mixed outlook on AI: while humans and AI are in a "honeymoon period" for the next three to five years, there are significant long-term concerns that AI could displace most jobs in society [8][9]
- He argued that the current AI revolution differs fundamentally from previous industrial revolutions, as AI is beginning to possess its own "intelligence" that could surpass human capabilities in certain areas [10][11]
- The potential for AI to disrupt the existing social order and economic structures is a major concern; Chen suggested that technology companies may need to act as "guardians" to mitigate negative impacts [12][13]

Group 3: Value Alignment in AI
- In his presentation, Chen Deli introduced the concept of "value alignment decoupling", proposing that core values should be unified while allowing users to customize diverse values, ensuring safety while adapting to societal diversity (a toy sketch of this layered idea follows below) [25][24]
- This approach aims to address the rigidity of traditional large models, which often embed fixed values that do not reflect the complexity of human society [24][25]
- The idea of "harmony in diversity" encapsulates this new perspective on AI value alignment, suggesting a more flexible, user-centric approach to AI development [26][25]
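As a purely hypothetical illustration of the "value alignment decoupling" idea described above, the sketch below separates a fixed, non-overridable core-value layer from a user-customizable layer when assembling a system prompt; the layer names, example values, and prompt format are invented for this example and say nothing about DeepSeek's actual approach.

```python
# Hypothetical sketch of decoupled value alignment: a unified core layer
# that users cannot override, composed with per-user customizable values.
from dataclasses import dataclass, field

CORE_VALUES = (  # the unified, non-negotiable layer
    "Refuse to assist with violence or other serious harm.",
    "Do not deceive the user.",
)

@dataclass
class ValueProfile:
    """Per-user value layer stacked on top of the fixed core."""
    custom_values: list = field(default_factory=list)

    def system_prompt(self) -> str:
        core = "\n".join(f"[core] {v}" for v in CORE_VALUES)
        custom = "\n".join(f"[user] {v}" for v in self.custom_values)
        return f"{core}\n{custom}" if custom else core

profile = ValueProfile(custom_values=["Prefer concise, direct answers."])
print(profile.system_prompt())
```

Read this as "harmony in diversity" in miniature: safety-critical values stay uniform across all users, while everything else remains a per-user preference.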