DeepEP
Twice as fast as DeepEP! Infinigence AI's FUSCO breaks the MoE communication bottleneck with "in-flight reshuffling," designed for the agent boom
机器之心· 2025-12-31 09:31
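As mainstream large models such as ChatGPT, Gemini, DeepSeek-V3, and Kimi-K2 adopt the Mixture-of-Experts (MoE) architecture and Expert Parallelism (EP), MoE has gradually become the mainstream in industrial applications. Because of their structural sparsity and expert parallelism, MoE models inherently introduce frequent, large-scale global distributed data exchange. Current mainstream communication libraries and solutions (such as DeepEP) still rest on the traditional design assumption that communication and data layout are decoupled, which makes it hard to handle the cross-device, non-contiguous, dynamically reordered data access patterns seen in production efficiently. In scenarios with high concurrency, long contexts, and large expert configurations, DeepEP's performance is approaching its limit, which directly constrains the continued deployment, stable scaling, and cost-efficient operation of large MoE models. Meanwhile, new applications such as coding agents and Cursor-style conversational IDEs have sharply increased the volume of user requests and greatly lengthened the context of each inference, both by more than an order of magnitude. Under the MoE architecture, this shift not only scales compute cost linearly but also significantly raises cross-expert communication and scheduling costs, so that overall system pressure rises by nearly an order of magnitude and, in large-scale serving scenarios, is further ...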
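As a rough illustration of why expert parallelism produces the non-contiguous, dynamically reordered traffic described above, the sketch below groups each token's top-k expert choices by the rank that hosts the expert. The function name `dispatch_plan`, the contiguous expert-to-rank mapping, and the toy sizes are all hypothetical; this is not DeepEP's or FUSCO's API, only a minimal model of the dispatch step.

```python
from collections import defaultdict


def dispatch_plan(topk_experts, experts_per_rank):
    """Group (token, expert) pairs by the rank hosting each expert.

    Assumes experts are placed contiguously: expert e lives on rank
    e // experts_per_rank. This placement is an illustrative assumption.
    """
    plan = defaultdict(list)
    for token_id, experts in enumerate(topk_experts):
        for expert_id in experts:
            rank = expert_id // experts_per_rank
            plan[rank].append((token_id, expert_id))
    return dict(plan)


# Toy batch: 4 tokens, top-2 routing over 8 experts, 2 experts per rank.
topk = [[0, 5], [1, 4], [5, 7], [0, 1]]
plan = dispatch_plan(topk, experts_per_rank=2)

for rank in sorted(plan):
    # Token ids sent to one rank are generally not contiguous in the batch,
    # and the grouping changes with every batch's routing decisions.
    print(rank, plan[rank])
```

In this toy batch rank 0 receives four (token, expert) pairs, rank 2 receives three, rank 3 receives one, and rank 1 receives none, which is the kind of uneven, data-dependent layout a communication library has to pack and unpack on every step.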
DeepSeek thanks Tencent's technical team: the DeepEP optimization is a "huge speedup" code contribution
Xin Lang Ke Ji· 2025-05-07 11:12
Core Insights
- Tencent's technical team has optimized the DeepEP communication framework, improving performance across network environments: by 100% on RoCE networks and by 30% on IB (InfiniBand) networks, which improves the efficiency of large AI model training [1][2]
Group 1: Technical Enhancements
- The optimization replaced IBRC with IBGDA and uses a distinct Queue Pair (QP) per channel for parallel data transmission, improving the robustness and communication performance of the normal kernels [1]
- In RDMA scenarios the optimized framework reached an algorithm bandwidth of 58 GB/s, with physical bandwidth calculated at 43.5 GB/s [1]
Group 2: Industry Impact
- Since DeepSeek open-sourced DeepEP and other components in February, the framework has demonstrated a 300% increase in communication efficiency, addressing the dependence of MoE large models on NVIDIA NCCL [2]
- The optimizations have been applied in Tencent's Hunyuan large model projects and showed good generality in a high-performance environment built on Tencent's Xingmai network and H20 servers [2]
DeepSeek open-sources again; watch for changes in AI applications
HTSC· 2025-03-03 13:25
Investment Rating
- The report maintains a "Buy" rating for the computer industry, specifically for companies such as Kingsoft Office, Tonghuashun, and Yonyou Network [7][10][26]
Core Insights
- DeepSeek has open-sourced its core infrastructure code, improving model efficiency and hardware compatibility, particularly with domestic GPUs; this is expected to lower application costs and improve performance [1][2][3]
- The report highlights a divergence in strategy: overseas model companies focus on large-scale computing power, while domestic companies prioritize efficiency optimization [4]
- It emphasizes that model capabilities may become basic utilities akin to "water and electricity," which would give significant advantages to companies that build on them [5]
Summary by Sections
Investment Rating
- The report gives a "Buy" rating to Kingsoft Office (688111 CH), Tonghuashun (300033 CH), and Yonyou Network (600588 CH), with target prices of 351.05, 425.23, and 16.12 respectively [10][26]
DeepSeek Developments
- DeepSeek's recent open-source releases include core optimizations for MLA, communication and computation, and matrix multiplication, which are expected to improve model training and inference efficiency globally [2][3]
- DeepSeek's model training has been optimized for CUDA and has also been adapted successfully to domestic GPUs, pointing to a growing ecosystem for local chip manufacturers [3]
Market Dynamics
- Overseas companies such as xAI and OpenAI are expanding their GPU clusters to improve performance, while domestic companies are focusing on software and hardware efficiency improvements [4]
- The cost-profit margin for DeepSeek's services could reach 545% under optimal conditions, which supports the financial viability of its model [1][22]
Recommended Companies
- The report recommends companies with user, data, and scenario advantages, including Kingsoft Office, Tonghuashun, and Yonyou Network, as well as other relevant players in the to-business (2B) and to-consumer (2C) application sectors [5][10][26]
Aijian Securities electronics industry weekly: DeepSeek releases five technologies during Open Source Week
Investment Rating
- The report rates the electronics industry "Outperform" relative to the market [1]
Core Insights
- The electronics industry fell 4.9% over the past week, ranking 28th of 31 sectors; the Shenwan (SW) electronics sub-sectors were mixed, with semiconductor materials up 0.4% and the others down [2][44]
- DeepSeek launched five open-source projects aimed at improving AI model efficiency, a competitive strategy against OpenAI's high-cost models [2][28]
- The report highlights significant advances in AI hardware and software, pointing to a potential surge in demand for domestic semiconductor chips [2][40]
Summary by Sections
1. DeepSeek Open Source Week Releases
- DeepSeek announced five open-source projects to strengthen AI capabilities, including FlashMLA for efficient model inference and DeepEP for improved GPU communication [5][9]
- FlashMLA reached up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations on the H800 platform, nearly doubling performance compared to previous models [6][8]
- DeepEP optimized GPU communication, reaching a bottleneck bandwidth of 153 GB/s for intra-node and 46 GB/s for inter-node communication [11][12]
2. Global Industry Dynamics
- NVIDIA reported record revenue of $39.3 billion for its fiscal fourth quarter of 2025 (FY25Q4), with significant growth in data center revenue [30][31]
- OpenAI launched its largest model, GPT-4.5, which is expected to improve performance significantly but carries a high API cost [33][34]
- Alibaba announced an investment of 380 billion yuan in cloud and AI hardware infrastructure over the next three years [36]
3. Market Review
- The electronics industry fell 4.9% over the past week, with semiconductor materials posting slight gains while other sub-sectors declined [2][44]
- The report lists the top-performing stocks in the electronics sector, with notable gains from Aojie Technology (+30.0%) and Chipone Technology (+27.4%) [48]
- The Philadelphia Semiconductor Index fell 11.7%, reflecting broader market challenges [51]
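One way to read the two FlashMLA figures quoted in this summary (3000 GB/s and 580 TFLOPS on H800) is as the two ceilings of a roofline-style model. The sketch below computes the arithmetic intensity at which a kernel would move from memory-bound to compute-bound. Treating the reported numbers as ceilings is an assumption made here for illustration, and the helper `attainable_flops` is hypothetical; none of this comes from the report or from FlashMLA itself.

```python
# Figures quoted in the summary, treated here as roofline ceilings (assumption).
PEAK_BANDWIDTH_BYTES_PER_S = 3000e9   # 3000 GB/s
PEAK_FLOPS = 580e12                   # 580 TFLOPS

# Arithmetic intensity (FLOP per byte moved) where the two ceilings meet.
ridge_point = PEAK_FLOPS / PEAK_BANDWIDTH_BYTES_PER_S


def attainable_flops(intensity_flop_per_byte):
    """Roofline bound: limited by bandwidth below the ridge, by compute above."""
    return min(PEAK_FLOPS, intensity_flop_per_byte * PEAK_BANDWIDTH_BYTES_PER_S)


print(f"ridge point: {ridge_point:.1f} FLOP/byte")
print(f"at 10 FLOP/byte: {attainable_flops(10) / 1e12:.0f} TFLOPS")
print(f"at 1000 FLOP/byte: {attainable_flops(1000) / 1e12:.0f} TFLOPS")
```

Under these assumptions the ridge sits near 193 FLOP/byte, so workloads with low arithmetic intensity would be limited by the bandwidth figure rather than the TFLOPS figure.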
DeepSeek Open Source Week wraps up, set to accelerate the deployment of large models across industries
Ping An Securities· 2025-03-03 09:15
Investment Rating
- The industry investment rating is "stronger than the market" (the industry index is expected to outperform the market by more than 5% within six months) [32]
Core Views
- DeepSeek Open Source Week has concluded and will accelerate the application of large models across industries [2][3]
- Competition among global large models remains intense, which strongly supports continued growth in demand for AI computing power [8][11]
- NVIDIA's FY25Q4 results were strong, with robust inference-side demand for the Blackwell architecture [13][15]
Summary by Sections
Industry News and Commentary
- During Open Source Week, DeepSeek released five open-source software library projects covering computation, communication, and storage, which will help developers worldwide reproduce DeepSeek-V3/R1 [2][5]
- The release of models such as Grok-3, Claude 3.7 Sonnet, and GPT-4.5 shows that competition in the global large model market remains fierce and is expected to raise model capabilities further [9][11]
- NVIDIA reported FY25Q4 revenue of $39.3 billion, up 12% quarter-over-quarter and 78% year-over-year, driven primarily by growth in the data center business [13][14]
Investment Recommendations
- The report is positive on the computer industry and expects both earnings and valuation to improve as demand recovery accelerates [28]
- Recommended stocks include:
- Xinchuang (IT application innovation) sector: Haiguang Information, Longxin Zhongke, Zhongke Shuguang, Kingsoft Office, Dameng Data, Foxit Software, Taiji Co., Ltd.
- Huawei supply chain: Digital China, with a focus on Tuo Wei Information, Kirin Information Security, Runhe Software, and others
- AI sector: strong recommendations for Zhongke Chuangda, Shengshi Technology, and Qiming Star, among others
- Low-altitude economy: Da Tong Technology and others
- Financial IT sector: strong recommendation for Hengsheng Electronics, with additional suggestions for Tonghuashun and others [28]
Full adaptation! JD Cloud boosts DeepSeek inference performance by 50%
Zhong Guo Jing Ji Wang· 2025-03-03 09:10
Core Insights
- DeepSeek showcased five core technologies (FlashMLA, DeepEP, DeepGEMM, DualPipe & EPLB, and the 3FS file system) during a five-day "Open Source Week," drawing significant global attention [1]
- JD Cloud announced full-stack adaptation of these technologies, resulting in a 50% performance improvement in inference scenarios [1][2]
Group 1: Technology Enhancements
- FlashMLA optimizes GPU memory and compute usage, addressing the resource waste of traditional methods when processing variable-length sequences [1]
- The vGPU AI computing platform supports FlashMLA's FP8 format, reducing the KV cache memory used per token by a factor of 57 compared with Multi-head Attention and sustaining high throughput and low latency under high concurrency [1]
Group 2: Communication and Performance
- JD Cloud's vGPU AI computing platform fully supports distributed inference with the DeepEP communication library, significantly increasing inference throughput [2]
- With DeepEP integrated, JD Cloud uses NVLink for intra-machine communication and NVSHMEM for inter-machine communication, improving GPU utilization and reducing performance bottlenecks [2]
Group 3: Local Deployment and Adaptation
- JD Cloud has helped multiple local governments deploy DeepSeek on existing infrastructure, so that local enterprises can access the service without investing in resources [3]
- The platform has been adapted comprehensively to domestic chips, covering more than ten domestic AI computing solutions and keeping the stack under domestic control from underlying compute to large model applications [2]
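The "factor of 57" quoted for the KV cache can be reproduced with a short calculation, shown below as a sketch. It assumes the attention dimensions published in the DeepSeek-V3 model configuration (128 heads, head dimension 128, a compressed KV rank of 512, and a 64-dimensional RoPE key); these values are not stated in the summary and are assumptions here. The count is in cached elements per token per layer, so it is independent of whether the cache is stored in FP8.

```python
# Assumed dimensions, taken from the public DeepSeek-V3 configuration
# (not stated in the summary above).
NUM_HEADS = 128
HEAD_DIM = 128
KV_LORA_RANK = 512      # compressed latent KV dimension in MLA
ROPE_HEAD_DIM = 64      # decoupled RoPE key dimension in MLA


def mha_cache_elems(num_heads, head_dim):
    """Multi-head attention caches a full key and value per head."""
    return 2 * num_heads * head_dim


def mla_cache_elems(kv_lora_rank, rope_head_dim):
    """MLA caches one compressed latent vector plus one shared RoPE key."""
    return kv_lora_rank + rope_head_dim


mha = mha_cache_elems(NUM_HEADS, HEAD_DIM)
mla = mla_cache_elems(KV_LORA_RANK, ROPE_HEAD_DIM)
ratio = mha / mla

print(f"MHA: {mha} elements, MLA: {mla} elements, ratio: {ratio:.1f}")
```

With these assumed dimensions the ratio comes out at about 56.9, which is consistent with the rounded figure in the summary.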
DeEPSeek: EP cuts costs; watch applications and computing power
HTSC· 2025-03-03 02:35
Investment Rating
- The report maintains an "Overweight" rating for the technology sector and the computer industry [6]
Core Insights
- DeepSeek has significantly reduced inference costs, with a theoretical daily revenue of $562,027 against a daily cost of $87,072, indicating a cost-profit margin of 545% if 15% of tokens are paid [2][4]
- The optimization of the DeepSeek-V3/R1 inference system targets higher throughput and lower latency through Expert Parallelism (EP) [3]
- The price gap for inference services between domestic and international models reflects constraints on the external supply of computing power, with DeepSeek offering a more cost-effective solution [4]
Summary by Sections
Investment Rating
- The report recommends "Buy" for Inspur Information (浪潮信息) with a target price of 61.41 CNY, reflecting a strong growth outlook in the AI server market [9][14]
Cost and Revenue Analysis
- DeepSeek's inference system peaks at 278 nodes in use and averages 226.75 nodes, with GPU rental cost assumed at $2 per GPU per hour [2]
- The average processing cost is $0.11 per million tokens, and R1 pricing is significantly lower than that of competitors such as OpenAI [2][4]
Technical Optimization
- The DeepSeek-V3/R1 system separates the prefill and decode stages and parallelizes computation across nodes to reduce latency and improve performance [3]
Market Dynamics
- Domestic AI model providers are optimizing hardware performance under supply constraints, which may lead to a larger share of global applications [4][5]
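The cost and margin figures in this summary can be checked with a few lines of arithmetic, sketched below. The node count, hourly GPU price, revenue, and cost come from the summary; the value of 8 GPUs per node is an assumption that is not stated above, chosen because it reproduces the quoted daily cost exactly. The break-even share at the end is a derived quantity, and linking it to the summary's "15% of tokens" is only a possible reading.

```python
AVG_NODES = 226.75            # average nodes in use (from the summary)
GPUS_PER_NODE = 8             # assumption: not stated in the summary
USD_PER_GPU_HOUR = 2.0        # assumed rental price (from the summary)
HOURS_PER_DAY = 24
THEORETICAL_DAILY_REVENUE = 562_027  # USD (from the summary)

# Daily cost = nodes x GPUs per node x price per GPU-hour x hours.
daily_cost = AVG_NODES * GPUS_PER_NODE * USD_PER_GPU_HOUR * HOURS_PER_DAY

# Cost-profit margin = (revenue - cost) / cost.
margin = (THEORETICAL_DAILY_REVENUE - daily_cost) / daily_cost

# Share of theoretical revenue needed just to cover the daily cost.
break_even_share = daily_cost / THEORETICAL_DAILY_REVENUE

print(f"daily cost: ${daily_cost:,.0f}")
print(f"cost-profit margin: {margin:.1%}")
print(f"break-even share of theoretical revenue: {break_even_share:.1%}")
```

On these inputs the daily cost is $87,072 and the margin is about 545% when all of the theoretical revenue is counted; the same figures imply that roughly 15.5% of the theoretical revenue would cover the cost.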
[Industrial Securities Computer] DeepSeek tracker: leading AI democratization, stepping up open-sourcing and price cuts
兴业计算机团队· 2025-03-02 11:41
Core Viewpoint
- The article recommends holding firm long-term confidence and adding to positions in core leading companies during market adjustments, particularly in AI and domestic substitution [2][4]
Summary by Sections
Weekly Viewpoint
- The market pulled back because of the earlier rapid rally and the disclosure of preliminary earnings reports. Looking ahead, the preliminary reports have all been released, an important policy window is approaching, and results are expected to improve in Q1 2025 [2][4]
Deep Dive on DeepSeek
- DeepSeek, a leader in AI democratization, open-sourced a series of solutions this week, including DeepEP and DeepGEMM, covering stages from low-level hardware optimization to upper-layer applications. It also introduced off-peak ("staggered") pricing that sharply lowers API prices during idle nighttime hours, which is expected to accelerate AI application development [2][3]
Global AI Industry Trends
- Momentum in the global AI industry is building in step across markets, with notable breakthroughs from domestic models such as Doubao and DeepSeek. OpenAI's release of GPT-4.5 also brought clear improvements in human-like interaction [2]
Investment Strategy
- The recommendation is to stay overweight in the AI sector and focus on core leading companies as industry trends develop [2]
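Industrial Securities computer team (兴业证券计算机小组): 蒋佳霖/孙乾/陈鑫/杨本鸿/张旭光/杨海盟/桂杨
This week's focus: 1. Weekly view: hold firm medium- to long-term confidence and add to core leaders during the adjustment. 2. In-depth tracking: DeepSeek tracker: leading AI democratization, stepping up open-sourcing and price cuts.
Weekly view: hold firm medium- to long-term confidence and add to core leaders during the adjustment. This week the sector pulled back somewhat, owing to the earlier rapid rally and the disclosure of preliminary earnings reports. Looking ahead, the preliminary reports have all been released, an important policy window is approaching, and an improvement in 2025Q1 results is worth anticipating. We recommend actively adding to leaders in core tracks along two dimensions, policy and technology, chiefly artificial intelligence and domestic substitution.
The global AI industry is gaining momentum in step across markets; stay overweight in the AI track. In China, DeepSeek this week open-sourced a series of solutions including DeepEP and DeepGEMM, covering multiple stages from low-level hardware optimization to upper-layer applications. It also introduced an off-peak pricing strategy that sharply lowers API prices during idle nighttime hours, which should accelerate the development of AI applications. Overseas, OpenAI released GPT-4.5 this week, with clear improvements in human-like interaction. Since the breakthroughs this year by domestic large models including Doubao and DeepSeek, ...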
Media industry weekly: GPT-4.5 released, DeepSeek "Open Source Week" wraps up
GOLDEN SUN SECURITIES· 2025-03-02 02:55
Investment Rating
- The report maintains an "Increase" rating for the media sector [6]
Core Viewpoints
- The media sector fell 8.06% during the week of February 24-28, 2025, in line with broader market conditions. The outlook for 2025 is optimistic, with a focus on AI applications and on mergers and acquisitions, particularly among state-owned enterprises [1][10]
- The release of "Nezha 2" has further raised the popularity of domestic IPs and highlights opportunities along the IP monetization value chain, including trendy toys and film content [1]
- The publishing and gaming sectors are expected to benefit from tax relief policies, and the publishing industry is projected to grow strongly in 2025 [1]
Summary by Sections
Market Overview
- The media sector performed poorly, ranking among the bottom three sectors with a decline of 8.06% [10]
- The top-performing sectors were steel, building materials, and real estate, while the computer and communication sectors also declined significantly [10]
Subsector Insights
- Key focus areas include:
1. Resource integration expectations: China Vision Media, Guoxin Culture, and others [2]
2. AI applications: Aofei Entertainment and Tom Cat are noted for their potential [2]
3. Gaming: strong recommendations for Shenzhou Taiyue and Kaixin Network [2]
4. State-owned enterprises: Ciweng Media and Anhui New Media [2]
5. Education: Xueda Education and Action Education [2]
6. Hong Kong stocks: Tencent Holdings and Pop Mart [2]
Key Events Review
- OpenAI released GPT-4.5, which is reported to be more than ten times as computationally efficient as GPT-4 [21]
- DeepSeek's open-source releases, covering several codebases, aim to improve data access and model training efficiency [21]
- Alibaba launched the video generation model Wan 2.1, which advances video generation, particularly synchronized movement and text rendered within videos [21]
Subsector Data Tracking
- The gaming sector has a range of new releases, with popular titles currently available for pre-order [23]
- The domestic film market's total box office for the week was approximately 431 million yuan, led by "Nezha: The Devil's Child" [24][26]
- Top-rated series and variety shows saw strong viewer engagement, with "Difficult to Please" and "Mars Intelligence Agency Season 7" leading in viewership [27][28]