DeepGEMM

Aijian Securities Electronics Industry Weekly: DeepSeek Open Source Week Releases Five Technologies
Shanghai Aijian Securities· 2025-03-03 10:10
Investment Rating - The report rates the electronic industry as "Outperform" compared to the market [1]. Core Insights - The electronic industry experienced a decline of 4.9% in the past week, ranking 28th out of 31 sectors, while the SW electronic sub-sectors showed mixed performance with semiconductor materials up by 0.4% and others down [2][44]. - DeepSeek launched five open-source projects aimed at enhancing AI model efficiency, showcasing a competitive strategy against OpenAI's high-cost models [2][28]. - The report highlights significant advancements in AI hardware and software, indicating a potential surge in demand for domestic semiconductor chips [2][40]. Summary by Sections 1. DeepSeek Open Source Week Releases - DeepSeek announced the launch of five open-source projects to enhance AI capabilities, including FlashMLA for efficient model inference and DeepEP for improved GPU communication [5][9]. - FlashMLA achieved a data throughput of 3000 GB/s and 580 TFLOPS on the H800 platform, nearly doubling performance compared to previous models [6][8]. - DeepEP optimized GPU communication, achieving a bottleneck bandwidth of 153 GB/s for intra-node and 46 GB/s for inter-node communications [11][12]. 2. Global Industry Dynamics - NVIDIA reported a record revenue of $39.3 billion for Q4 2025, with significant growth in data center revenues [30][31]. - OpenAI launched its largest model, GPT-4.5, which is expected to enhance performance significantly but comes with a high API cost [33][34]. - Alibaba announced a massive investment of 380 billion yuan in cloud and AI hardware infrastructure over the next three years, marking a significant commitment to the sector [36]. 3. Market Review - The electronic industry saw a decline of 4.9% in the past week, with semiconductor materials showing slight gains while other sectors faced losses [2][44]. 
- The report lists top-performing stocks in the electronic sector, with notable gains from companies like Aojie Technology (+30.0%) and Chipone Technology (+27.4%) [48]. - The Philadelphia Semiconductor Index experienced a decline of 11.7%, reflecting broader market challenges [51].
DeepSeek Open Source Week Concludes and Will Accelerate the Adoption of Large Models Across Industries
Ping An Securities· 2025-03-03 09:15
Investment Rating - The industry investment rating is "stronger than the market" (the industry index is expected to outperform the market by more than 5% over the next six months) [32] Core Views - The DeepSeek open-source week has concluded, accelerating the application of large models across various industries [2][3] - The competition among global large models remains intense, providing strong support for the continuous growth of AI computing power [8][11] - NVIDIA's FY25Q4 performance is strong, with robust demand on the inference side of the Blackwell architecture [13][15] Summary by Sections Industry News and Commentary - The DeepSeek open-source week launched five open-source software library projects covering computation, communication, and storage, which will make it easier for developers worldwide to replicate DeepSeek-V3/R1 [2][5] - The release of models like Grok-3, Claude 3.7 Sonnet, and GPT-4.5 indicates ongoing fierce competition in the global large model market, which is expected to elevate the capabilities of these models [9][11] - NVIDIA reported FY25Q4 revenue of $39.3 billion, a 12% quarter-over-quarter increase and a 78% year-over-year increase, driven primarily by data center business growth [13][14] Investment Recommendations - The report suggests a positive outlook for the computer industry, anticipating dual improvements in performance and valuation due to accelerating demand recovery [28] - Recommended stocks include: - Xinchuang (IT application innovation) Sector: Haiguang Information, Longxin Zhongke, Zhongke Shuguang, Kingsoft Office, Dameng Data, Foxit Software, Taiji Co., Ltd. - Huawei Supply Chain: Digital China, with a focus on Tuo Wei Information, Kirin Information Security, Runhe Software, and others - AI Sector: Strong recommendations for Zhongke Chuangda, Shengshi Technology, and Qiming Star, among others - Low-altitude Economy: Recommended stocks include Da Tong Technology and others - Financial IT Sector: Strong recommendation for Hengsheng Electronics, with additional suggestions for Tonghuashun and others [28]
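[Industrial Securities Computer] DeepSeek Tracking: A Leader in AI Democratization, Stepping Up Open Source and Price Cuts
Industrial Securities Computer Team· 2025-03-02 11:41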
Core Viewpoint - The article emphasizes maintaining long-term confidence and increasing positions in core leading companies during market adjustments, particularly in AI and domestic substitution [2][4]. Summary by Sections Weekly Viewpoint - The sector corrected after its earlier rapid rally and the disclosure of preliminary earnings reports. Looking ahead, preliminary earnings disclosure is complete, an important policy window is approaching, and improved results are expected in Q1 2025 [2][4]. Deep Dive on DeepSeek - DeepSeek, a leader in AI democratization, has open-sourced a series of solutions, including DeepEP and DeepGEMM, that span low-level hardware optimization through upper-layer applications. The company has also implemented an off-peak pricing strategy, sharply reducing API call prices during idle nighttime hours, which is expected to accelerate AI application development [2][3]. Global AI Industry Trends - The global AI industry is gaining momentum in step across regions, with notable advances from domestic models such as Doubao and DeepSeek. OpenAI's release of GPT-4.5 has also shown clear improvements in human-like interaction [2]. Investment Strategy - The recommendation is to remain overweight in the AI sector, focusing on core leading companies as the industry trends evolve and mature [2].
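Industrial Securities Computer Group: Jiang Jialin / Sun Qian / Chen Xin / Yang Benhong / Zhang Xuguang / Yang Haimeng / Gui Yang. This week's focus: 1. Weekly view: hold firm to medium- and long-term confidence and add to core leaders during the correction. 2. In-depth tracking: DeepSeek tracking: a leader in AI democratization, stepping up open source and price cuts. Weekly view: hold firm to medium- and long-term confidence and add to core leaders during the correction. This week the sector corrected somewhat, owing to the earlier rapid rally and the disclosure of preliminary earnings reports. Looking ahead, preliminary earnings disclosure is complete, an important policy window is about to open, and improved 2025Q1 results are worth anticipating. We recommend actively adding to leaders in core tracks along the two dimensions of policy and technology, mainly artificial intelligence and domestic substitution. The global AI industry is gaining momentum in step across regions, and we remain overweight on the AI track. In China, DeepSeek this week open-sourced a series of solutions, including DeepEP and DeepGEMM, that span low-level hardware optimization through upper-layer applications; it also adopted an "off-peak pricing" strategy that sharply cuts API call prices during idle nighttime hours, which should accelerate AI application development. Overseas, OpenAI this week released ChatGPT 4.5, which shows clear improvement in human-like interaction. Since the start of this year, following the breakthroughs of domestic large models including Doubao and DeepSeek, ...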
Media Industry Weekly: GPT-4.5 Released, DeepSeek "Open Source Week" Concludes
GOLDEN SUN SECURITIES· 2025-03-02 02:55
Investment Rating - The report maintains an "Increase" rating for the media sector [6]. Core Viewpoints - The media sector experienced a decline of 8.06% during the week of February 24-28, 2025, influenced by market conditions. The outlook for 2025 is optimistic, focusing on AI applications and mergers and acquisitions, particularly in state-owned enterprises [1][10]. - The release of "Nezha 2" has further boosted the popularity of domestic IPs, highlighting significant opportunities in the IP monetization value chain, including trendy toys and film content [1]. - The publishing and gaming sectors are expected to benefit from tax relief policies, with the publishing industry projected to see high growth in 2025 [1]. Summary by Sections Market Overview - The media sector's performance was notably poor, ranking among the bottom three sectors, with a decline of 8.06% [10]. - The top-performing sectors included steel, building materials, and real estate, while the computer and communication sectors also faced significant declines [10]. Subsector Insights - Key focus areas include: 1. Resource integration expectations: Companies like China Vision Media, Guoxin Culture, and others are highlighted [2]. 2. AI applications: Companies such as Aofei Entertainment and Tom Cat are noted for their potential [2]. 3. Gaming: Strong recommendations for companies like Shenzhou Taiyue and Kaixin Network [2]. 4. State-owned enterprises: Companies like Ciweng Media and Anhui New Media are emphasized [2]. 5. Education: Companies like Xueda Education and Action Education are mentioned [2]. 6. Hong Kong stocks: Notable mentions include Tencent Holdings and Pop Mart [2]. Key Events Review - The release of GPT-4.5 by OpenAI, which boasts over ten times the computational efficiency of GPT-4, is a significant development in AI technology [21]. 
- DeepSeek's open-source initiatives, including the release of various codebases, are aimed at enhancing data access and model training efficiency [21]. - Alibaba's launch of the video generation model Wan 2.1 showcases advancements in video technology, particularly in generating synchronized movements and text within videos [21]. Subsector Data Tracking - The gaming sector is seeing a variety of new game releases, with popular titles currently available for pre-order [23]. - The domestic film market's total box office for the week was approximately 431 million yuan, with "Nezha: The Devil's Child" leading the box office [24][26]. - The top-rated series and variety shows reflect strong viewer engagement, with "Difficult to Please" and "Mars Intelligence Agency Season 7" leading in viewership [27][28].
DeepSeek Discloses a One-Day Cost-Profit Margin of 545%
Wallstreetcn· 2025-03-01 11:17
Core Viewpoint - DeepSeek has disclosed key information regarding its model inference cost and profit margins, claiming a theoretical daily cost-profit margin of 545% based on specific assumptions about GPU rental costs and token pricing [1][3]. Financial Performance - DeepSeek's total cost is reported to be $87,072 per day, assuming a GPU rental cost of $2 per hour. The theoretical total revenue from all tokens, calculated at DeepSeek-R1 pricing, amounts to $562,027 per day, leading to a cost-profit margin of 545% [1][3]. - The pricing structure for DeepSeek-R1 includes $0.14 per million input tokens (cache hit), $0.55 per million input tokens (cache miss), and $2.19 per million output tokens [3]. Market Reactions - The article prompted significant discussion online, particularly regarding the profitability of serving DeepSeek models through an API. Luchen Technology founder You Yang had previously stated that offering a DeepSeek API would lose 400 million yuan per month [2][5]. - You Yang argued that a DeepSeek API service (MaaS) is currently unprofitable because of the gap between test speeds and real-world scenarios, as well as machine utilization issues [5]. Operational Insights - DeepSeek aims to optimize its V3/R1 inference systems for higher throughput and lower latency, focusing on techniques such as increasing batch size and load balancing [4]. - The company deploys inference on all nodes during peak hours and releases nodes for training during off-peak hours, which is seen as a response to concerns about resource utilization [5]. Open Source Initiatives - DeepSeek recently concluded an "Open Source Week," during which it announced the release of several codebases, including the Fire-Flyer file system and other frameworks aimed at enhancing data processing capabilities [7][8][9][10][11]. - The cumulative downloads of the DeepSeek App have surpassed 110 million, with peak weekly active users reaching nearly 97 million, indicating strong user engagement [12].
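The 545% figure follows directly from the two daily totals quoted above. The short sketch below reproduces the arithmetic; the function names are illustrative, and the "implied average GPUs" value is only a back-calculation from the stated $2-per-hour rental assumption, not a number taken from the summary.

```python
# Daily totals quoted in the summary above (USD per day).
DAILY_COST_USD = 87_072
DAILY_THEORETICAL_REVENUE_USD = 562_027
GPU_RENTAL_USD_PER_HOUR = 2  # rental assumption stated in the summary


def cost_profit_margin(revenue: float, cost: float) -> float:
    """Profit divided by cost, i.e. (revenue - cost) / cost."""
    return (revenue - cost) / cost


def implied_average_gpus(daily_cost: float, usd_per_gpu_hour: float) -> float:
    """Average GPU count implied by a daily bill at a flat hourly rate."""
    return daily_cost / (usd_per_gpu_hour * 24)


margin = cost_profit_margin(DAILY_THEORETICAL_REVENUE_USD, DAILY_COST_USD)
gpus = implied_average_gpus(DAILY_COST_USD, GPU_RENTAL_USD_PER_HOUR)

print(f"cost-profit margin: {margin:.0%}")
print(f"implied average GPUs: {gpus:.0f}")
```

Note that the margin is measured against cost rather than revenue, which is why it can exceed 100%.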
DeepSeek Announces: Event Officially Concludes
21st Century Business Herald· 2025-02-28 08:46
Core Insights - DeepSeek's "Open Source Week" has concluded, showcasing its commitment to transparency and collaboration in the AI field [1][7]. Group 1: Open Source Projects - The "Open Source Week" launched five projects from February 24 to February 28, covering various aspects of computing, communication, and storage [3]. - On February 24, the first open-source library, FlashMLA, was released; it is optimized for Hopper GPUs and variable-length sequences and is already in production use [4]. - On February 25, DeepEP was opened to the public; it is designed for MoE model training and inference, enabling efficient all-to-all communication and supporting low-precision operations [4]. - On February 26, DeepGEMM was open-sourced, a library for FP8 general matrix multiplication, featuring fine-grained scaling and supporting both standard and MoE grouped GEMM [5]. - On February 27, two tools (DualPipe and EPLB) and a performance analysis dataset were released, along with detailed explanations of parallel computing optimization techniques [5]. - On February 28, the release of 3FS was announced, which serves as an accelerator for all DeepSeek data access [6]. Group 2: API and Pricing Adjustments - DeepSeek reopened its API recharge function on February 25 after a 19-day suspension, accompanied by a structural adjustment in pricing [9]. - The pricing for DeepSeek-chat, based on the V3 model, is set at 2 yuan per million input tokens and 8 yuan per million output tokens, while DeepSeek-reasoner, based on the R1 model, is priced at 4 yuan per million input tokens and 16 yuan per million output tokens [9]. - On February 26, an off-peak discount pricing strategy was introduced, with significant reductions during specific hours, offering V3 at 50% off and R1 at 75% off [10]. 
Group 3: Market Impact - According to CITIC Securities, DeepSeek's open-source initiatives are expected to catalyze the AI+ theme, enhancing AI penetration across various industries and increasing demand for computing power [7].
Just Now! DeepSeek Makes a Hardcore Release!
Brokerage China· 2025-02-27 03:35
Core Viewpoint - DeepSeek has made significant advancements in optimizing parallel computing strategies and has introduced new models that enhance performance and reduce costs in AI applications [2][3][5][7]. Group 1: Optimized Parallelism Strategies - DeepSeek announced the release of Optimized Parallelism Strategies aimed at improving computational efficiency, reducing resource waste, and maximizing system performance through effective task allocation and resource coordination [3][5]. - The strategies are designed for high-performance parallel execution in multi-core, distributed, or heterogeneous systems, balancing computation, communication, and storage overhead [5]. Group 2: New Model Releases - NVIDIA has open-sourced the first optimized DeepSeek-R1 model on the Blackwell architecture, achieving a 25-fold increase in inference speed and a 20-fold reduction in cost per token [3][7]. - The DeepSeek-R1 model's local deployment has garnered significant attention, with its inference throughput reaching 21,088 tokens per second, compared to 844 tokens per second for the H100, marking a substantial performance improvement [7]. Group 3: Cost Reduction Initiatives - DeepSeek announced a significant reduction in API call prices during nighttime hours, with DeepSeek-V3 prices cut to 50% and DeepSeek-R1 to as low as 25%, with reductions up to 75% [6]. Group 4: Additional Open Source Contributions - DeepSeek has continued its open-source initiatives by releasing FlashMLA, DeepEP, and DeepGEMM, which are optimized for NVIDIA GPUs and designed to support various AI model training and inference tasks [9].
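To make the nighttime discount concrete, the sketch below applies the multipliers from the summary above (V3 charged at 50% of list price, R1 at 25%) to the per-million-token list prices quoted earlier in this digest (2/8 yuan for deepseek-chat, 4/16 yuan for deepseek-reasoner). It is a rough illustration only: the `Pricing` class and `call_cost_yuan` function are hypothetical names, and the sketch assumes a single input price, ignoring any cache-hit discount.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """List price in yuan per million tokens, plus the off-peak multiplier."""
    input_per_million: float
    output_per_million: float
    off_peak_multiplier: float  # fraction of list price charged off-peak


# Assumed values: list prices as quoted earlier in this digest,
# multipliers as stated in the summary above.
PRICING = {
    "deepseek-chat": Pricing(2.0, 8.0, 0.50),       # V3: cut to 50%
    "deepseek-reasoner": Pricing(4.0, 16.0, 0.25),  # R1: cut to 25%
}


def call_cost_yuan(model: str, input_tokens: int, output_tokens: int,
                   off_peak: bool = False) -> float:
    """Cost of one workload in yuan under the assumed price table."""
    p = PRICING[model]
    cost = (input_tokens * p.input_per_million
            + output_tokens * p.output_per_million) / 1_000_000
    return cost * p.off_peak_multiplier if off_peak else cost


for model in PRICING:
    peak = call_cost_yuan(model, 1_000_000, 1_000_000)
    night = call_cost_yuan(model, 1_000_000, 1_000_000, off_peak=True)
    print(f"{model}: {peak:.2f} yuan at list price, {night:.2f} yuan off-peak")
```

Under these assumptions the two models cost the same during the discount window, since R1's list price is twice V3's and its discount is twice as deep.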
Lujiazui Financial Breakfast, Thursday, February 27, 2025
Wind· 2025-02-26 22:44
Group 1 - DeepSeek open-sourced the DeepGEMM code library, designed for efficient FP8 general matrix multiplication, and introduced steep discounts on API calls during off-peak hours [2] - The China Securities Finance Corporation will no longer disclose margin financing and securities lending data, which will now be available on the China Securities Data Company's website [2] - China International Capital Corporation and China Galaxy Securities responded to merger rumors, stating they have not received any information regarding a merger from government departments or shareholders [3] Group 2 - The U.S. stock market showed mixed results, with the Dow Jones down 0.43% and the S&P 500 up 0.01%, while major Chinese concept stocks saw significant gains [5] - European stock indices closed higher, with Germany's DAX up 1.71% and France's CAC40 up 1.15% [5] - The Asia-Pacific stock indices had varied performances, with Japan's Nikkei 225 down 0.25% and New Zealand's S&P/NZX 50 up 1.18% [5] Group 3 - The People's Bank of China conducted a 548.7 billion yuan reverse repurchase operation at a rate of 1.5%, resulting in a net injection of 9.8 billion yuan [9] - The Hong Kong government projected a budget deficit of 87.2 billion HKD for the 2024/2025 fiscal year, with an expected economic growth rate of 2%-3% for 2025 [9] - Analysts predict a 5% economic growth target for this year's National People's Congress, with a focus on proactive fiscal policies [10] Group 4 - The A-share market saw a rise in major indices, with the Shanghai Composite Index up 1.02% and significant gains in the steel and robotics sectors [12] - The Hong Kong market experienced a surge, with the Hang Seng Index up 3.27% and notable increases in technology and real estate stocks [12] - The China Securities Regulatory Commission emphasized support for Chongqing's economic and financial development [13] Group 5 - The Hong Kong government announced reforms to enhance the stock market, including the introduction of a "Tech Enterprise Line" and adjustments to dual primary listing requirements [14] - Xiaomi launched its end-to-end smart driving system and announced the release of the Xiaomi 15 Ultra smartphone with a 6000mAh battery [15][16] - The IPO of Mixue Ice Cream set a new record in Hong Kong with a subscription rate of 5125 times, attracting about 1.77 trillion HKD in subscription funds [16] Group 6 - The National Financial Regulatory Administration announced that from March 1, 2025, financial institutions in Hong Kong and Macau will no longer be required to meet a total asset threshold of 2 billion USD to invest in insurance companies [18] - The China Securities Association reported that 137 asset-backed securities plans were newly registered in January 2025, totaling 112.26 billion yuan [18] Group 7 - The Ministry of Industry and Information Technology initiated a pilot program for public sector vehicles in ten cities, aiming to promote the use of new energy vehicles [22] - The Hong Kong government plans to allocate 1 billion HKD to establish an artificial intelligence research institute [22] Group 8 - The U.S. Senate confirmed Jamieson Greer as the U.S. Trade Representative [24] - The German GfK consumer confidence index for March was reported at -24.7, below expectations [24] - The Thai central bank unexpectedly cut its policy rate by 25 basis points to 2.00% [25]
[Pacific Securities Electronics: Daily Views & News] (2025-02-27)
Yuanfeng Electronics· 2025-02-26 13:03
Market Performance - The main board saw significant gains, led by Kosen Technology (+10.04%), Heertai (+10.01%), and Taijing Technology (+5.86%) [1] - The STAR Market also performed well, highlighted by Canxin Technology (+20.00%) and Chip Origin Technology (+13.24%) [1] - Active sub-industries included SW Digital Chip Design (+3.23%) and SW Semiconductor Materials (+1.41%) [1] Domestic News - Xiaomi announced the launch of its 15 Ultra smartphone featuring a 6000mAh battery with a 10% higher silicon content, claiming it to be the "strongest Xiaomi battery" with a daily usage endurance of 1.46 days and over 1000 effective charge cycles [1] - Xiaomi's founder Lei Jun announced the full-scale rollout of Xiaomi HAD, an end-to-end intelligent driving solution, with the Xiaomi SU7 Ultra pre-installed [1] - DeepSeek has open-sourced DeepGEMM, a library designed for efficient FP8 general matrix multiplication; it is implemented in CUDA and requires no compilation at installation because its kernels are compiled at runtime by a lightweight just-in-time (JIT) module [1] - Nullmax and Renesas Electronics signed a strategic cooperation agreement to develop reliable and user-friendly intelligent driving solutions by combining their respective strengths in AI software algorithms and chip performance [1] Company Announcements - Fudan Microelectronics reported a revenue of 3.59 billion yuan for 2024, a year-on-year increase of 1.51%, with a net profit of 573 million yuan [3] - Qingyi Optoelectronics announced a revenue of 1.11 billion yuan for 2024, up 20.35%, with a net profit of 172 million yuan, reflecting a 28.80% increase [3] - Juguang Technology reported a revenue of 619 million yuan for 2024, a year-on-year growth of 10.32% [3] - Yinjixin announced a revenue of 1.43 billion yuan for 2024, a 17.53% increase, with a net profit of 124 million yuan, marking a significant growth of 322.73% [3] Overseas News - Micron has begun shipping samples of its next-generation 1γ (1-gamma) 10nm-class DDR5 memory, achieving a data transfer rate of 9200MT/s, a 15% increase over previous products, while reducing power consumption by over 20% [3] - Canalys reported a strong performance in global AI computer shipments, reaching 15.4 million units in Q4 2024, accounting for 23% of total computer shipments, with Apple holding a dominant market share of 54% [3] - TrendForce indicated that global smartphone panel shipments reached 2.157 billion units in 2024, a year-on-year increase of 11.4%, attributed to rising sales of new smartphone models and demand for second-hand and refurbished devices [3] - Meta is discussing a $200 billion AI data center project [3]
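DeepGEMM is described above only as an FP8 matrix-multiplication library. Because FP8 has a very narrow dynamic range, such kernels are typically paired with scaling factors, and the sketch below illustrates why giving each small block its own scale helps. This is not DeepGEMM's code or API: the helper names, the block size of 4 (real kernels use far larger blocks), and the sample data are all assumptions chosen for illustration, and the rounding routine only approximates the E4M3 format (3 mantissa bits, maximum finite value 448) in pure Python.

```python
import math

FP8_E4M3_MAX = 448.0   # largest finite E4M3 value
MANTISSA_BITS = 3
MIN_NORMAL_EXP = -6    # smallest normal E4M3 exponent


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on an E4M3-like grid (saturating)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    _, e = math.frexp(mag)                  # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)        # clamp into the subnormal range
    quantum = 2.0 ** (exp - MANTISSA_BITS)  # spacing of representable values
    return sign * min(round(mag / quantum) * quantum, FP8_E4M3_MAX)


def quantize_dequantize(values, block_size):
    """Quantize with one scale per block, then map back to real values."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / FP8_E4M3_MAX if max_abs > 0 else 1.0
        out.extend(round_to_e4m3(v / scale) * scale for v in block)
    return out


def max_relative_error(reference, approx):
    """Largest |approx - reference| / |reference| over nonzero references."""
    return max(abs(a - r) / abs(r) for r, a in zip(reference, approx) if r != 0)


# Hypothetical data: four small values followed by a block with an outlier.
data = [0.01, -0.02, 0.015, 0.03, 5000.0, 1.0, -2.0, 0.5]
per_tensor = quantize_dequantize(data, block_size=len(data))  # one scale overall
per_block = quantize_dequantize(data, block_size=4)           # one scale per block

print("per-tensor max relative error:", max_relative_error(data, per_tensor))
print("per-block  max relative error:", max_relative_error(data, per_block))
```

With a single scale, the outlier forces the small values toward zero; with one scale per block, only the block containing the outlier pays for it.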