Communications ETF (515880) rises over 5.6%; software-hardware co-design innovation may become a new growth driver for the industry
Mei Ri Jing Ji Xin Wen· 2025-08-13 03:17
Guotai Haitong notes that Huawei is building full-stack AI competitiveness through software-hardware co-design, bringing a wave of technical innovation to the communications equipment industry. Huawei's AI strategy has shifted from benchmarking against SOTA models to tailoring architectures for its Ascend hardware, introducing two innovation paths, Pangu Pro MoE and Pangu Ultra MoE, which address expert load imbalance through the Mixture of Grouped Experts (MoGE) architecture and through system-level optimization respectively, improving hardware efficiency. The new-generation AI infrastructure CloudMatrix adopts a Unified Bus (UB) network to build a distributed high-speed memory pool, reducing cross-node communication disparities and supporting a PDC-disaggregated architecture and large-scale expert parallelism (LEP). As large models move from dense to sparse MoE architectures, Huawei is focusing on distributed-system efficiency and extending software-hardware co-design into AI systems engineering; the technology stack of the communications equipment industry is evolving deeply toward full-stack collaboration.
Note: Any individual stocks mentioned are for reference only and do not constitute investment advice. Short-term index/fund performance and historical returns are for analysis only and do not predict future results. Funds carry risk; invest with caution.
Mei Ri Jing Ji Xin Wen
Guotai Haitong | Industry: Huawei's Pangu large models and the Ascend AI computing platform jointly build an integrated software-hardware AI technology system
Huawei is exploring a path to full-stack AI competitiveness through software-hardware co-design spanning large-model design to infrastructure. Its AI strategy has gradually shifted from catching up with and benchmarking against industry SOTA models to tailoring model architectures to better exploit its self-developed Ascend hardware. This two-way co-evolution aims to solve the systemic problems of deploying AI models at scale and to build a full-stack technology system comprising co-designed software-hardware architecture, operators, and the software stack.

The evolution of the Pangu models centers on solving efficiency problems in large-scale distributed systems. As large language models shift wholesale from dense architectures to sparse Mixture-of-Experts (MoE) architectures, the industry broadly faces expert load imbalance, a systemic bottleneck that limits the real-world training and inference performance of MoE models. Huawei has made this systemic problem the core direction of its software-hardware architectural innovation, marking a shift in focus from purely hardware or purely AI-algorithm problems toward solving AI systems-engineering problems more efficiently on its own hardware.

At the model level, Huawei has introduced two innovation paths in parallel. On one hand, Pangu Pro MoE takes an architectural approach, proposing the Mixture of Grouped Experts (MoGE) architecture to solve load imbalance by structural design. On the other hand, Pangu Ultra MoE takes a system-level approach, using simulation-first design to adapt the model architecture to Ascend hardware, with co-optimization spanning training and inference ...
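The grouped-routing idea behind MoGE can be sketched in a few lines. The code below is a toy illustration, not Huawei's implementation: the function name, group sizes, and scoring are all assumptions. It shows the core mechanism, though: constraining top-k selection to operate within each equally sized expert group forces every group (and hence every device hosting a group) to receive the same number of token assignments, regardless of how skewed the router scores are.

```python
import numpy as np

def moge_route(scores: np.ndarray, num_groups: int, k_per_group: int) -> np.ndarray:
    """Toy grouped top-k routing (illustrative, not Huawei's code).

    Experts are partitioned into `num_groups` equal groups; each token picks
    its top-k experts *within every group*, so every group receives exactly
    num_tokens * k_per_group selections no matter how skewed the scores are.
    """
    num_tokens, num_experts = scores.shape
    group_size = num_experts // num_groups
    chosen = np.zeros_like(scores, dtype=bool)
    for g in range(num_groups):
        block = scores[:, g * group_size:(g + 1) * group_size]
        topk = np.argsort(-block, axis=1)[:, :k_per_group]   # best experts per group
        rows = np.arange(num_tokens)[:, None]
        chosen[rows, g * group_size + topk] = True
    return chosen

rng = np.random.default_rng(0)
scores = rng.normal(size=(1024, 8))            # 1024 tokens, 8 experts in 4 groups of 2
mask = moge_route(scores, num_groups=4, k_per_group=1)
per_group = mask.reshape(1024, 4, 2).sum(axis=(0, 2))
print(per_group)                               # every group gets exactly 1024 selections
```

With global (ungrouped) top-k routing, skewed scores would concentrate selections on a few popular experts; the structural constraint above removes that failure mode by construction, which is the property the report attributes to MoGE.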
Huawei's Pangu large models and the Ascend AI computing platform jointly build an integrated software-hardware AI technology system
Industry In-Depth 2025.08.06
[AI Industry In-Depth] Huawei's Pangu large models and the Ascend AI computing platform jointly build an integrated software-hardware AI technology system
Industry Research Center

Abstract:

| Author | Phone | Email | Registration No. |
| --- | --- | --- | --- |
| Bao Yanxin (Analyst) | 0755-23976830 | baoyanxin@gtht.com | S0880513070005 |
| Li Jiaqi (Analyst) | 010-83939821 | lijiaqi2@gtht.com | S0880524040001 |
| Liu Feng (Research Assistant) | 0755-23976068 | liufeng6@gtht.com | S0880124060013 |

Previous reports:
- The Key to L3 Intelligent Driving and Embodied Intelligence: Vision-Language-Action (VLA) Models, Industry Research, 2025.08.02
- Low-Altitude Economy Series (8): The Domestic eVTOL Industry Through the Lens of Joby and Archer, 2025.07.17
- Zunjie Defines Luxury with Intelligence, Leading the Automotive Industry to the Summit ...
Industry In-Depth: [AI Industry In-Depth] Huawei's Pangu large models and the Ascend AI computing platform jointly build an integrated software-hardware AI technology system
Investment Rating
- The report does not explicitly state an investment rating for the industry.

Core Insights
- Huawei is pursuing a software-hardware integration strategy to strengthen its AI competitiveness, moving from merely catching up with industry SOTA models to customizing model architectures for its self-developed Ascend hardware [12][30].
- The evolution of the Pangu model series reflects a shift from parameter competition to a focus on efficiency and scalability, culminating in adoption of the Mixture of Experts (MoE) architecture [12][30].
- The report highlights the innovative Pangu Pro MoE and Pangu Ultra MoE architectures, which aim to maximize Ascend hardware utilization through structural and system-level optimization respectively [36][62].

Summary by Sections

1. Evolution of Pangu Models
- The Pangu series began with PanGu-α, a 200-billion-parameter model that established a technical route built on Ascend hardware [12][30].
- PanGu-Σ, launched in 2023, was an early sparsification effort, exploring trillion-parameter models with a focus on efficiency [15][18].
- Pangu 3.0 introduced a "5+N+X" architecture aimed at deep industry applications, demonstrating capabilities across sectors [22][23].

2. Pangu Pro MoE and Pangu Ultra MoE
- Pangu Pro MoE addresses expert load imbalance in distributed systems through a new architecture, Mixture of Grouped Experts (MoGE) [36][37].
- MoGE guarantees load balance by structuring expert selection, improving efficiency in distributed deployments [45][46].
- Pangu Ultra MoE emphasizes system-level optimization strategies to explore software-hardware synergy, a practical application of the integration concept [62].

3. CloudMatrix Infrastructure
- CloudMatrix serves as the physical foundation of the AI infrastructure, enabling high-performance communication and memory management across distributed systems [5][10].
- It supports the Pangu models with a uniformly addressed distributed memory pool that reduces performance discrepancies in cross-node communication [5][10].

4. Full-Stack Collaboration
- Huawei's AI strategy centers on full-stack collaboration, integrating open-source strategies to build an ecosystem around Ascend hardware [10][12].
- Architecture, systems, and operators form the three pillars of this full-stack collaboration, aimed at improving the overall efficiency and effectiveness of AI solutions [10][12].
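The "uniformly addressed distributed memory pool" mentioned above can be illustrated with a toy model. Everything in this sketch is an assumption for illustration, since CloudMatrix's actual UB interfaces are not described here, but it shows the core idea: a single flat address space striped across per-node buffers, so a caller reads and writes without knowing which node holds a given address.

```python
import numpy as np

class PooledMemory:
    """Toy model of a uniformly addressed memory pool spanning several nodes.

    One flat address space is striped across per-node buffers; `_locate`
    translates a global address into (node, offset). Purely illustrative --
    this is not CloudMatrix's actual interface.
    """
    def __init__(self, num_nodes: int, bytes_per_node: int):
        self.nodes = [np.zeros(bytes_per_node, dtype=np.uint8) for _ in range(num_nodes)]
        self.bytes_per_node = bytes_per_node

    def _locate(self, addr: int) -> tuple[int, int]:
        return divmod(addr, self.bytes_per_node)   # (node index, offset in node)

    def write(self, addr: int, data: bytes) -> None:
        for i, b in enumerate(data):
            node, off = self._locate(addr + i)
            self.nodes[node][off] = b

    def read(self, addr: int, n: int) -> bytes:
        out = bytearray()
        for i in range(n):
            node, off = self._locate(addr + i)
            out.append(int(self.nodes[node][off]))
        return bytes(out)

pool = PooledMemory(num_nodes=4, bytes_per_node=8)
pool.write(6, b"hello")        # this write spans the node 0 / node 1 boundary
print(pool.read(6, 5))         # b'hello'
```

In a real system the `_locate` step would be backed by hardware addressing over the interconnect rather than Python bookkeeping, but the design point is the same: uniform addressing hides node boundaries, which is what reduces the cost difference between local and cross-node access.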
In Depth | Jensen Huang: Humanoid robots may become the next trillion-dollar industry; Huawei's technology may already rival the H200
Z Potentials· 2025-06-14 03:58
Core Insights
- Nvidia CEO Jensen Huang emphasizes the company's resilience in the face of challenges in the Chinese market, highlighting a strategic pivot toward AI inference demand, which has exceeded expectations [3][10].
- The company is seeing significant growth in AI services, with products like ChatGPT and Gemini driving demand for AI inference capacity [3][10].
- Huang acknowledges the importance of the Chinese market despite current revenue losses, and stresses that Nvidia must remain competitive against local players like Huawei [5][7].

Market Performance
- Nvidia reported second-quarter revenue of $45 billion, plus or minus roughly 2%, including an estimated $8 billion loss tied to the Chinese market and H20 chip sales [3].
- New architectures such as Blackwell and Fei-Lung 72 are seen as breakthroughs underpinning Nvidia's strong market position [4].

Strategic Adjustments
- Huang discusses the constraints of strict export regulations in China, noting that the H20 chip has reached its minimum allowable specifications and that future designs will need to create market value within those limits [6].
- The company is exploring ways to stay competitive amid rapid advances by Chinese rivals [6][7].

Competitive Landscape
- Huawei's technology is reportedly on par with Nvidia's H200, and its new CloudMatrix system is noted for its scalability [7][10].
- Huang points out that Chinese companies are increasingly turning to Huawei due to distrust of American technology, a shift in the competitive dynamics [7][8].

Political and Economic Context
- Huang expresses support for Trump's policies on tariffs and AI diffusion rules, which he believes will benefit American manufacturing and global technology adoption [11][12].
- The company is actively establishing manufacturing facilities in the U.S. and encouraging global partnerships to strengthen AI infrastructure [10][11].

Future Prospects
- Nvidia is collaborating with Tesla and xAI on projects including the development of humanoid robots, which Huang believes could become a trillion-dollar industry [13].
- The company plans to engage European nations on AI infrastructure and factory development, recognizing AI as a critical component of national infrastructure [14].
How stable are Ascend AI computing clusters? 98% availability at 10,000-card scale, with second-level fault recovery
Yicai · 2025-06-10 11:25
Core Viewpoint
- The article emphasizes the importance of high availability in AI computing clusters, likening them to a "digital engine" that must run continuously, without interruption, to support business innovation and efficiency [1][12].

Group 1: High Availability and Fault Management
- AI computing clusters face complex fault-localization challenges due to their large scale and intricate technology stack; fault diagnosis today takes anywhere from hours to days [2].
- Huawei's team has built comprehensive observability to improve fault detection and management, including cluster operation views, alarm views, and network-link monitoring [2][12].
- The average AI cluster experiences multiple faults daily, significantly reducing training efficiency and wasting computing resources [2].

Group 2: Reliability and Performance Enhancements
- Huawei's reliability-analysis model aims to raise the mean time between failures (MTBF) of large-scale clusters above 24 hours [3].
- A multi-layer protection system and software fault-tolerance solutions have achieved a fault-tolerance rate above 99% for optical modules [3].
- Training efficiency has improved, with linearity metrics of 96% for dense models and 95.05% for sparse models under specific configurations [6].

Group 3: Fast Recovery Mechanisms
- Huawei's multi-tiered fault-recovery system cuts training recovery times to under 10 minutes, with process-level recovery as fast as 30 seconds [9][10].
- Instance-level recovery techniques compress recovery times to under 5 minutes, minimizing user impact during faults [10].

Group 4: Future Directions and Innovations
- Huawei's six high-availability innovations, including fault perception and diagnosis, fault management, and optical-link fault tolerance, have delivered a cluster availability rate of 98% [12].
- Future explorations will focus on diverse application scenarios, heterogeneous integration, and intelligent autonomous maintenance to drive further innovation in AI computing clusters [12].
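As a back-of-envelope check on how the recovery tiers above interact with MTBF, the standard steady-state availability formula A = MTBF / (MTBF + MTTR) can be applied to the article's figures. This is a generic reliability formula, not necessarily the metric behind the 98% availability number, and treating "MTBF above 24 hours" as a point value of exactly 24 h is an assumption for illustration:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Figures reported in the article: MTBF above 24 h, and recovery tiers of
# ~10 min (training-job restart), <5 min (instance), ~30 s (process).
tiers = [("job restart", 10.0), ("instance recovery", 5.0), ("process recovery", 0.5)]
for label, mttr_min in tiers:
    a = availability(24.0, mttr_min / 60.0)
    print(f"{label:18s} -> {a:.4%}")
```

The spread between tiers makes the engineering motivation concrete: at a fixed MTBF, shrinking MTTR from minutes to seconds is what moves availability from "two nines" territory toward "three nines", which is why the article emphasizes second-level process recovery rather than only reducing fault frequency.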