Breaking Through with "Graphs": HyperOffload Defines a New Paradigm for Supernode Storage Management
机器之心· 2026-03-16 03:53
Core Viewpoint - The article discusses the challenges and solutions related to deploying large language models (LLMs) in the era of trillion-parameter AI, particularly the "memory wall" problem and the HyperOffload technology developed by Shanghai Jiao Tong University and the Huawei MindSpore team [2][19].

Group 1: HyperOffload Technology
- HyperOffload introduces a "graph-driven" hierarchical memory management system that significantly enhances the efficiency of heterogeneous resource collaboration within supernode architectures [5][11].
- The core technology of HyperOffload has been integrated into Huawei's AI framework MindSpore version 2.8, enabling one-click accelerated deployment of trillion-parameter models [5][19].

Group 2: Memory Management Innovations
- The technology employs a Hierarchical Memory Manager (HMM) to transform physically isolated storage media into a logical "resource pool" view, designed specifically for supernodes equipped with HBM, DDR, and Flash [11].
- Selective parameter offloading is implemented using a multi-dimensional cost model that scores tensors by access frequency, recomputation cost, and communication bandwidth loss, keeping core operators in high-speed HBM while background data is managed efficiently in DDR [12][13].

Group 3: Enhanced Resource Pooling
- HyperOffload extends beyond weight offloading to manage the entire inference process, including the KV Cache, intermediate activations, and optimizer states, creating a unified logical view that seamlessly integrates massive tensors across different media [13].
- The combination of selective parameter offloading and adaptive activation swapping allows large-scale models to run smoothly on memory-constrained hardware clusters, keeping training and inference uninterrupted [13][14].
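The article does not publish HyperOffload's actual scoring formula. As a minimal sketch only, the following shows how a multi-dimensional offload cost model might combine access frequency, recomputation cost, and transfer cost; the field names, weights, and greedy budget policy are all hypothetical illustrations, not MindSpore APIs:

```python
from dataclasses import dataclass

@dataclass
class TensorStats:
    """Per-tensor profiling statistics (hypothetical fields, normalized to [0, 1])."""
    access_freq: float     # how often the tensor is touched per step
    recompute_cost: float  # relative cost to recompute vs. reload
    transfer_cost: float   # size / link bandwidth of moving it off HBM

def offload_score(t: TensorStats, w=(0.5, 0.3, 0.2)) -> float:
    """Higher score => keep in HBM; lower score => candidate for DDR/Flash."""
    return w[0] * t.access_freq + w[1] * t.recompute_cost + w[2] * t.transfer_cost

def plan_offload(tensors: dict, hbm_budget: int, sizes: dict) -> set:
    """Greedily keep the highest-scoring tensors in HBM until the budget is full;
    everything that does not fit is marked for offloading."""
    keep, used = set(), 0
    for name in sorted(tensors, key=lambda n: offload_score(tensors[n]), reverse=True):
        if used + sizes[name] <= hbm_budget:
            keep.add(name)
            used += sizes[name]
    return set(tensors) - keep
```

Under this toy policy, a frequently accessed tensor that is expensive to move scores high and stays resident, while cold, cheap-to-move tensors are demoted first, which matches the behavior the article attributes to HyperOffload.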
Group 4: Advanced Scheduling and Communication
- HyperOffload shifts from passive scheduling to global planning through a compilation-driven, graph-based management strategy, enhancing resource management and reducing memory fragmentation [16].
- The system achieves deep overlap of compute and bandwidth, enabling "invisible communication" that conceals data-migration costs within the execution window of computational tasks and significantly improves overall computational efficiency [17].

Group 5: Collaboration and Future Prospects
- The release of HyperOffload marks a new phase in the collaboration between Shanghai Jiao Tong University and Huawei MindSpore in the AI infrastructure field, with the solution already deployed in several large-scale commercial projects [19].
- Future efforts will focus on further optimizing performance under supernode architectures and building a more flexible end-to-end inference framework to support the large-scale application of generative AI [20].
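The "invisible communication" idea, hiding data migration behind compute, can be shown in miniature with a double-buffered prefetch loop. This is an illustrative sketch, not HyperOffload's implementation; `fetch` and `compute` are caller-supplied placeholders:

```python
import queue
import threading

def prefetch_pipeline(layers, fetch, compute):
    """While layer i computes, layer i+1's data is fetched on a background
    thread, so transfer latency overlaps with (is hidden behind) compute.
    maxsize=1 keeps the fetcher at most one layer ahead (double buffering)."""
    ready = queue.Queue(maxsize=1)

    def fetcher():
        for layer in layers:
            ready.put((layer, fetch(layer)))  # blocks while the buffer is full
        ready.put(None)                       # sentinel: no more layers

    threading.Thread(target=fetcher, daemon=True).start()
    results = []
    while (item := ready.get()) is not None:
        layer, data = item
        results.append(compute(layer, data))
    return results
```

If each fetch and each compute take comparable time, the steady-state cost per layer approaches max(fetch, compute) instead of their sum, which is the efficiency gain the article describes.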
Huawei's SuperPods Overtake NVIDIA: Mastering "Light" Is Key
Guan Cha Zhe Wang· 2026-02-10 03:20
Core Insights
- The emergence of SuperPods as a new AI computing infrastructure has been a focal point of the industry since 2025, with Huawei's Ascend 384 SuperPod leading foreign competitors in performance metrics [1][3].
- The demand for computing power is far from being met, with daily token consumption in China expected to exceed the trillions, highlighting the inadequacy of simply stacking servers to close the computing gap [3][4].

Group 1: SuperPod Characteristics
- SuperPods are not merely about stacking chips; they represent a fundamental restructuring of traditional computing architectures, enabling equal interconnectivity among CPUs, NPUs, and memory units [4][6].
- Key features of a true SuperPod include high bandwidth to eliminate communication delays, low latency, and unified memory addressing that forms a logically unified system [6][7].

Group 2: Efficiency and Performance
- SuperPods can significantly enhance computing efficiency, with model utilization potentially rising from 30% to 45%, effectively a 50% relative improvement, which can help mitigate the limitations of chip manufacturing processes [7][8].
- The architecture of SuperPods differs from traditional systems: Huawei employs optical communication technology, allowing greater scalability and interconnectivity than NVIDIA's copper-based systems [8][9].

Group 3: Innovation and Ecosystem
- Huawei's systematic innovation in chip design, optical components, and foundational protocols, built on over 20 years of experience in optical technology, has positioned it uniquely in the market [9][12].
- The company is also developing general-computing SuperPods; the TaiShan 950 SuperPod, set to launch in Q1 2026, is aimed at replacing a variety of server applications [11][12].

Group 4: Software and Community Engagement
- The success of SuperPods relies not only on hardware but also on a robust software ecosystem, including open-source initiatives like CANN and openEuler, which are crucial for fostering industry collaboration [14].
- Huawei has engaged a large developer community, with 3.8 million registered developers for Kunpeng and nearly 4 million for Ascend, underscoring the importance of open-source collaboration in the AI era [14].
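The "50% improvement" quoted above is the relative rather than absolute gain implied by the two utilization figures; a quick check:

```python
baseline_util, superpod_util = 0.30, 0.45

# Relative gain: (45% - 30%) / 30% = 50%, even though the absolute
# utilization only rises by 15 percentage points.
relative_gain = (superpod_util - baseline_util) / baseline_util
print(f"relative improvement: {relative_gain:.0%}")  # prints "relative improvement: 50%"
```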
Huawei Builds the "Strongest SuperPod", and This World-Leading Technology Is Key
Guan Cha Zhe Wang· 2026-02-10 03:10
Core Viewpoint - The emergence of the SuperPod as a new AI computing infrastructure has been a focal point of the industry since 2025, with Huawei's Ascend 384 SuperPod leading foreign competitors in performance metrics [1][3].

Group 1: SuperPod Concept and Advantages
- A SuperPod is not merely about stacking chips; it represents a fundamental restructuring of traditional computing architecture, enhancing communication efficiency among CPU, NPU, and memory units [4][6].
- Its key advantages over traditional clusters include significantly improved computational efficiency, with model computing utilization potentially rising from 30% to 45%, equating to a 50% performance boost [7][8].

Group 2: Technical Challenges and Innovations
- Building a true SuperPod is complex: Huawei's Ascend 384 SuperPod consists of 12 computing cabinets and 4 bus cabinets, while NVIDIA's NVL72 system is confined to a single cabinet due to architectural differences [8].
- Huawei employs optical communication technology for interconnection, scaling beyond single-cabinet limits, while traditional systems are constrained by electrical signal transmission [8][9].

Group 3: Systematic Innovation and Ecosystem Development
- Huawei's systematic innovation spans proprietary chip development, optical device capabilities, and foundational protocols, enabling SuperPods built on full optical interconnectivity [9][12].
- The company is also developing general-computing SuperPods, such as the TaiShan 950, which aims to replace various server applications by 2026 [9][11].
- A robust software ecosystem, including open-source initiatives like CANN and openEuler, is essential for operating SuperPods, with a focus on collaborative development within the industry [14].
When Openness Becomes Consensus, the Boundaries of Innovation Are Being Redefined
Sou Hu Cai Jing· 2025-11-19 13:05
Core Insights - The core theme of the forum is "Open Drives Innovation," emphasizing the shift from competition to collaboration in the realm of intellectual property and innovation [1][10][12].

Group 1: Forum Overview
- The sixth Huawei Innovation and Intellectual Property Forum gathered representatives from various international organizations and companies to discuss the role of intellectual property in fostering collaborative innovation [1][2].
- Huawei's Chief Legal Officer, Song Liuping, highlighted that the essence of intellectual property is not exclusivity but the orderly and efficient dissemination of innovative results [2][4].

Group 2: Huawei's Contributions
- As of the end of 2024, Huawei held over 150,000 valid patents globally, with R&D investment exceeding 20% of annual revenue and totaling over 1.2 trillion yuan over the past decade [4][7].
- The forum showcased Huawei's "Top Ten Inventions" of 2024, including advancements such as the Scale Up ultra-large-scale computing platform and the HarmonyOS full-stack architecture [4][6].

Group 3: Knowledge Sharing and Collaboration
- The upgraded "Chasi Patents" platform was introduced, enhancing patent search and analysis capabilities and thereby accelerating the flow of knowledge and innovation [6][9].
- Huawei's commitment to open innovation is reflected in its extensive participation in global ICT standards and its collaboration with international licensing platforms, which generated over $630 million in patent licensing revenue in 2024 [8][9].

Group 4: Open Innovation Strategy
- Huawei's strategy holds that knowledge sharing enhances social value rather than diminishing rights, promoting a cycle in which patent protection yields commercial returns that fund further R&D [9][10].
- The forum underscored the importance of standardized interfaces and shared platforms in making innovation more efficient and inclusive [10][12].

Group 5: Future Implications
- The discussions at the forum suggest that open innovation is becoming a dominant theme in global technological collaboration, with the potential to significantly enhance cross-industry integration [10][12].
- The evolving technology landscape, including AI and quantum computing, poses challenges for establishing sustainable cooperation mechanisms, highlighting the need for intellectual property to serve as a bridge rather than a barrier [10][12].
Xu Zhijun Explains Huawei's Most Powerful "Compute Bomb" in Detail
Guan Cha Zhe Wang· 2025-09-18 13:24
Core Insights - Huawei unexpectedly revealed its future chip roadmap at the Huawei Connect 2025 event, showcasing several new chips, including the Ascend 950, 960, and 970 series for AI computing and the Kunpeng 950 and 960 processors for general computing [1][3][10].

Group 1: Chip Developments
- The Ascend 950 series chips will support low-precision data formats and reach 1 PFLOPS and 2 PFLOPS of compute, enhancing training efficiency and inference throughput [3][10].
- The Ascend 960 is planned to double the performance of the Ascend 950 and will support Huawei's self-developed HiF4 data format, set to launch in Q4 2027 [7].
- The Ascend 970 will further improve on the Ascend 960's specifications, with release planned for Q4 2028 [7].

Group 2: Supernode and Cluster Innovations
- Huawei introduced the Atlas 950 supernode, which will consist of 8192 Ascend 950DT chips and achieve 8 EFLOPS of FP8 compute and 16 EFLOPS of FP4 compute, set to launch in Q4 2026 [11][13].
- The Atlas 960 supernode, planned for Q4 2027, will be based on 15488 Ascend 960 chips, with FP8 compute reaching 30 EFLOPS and FP4 compute reaching 60 EFLOPS [13].
- The Atlas 950 SuperCluster will consist of 64 Atlas 950 supernodes, achieving 524 EFLOPS of FP8 compute, making it the world's most powerful computing cluster [18].

Group 3: Software and Ecosystem Development
- Huawei aims to build a robust software ecosystem to complement its hardware, with the CANN compute architecture and the MindSpore framework serving as alternatives to NVIDIA's CUDA [21][22].
- The company plans to open-source its CANN compiler and virtual instruction set interface by the end of 2025, along with the Mind series application tools [22][24].
- Huawei's strategy emphasizes evolving hardware on existing chip technology while fostering an open-source ecosystem to counter the challenges posed by U.S. sanctions [24].
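The per-chip, supernode, and cluster figures quoted above are mutually consistent if each Ascend 950DT contributes 1 PFLOPS of FP8 compute, as stated; a quick back-of-the-envelope check:

```python
PFLOPS_FP8_PER_CHIP = 1   # Ascend 950 series: 1P at FP8 (2P at FP4)
CHIPS_PER_ATLAS_950 = 8192
NODES_PER_CLUSTER = 64

supernode_fp8 = CHIPS_PER_ATLAS_950 * PFLOPS_FP8_PER_CHIP / 1000  # in EFLOPS
cluster_fp8 = supernode_fp8 * NODES_PER_CLUSTER

print(f"Atlas 950 supernode:    {supernode_fp8:.3f} EFLOPS FP8")  # ~8E, as quoted
print(f"Atlas 950 SuperCluster: {cluster_fp8:.0f} EFLOPS FP8")    # ~524E, as quoted
```

The cluster figure of 524 EFLOPS is exactly 64 × 8.192 EFLOPS, so the article's numbers round consistently rather than being independent estimates.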
Huawei Uses "Black Tech" to Clear a Key Bottleneck in AI Deployment
Guan Cha Zhe Wang· 2025-08-15 04:06
Core Viewpoint - The traditional Scaling Law for AI models is facing significant bottlenecks, particularly in China, where infrastructure investment lags behind the US, creating challenges in AI inference performance and commercial viability [1][4][9].

Group 1: AI Inference Challenges
- AI inference has become a critical area: demand for inference computing power now exceeds that for training, as evidenced by GPT-5's API call volume exceeding 20 billion calls per minute [4][6].
- Chinese enterprises face a dilemma of inference that "won't run, runs slow, and runs expensive," with domestic models outputting fewer than 60 tokens per second against over 200 tokens per second for foreign models [7][9].
- The increasing complexity of AI applications, such as long-text processing and multi-turn dialogue, has intensified the demand for improved inference performance [1][4][6].

Group 2: Huawei's UCM Technology
- Huawei has introduced the Unified Cache Manager (UCM), a technology designed to enhance AI inference performance by optimizing memory management and overcoming HBM capacity limitations [1][11].
- UCM employs a tiered caching strategy for efficient storage and retrieval of KV Cache data, significantly reducing inference latency and cost [10][11][18].
- The technology has demonstrated substantial speedups, including a reported 125-fold increase in processing speed for specific applications in collaboration with China UnionPay [19][21].

Group 3: Industry Implications and Future Prospects
- The introduction of UCM is seen as a pivotal move for the Chinese AI industry, potentially triggering a positive cycle of user growth, increased investment, and rapid technological iteration [18][24].
- Huawei's open-source approach to UCM aims to foster collaboration within the AI ecosystem, allowing stakeholders to integrate it into and enhance their own frameworks [28].
- The technology is expected to apply across industries, addressing the challenges posed by growing data volumes and the need for efficient inference solutions [23][24].
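The article does not describe UCM's internals. As a generic illustration of the tiered KV Cache idea it names, here is a two-level cache sketch in which entries evicted from the fast tier are demoted to a slower tier instead of discarded, so they can be reloaded rather than recomputed; the class name, LRU policy, and unbounded slow tier are all assumptions:

```python
from collections import OrderedDict

class TieredKVCache:
    """Two-tier KV-cache sketch: a small fast tier (HBM-like) backed by a
    large slow tier (DDR/SSD-like). Illustrative only, not the UCM API."""

    def __init__(self, fast_capacity: int):
        self.fast = OrderedDict()  # hot tier, maintained in LRU order
        self.slow = {}             # cold tier, treated as unbounded here
        self.fast_capacity = fast_capacity

    def put(self, key, kv_block):
        self.fast[key] = kv_block
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            cold_key, cold_val = self.fast.popitem(last=False)  # evict LRU entry
            self.slow[cold_key] = cold_val                      # demote, don't discard

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # refresh recency
            return self.fast[key]
        if key in self.slow:            # slow-tier hit: promote on reuse
            self.put(key, self.slow.pop(key))
            return self.fast[key]
        return None                     # true miss: caller must recompute
```

A slow-tier hit costs a reload instead of a full prefill recomputation, which is the mechanism by which tiered KV caching reduces inference latency and cost.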
Huawei's Version of CUDA Goes Fully Open Source
猿大侠· 2025-08-07 04:11
Core Viewpoint - Huawei has announced the open-sourcing of its Ascend AI GPU software toolkit, CANN, aiming to enhance its competitiveness against NVIDIA's CUDA ecosystem [1][3][7].

Group 1: Huawei's AI Strategy
- Huawei's rotating chairman, Xu Zhijun, emphasized that the core of Huawei's AI strategy is computing power, with a focus on monetizing Ascend hardware [3].
- The open-sourcing of CANN and the Mind series application enablement suite is intended to accelerate innovation among developers, making Ascend easier to use [3][12].

Group 2: CANN Overview
- CANN, a neural network computing architecture, provides multiple programming interfaces to help users build AI applications for Huawei's Ascend [4].
- CANN is described as Huawei's counterpart to CUDA, offering a similar interface for GPU support [5].
- The latest version of CANN has been upgraded to 8.0, with both community and commercial versions available, supporting 12 operating systems [7].

Group 3: Competitive Landscape
- The announcement coincided with the emergence of a new startup, Oxmiq Labs, founded by a legendary GPU architect, which aims to create a CUDA-like software ecosystem [6][13].
- Oxmiq Labs focuses on developing GPU hardware and software IP, providing a vertically integrated platform for AI and graphics workloads [20][22].
- Oxmiq's software stack is designed to be hardware-agnostic, allowing CUDA-based applications to run on non-NVIDIA hardware without modification [29][31].

Group 4: Developer Support
- CANN currently supports various deep learning frameworks, including PyTorch, MindSpore, TensorFlow, and others, enhancing its utility for developers [15].
- The open-source initiative is expected to benefit developers by providing more options and reducing dependency on NVIDIA's ecosystem [32].
Huawei's Version of CUDA Goes Fully Open Source
36Ke· 2025-08-06 08:29
Core Insights
- Huawei has announced the open-sourcing of its CANN software toolkit for Ascend AI GPUs, emphasizing an AI strategy centered on computing power and the monetization of Ascend hardware [3][6][11].
- The CANN architecture, akin to NVIDIA's CUDA, provides a multi-layer programming interface to help users build AI applications specifically for Huawei's Ascend GPUs [4][6].

Group 1: Huawei's CANN Open-Sourcing
- The CANN toolkit has been upgraded to version 8.0, offering a community version for early feature access and a commercial version tailored to enterprise users [6].
- CANN now supports various deep learning frameworks, including PyTorch, MindSpore, TensorFlow, and others, enhancing its compatibility and usability for developers [9].
- Huawei has launched the "CANN Open Source Ecosystem Co-Building Initiative," signaling a strong commitment to an open ecosystem around Ascend technology [11].

Group 2: Competitive Landscape
- A new startup, Oxmiq Labs, founded by a legendary GPU architect, aims to create a CUDA-like software ecosystem, focusing on GPU hardware and software IP [12][14].
- Oxmiq's software stack includes OXCapsule for workload management and OXPython, which allows CUDA-based applications to run on non-NVIDIA hardware without modification [21][23].
- The competitive environment is intensifying, with multiple players challenging NVIDIA's dominance in the GPU market, ultimately benefiting developers through more options and faster innovation [7][23].
Benchmarked Against NVIDIA's CUDA, Huawei's Key Ascend Suites Go Fully Open Source
Xuan Gu Bao· 2025-08-05 23:22
Group 1
- Huawei announced the full open-sourcing of CANN, the Ascend hardware enablement layer, along with the Mind series application enablement suite and toolchain, aimed at accelerating innovation among developers [1]
- CANN is positioned against NVIDIA's CUDA, providing a high-performance operator library and various development methods, while the open-source MindSpore framework supports efficient development [1]
- The "2+1+X" model of MindX is designed to lower industry development barriers, promoting Ascend computing as comprehensive AI infrastructure [1]

Group 2
- The construction of an ecosystem is seen as an inevitable trend, with a need to improve compatibility with CUDA and reduce the associated performance loss [1]
- Deep collaboration between domestic frameworks like MindSpore and domestic chips is essential to form a self-sustaining "chip-framework-application" loop [1]
- Ascend's strong order fulfillment capability is expected to drive continuous expansion of the industry chain, with the Ascend 384 showcasing system-level performance advantages [1]

Group 3
- Advanced Communication has officially launched the Ascend A800I A2 large-model all-in-one machine [2]
- Tuo Wei Information has established a comprehensive strategic partnership with Huawei, focusing on "Kunpeng/Ascend AI + industry large models + Harmony" [3]
Benchmarked Against NVIDIA's CUDA, Huawei Announces the Open-Sourcing of CANN
Xin Lang Cai Jing· 2025-08-05 14:29
Core Insights
- Huawei announced the full open-sourcing of CANN, its Ascend hardware enablement layer, and the Mind series application toolkits, aiming to accelerate innovation among developers and enhance usability [1].
- The core of Huawei's AI strategy centers on computing power, with a focus on monetizing Ascend hardware [1].
- CANN serves as a bridge between AI training frameworks and Ascend chips, much as CUDA, a critical component of NVIDIA's competitive advantage, does for NVIDIA GPUs [1][3].

Group 1: CANN and Its Ecosystem
- CANN has been upgraded to version 8.0, introducing over 200 optimized basic operators, 80 fusion operators, and 100 Ascend C APIs, significantly reducing the typical operator development cycle from 2 person-months to 1.5 person-weeks [4].
- CANN is gradually becoming compatible with more AI frameworks, currently supporting PyTorch, MindSpore, TensorFlow, PaddlePaddle, ONNX, and others, easing the migration of models and applications [5].
- Huawei is committed to layered, deep openness for CANN, providing more SDKs for application development to improve deployment convenience and efficiency in model training and inference [5].

Group 2: Competitive Landscape
- Despite these advances, CANN's usability and ecosystem richness still lag behind NVIDIA's CUDA ecosystem, built over 18 years, indicating a long road ahead for Huawei [7].
- Huawei has adopted strategies similar to NVIDIA's early promotion of CUDA, sending engineering teams to help major clients adapt to the CANN environment [7].
- The open-sourcing of CANN is a strategic move to rapidly expand the ecosystem, with industry leaders and institutions collaborating to build an open-source Ascend ecosystem [7].

Group 3: Market Position
- Huawei's self-developed AI framework MindSpore achieved a 30.26% share of China's AI framework market, ranking first in 2024 and reflecting the company's commitment to open-source initiatives [8].
- The company has previously open-sourced other foundational software, countering claims of a closed, monopolistic approach to technology development [8].
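The quoted reduction from 2 person-months to 1.5 person-weeks implies roughly a five- to six-fold speedup in operator development; a back-of-the-envelope check, assuming about 4.33 weeks per person-month:

```python
WEEKS_PER_MONTH = 4.33              # ~52 weeks / 12 months (assumption)

before_weeks = 2 * WEEKS_PER_MONTH  # 2 person-months expressed in person-weeks
after_weeks = 1.5                   # 1.5 person-weeks, as quoted

speedup = before_weeks / after_weeks
print(f"~{speedup:.1f}x faster operator development")  # prints "~5.8x faster operator development"
```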