MindSpore

Xu Zhijun Details Huawei's Most Powerful "Computing Power Nuclear Bomb"
Guan Cha Zhe Wang· 2025-09-18 13:24
Core Insights
- Huawei unexpectedly revealed its future chip roadmap at the Huawei Connect 2025 event, showcasing several new chips, including the Ascend 950, 960, and 970 series for AI computing and the Kunpeng 950 and 960 processors for general computing [1][3][10]

Group 1: Chip Developments
- The Ascend 950 series chips will support low-precision data formats and deliver computing power of 1 PFLOPS and 2 PFLOPS, enhancing training efficiency and inference throughput [3][10]
- The Ascend 960 is planned to double the performance of the Ascend 950 and will support Huawei's self-developed HiF4 data format; it is set to launch in Q4 2027 [7]
- The Ascend 970 will further improve on the Ascend 960's specifications, with release planned for Q4 2028 [7]

Group 2: Supernode and Cluster Innovations
- Huawei introduced the Atlas 950 supernode, which will consist of 8192 Ascend 950DT chips and reach 8 EFLOPS of FP8 computing power and 16 EFLOPS of FP4 computing power; it is set to launch in Q4 2026 [11][13]
- The Atlas 960 supernode, planned for Q4 2027, will be based on 15488 Ascend 960 chips, with FP8 computing power reaching 30 EFLOPS and FP4 computing power reaching 60 EFLOPS [13]
- The Atlas 950 SuperCluster will consist of 64 Atlas 950 supernodes, achieving 524 EFLOPS of FP8 computing power and making it the world's strongest computing cluster [18]

Group 3: Software and Ecosystem Development
- Huawei aims to build a robust software ecosystem to complement its hardware, with the CANN computing architecture and the MindSpore framework serving as alternatives to NVIDIA's CUDA [21][22]
- The company plans to open-source its CANN compiler and virtual instruction set interface by the end of 2025, along with the Mind series application tools [22][24]
- Huawei's strategy emphasizes hardware evolution on existing chip technology while fostering an open-source ecosystem to address the challenges posed by U.S. sanctions [24]
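The supernode and cluster figures quoted above are mutually consistent, which a quick arithmetic check makes visible: reading the Ascend 950's "1P and 2P" as roughly 1 PFLOPS FP8 and 2 PFLOPS FP4 per chip (our reading, inferred from the supernode totals), 8192 chips give about 8 EFLOPS FP8 and 16 EFLOPS FP4 per Atlas 950 supernode, and 64 supernodes give the quoted 524 EFLOPS. A minimal sketch:

```python
# Back-of-the-envelope check of the Atlas 950 figures quoted above.
# Per-chip numbers are our reading of the article's "1P and 2P".
PFLOPS = 10**15
EFLOPS = 10**18

chip_fp8 = 1 * PFLOPS          # Ascend 950: ~1 PFLOPS at FP8 (assumed)
chip_fp4 = 2 * PFLOPS          # ~2 PFLOPS at FP4 (assumed)
chips_per_supernode = 8192     # Atlas 950 supernode, per the article

supernode_fp8 = chip_fp8 * chips_per_supernode
supernode_fp4 = chip_fp4 * chips_per_supernode
print(supernode_fp8 / EFLOPS)  # 8.192 -> the article's "8E FLOPS"
print(supernode_fp4 / EFLOPS)  # 16.384 -> the article's "16E FLOPS"

# Atlas 950 SuperCluster: 64 supernodes
cluster_fp8 = supernode_fp8 * 64
print(round(cluster_fp8 / EFLOPS))  # 524, matching the cluster figure
```

That 8192 × 64 = 524288 chips lands exactly on the quoted 524 EFLOPS suggests the cluster number is a straight multiplication of the per-chip figure.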
Huawei Uses "Black Tech" to Clear a Key Bottleneck in AI Deployment
Guan Cha Zhe Wang· 2025-08-15 04:06
Core Viewpoint
- The traditional Scaling Law for AI models is hitting significant bottlenecks, particularly in China, where infrastructure investment lags behind the US, creating challenges for AI inference performance and commercial viability [1][4][9]

Group 1: AI Inference Challenges
- AI inference has become a critical area: demand for inference computing power now exceeds that for training, as evidenced by GPT-5's API call volume exceeding 20 billion calls per minute [4][6]
- Chinese enterprises face a dilemma of inference that won't run, runs slowly, or runs expensively, with domestic models outputting fewer than 60 tokens per second versus over 200 tokens per second for foreign models [7][9]
- The increasing complexity of AI applications, such as long-text processing and multi-turn dialogue, has intensified the demand for better inference performance [1][4][6]

Group 2: Huawei's UCM Technology
- Huawei has introduced the Unified Cache Manager (UCM), a technology designed to improve AI inference performance by optimizing memory management and working around HBM capacity limits [1][11]
- UCM employs a tiered caching strategy for efficient storage and retrieval of KV Cache data, significantly reducing inference latency and cost [10][11][18]
- The technology has demonstrated substantial speedups, including a reported 125-fold increase in processing speed for specific applications in collaboration with China UnionPay [19][21]

Group 3: Industry Implications and Future Prospects
- The introduction of UCM is seen as a pivotal move for the Chinese AI industry, potentially triggering a positive cycle of user growth, increased investment, and rapid technological iteration [18][24]
- Huawei's open-source approach to UCM aims to foster collaboration within the AI ecosystem, allowing various stakeholders to integrate and extend their frameworks [28]
- The technology is expected to apply across industries, addressing the challenges posed by growing data volumes and the need for efficient inference solutions [23][24]
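The article does not publish UCM's internals, but the tiered-caching idea it describes can be sketched generically: hot KV Cache entries stay in the fastest tier (standing in for HBM), and evictions cascade down to larger, slower tiers (DRAM, then SSD) instead of being dropped. The tier layout, capacities, and LRU policy below are illustrative assumptions, not UCM's actual design:

```python
from collections import OrderedDict

class TieredKVCache:
    """Toy multi-tier KV cache: LRU evictions cascade from fast to slow tiers.

    Tiers are ordered fastest-first, each with a capacity in entries.
    An illustrative sketch of tiered caching, not Huawei's UCM implementation.
    """

    def __init__(self, capacities):
        self.tiers = [OrderedDict() for _ in capacities]
        self.capacities = list(capacities)

    def put(self, key, value, tier=0):
        store = self.tiers[tier]
        store[key] = value
        store.move_to_end(key)                 # mark as most recently used
        if len(store) > self.capacities[tier]:
            old_key, old_val = store.popitem(last=False)  # evict LRU entry
            if tier + 1 < len(self.tiers):     # demote to next tier, don't drop
                self.put(old_key, old_val, tier + 1)

    def get(self, key):
        for tier, store in enumerate(self.tiers):
            if key in store:
                value = store.pop(key)
                self.put(key, value, tier=0)   # promote hot entry to fastest tier
                return value
        return None                            # miss: caller recomputes the KV block

cache = TieredKVCache(capacities=[2, 4, 8])    # e.g. "HBM", "DRAM", "SSD" slots
for i in range(6):
    cache.put(f"kv{i}", f"block{i}")
print(cache.get("kv0"))  # demoted out of the fast tier, but still retrievable
```

The point of the pattern is the last line: an entry that no longer fits in fast memory is served from a lower tier rather than recomputed, which is where the latency and cost savings come from.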
Huawei's Answer to CUDA Goes Fully Open Source
Yuan Da Xia· 2025-08-07 04:11
Core Viewpoint
- Huawei has announced the open-sourcing of its Ascend AI GPU software toolkit, CANN, aiming to strengthen its competitiveness against NVIDIA's CUDA ecosystem [1][3][7]

Group 1: Huawei's AI Strategy
- Huawei's rotating chairman, Xu Zhijun, emphasized that the core of Huawei's AI strategy is computing power, with a focus on monetizing Ascend hardware [3]
- Open-sourcing CANN and the Mind series application enablement suite is intended to accelerate developer innovation and make Ascend easier to use [3][12]

Group 2: CANN Overview
- CANN, a neural network computing architecture, provides multiple programming interfaces to help users build AI applications for Huawei's Ascend [4]
- CANN is described as Huawei's version of CUDA, offering a similar interface for GPU support [5]
- The latest version of CANN has been upgraded to 8.0, with both community and commercial editions available, supporting 12 operating systems [7]

Group 3: Competitive Landscape
- The announcement coincided with the emergence of a new startup, Oxmiq Labs, founded by a legendary GPU architect, which aims to build a software ecosystem similar to CUDA [6][13]
- Oxmiq Labs focuses on developing GPU hardware and software IP, providing a vertically integrated platform for AI and graphics workloads [20][22]
- Oxmiq's software stack is designed to be hardware-agnostic, allowing CUDA-based applications to run on non-NVIDIA hardware without modification [29][31]

Group 4: Developer Support
- CANN currently supports various deep learning frameworks, including PyTorch, MindSpore, TensorFlow, and others, enhancing its utility for developers [15]
- The open-source initiative is expected to benefit developers by providing more options and reducing dependence on NVIDIA's ecosystem [32]
Huawei's Answer to CUDA Goes Fully Open Source
36Ke· 2025-08-06 08:29
Core Insights
- Huawei has announced the open-sourcing of its CANN software toolkit for Ascend AI GPUs, underscoring an AI strategy centered on computing power and monetization of Ascend hardware [3][6][11]
- The CANN architecture, akin to NVIDIA's CUDA, provides a multi-layer programming interface to help users build AI applications specifically for Huawei's Ascend GPUs [4][6]

Group 1: Huawei's CANN Open-Sourcing
- The CANN toolkit has been upgraded to version 8.0, offering a community edition for early feature access and a commercial edition tailored to enterprise users [6]
- CANN now supports various deep learning frameworks, including PyTorch, MindSpore, TensorFlow, and others, enhancing its compatibility and usability for developers [9]
- Huawei has initiated the "CANN Open Source Ecosystem Co-Building Initiative," indicating a strong commitment to an open ecosystem around Ascend technology [11]

Group 2: Competitive Landscape
- A new startup, Oxmiq Labs, founded by a legendary GPU architect, aims to build a software ecosystem similar to CUDA, focusing on GPU hardware and software IP [12][14]
- Oxmiq's software stack includes OXCapsule for workload management and OXPython, which allows CUDA-based applications to run on non-NVIDIA hardware without modification [21][23]
- The competitive environment is intensifying, with multiple players challenging NVIDIA's dominance in the GPU market, ultimately benefiting developers through increased options and innovation [7][23]
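The articles give no detail on how OXPython achieves CUDA compatibility, but the general pattern behind such layers is an API shim: the application keeps calling a CUDA-shaped interface while the shim dispatches to whatever backend is actually present. A deliberately simplified, hypothetical sketch of the dispatch idea (names like `ShimRuntime` and `CpuBackend` are our inventions, not Oxmiq APIs):

```python
class CpuBackend:
    """Stand-in backend; a real shim would target non-NVIDIA GPU hardware."""
    name = "cpu"

    def add(self, a, b):
        return [x + y for x, y in zip(a, b)]

class ShimRuntime:
    """Routes a fixed, CUDA-shaped API surface to the registered backend,
    so caller code never changes when the hardware underneath does."""

    def __init__(self, backend):
        self.backend = backend

    def cuda_like_add(self, a, b):
        # An unmodified caller hits this stable entry point; the shim
        # forwards to self.backend, whatever hardware it wraps.
        return self.backend.add(a, b)

runtime = ShimRuntime(CpuBackend())
print(runtime.cuda_like_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Swapping `CpuBackend` for another backend changes nothing in the caller, which is the "without modification" property the articles attribute to OXPython.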
Benchmarked Against NVIDIA's CUDA, Huawei Fully Open-Sources Key Ascend Suites
Xuan Gu Bao· 2025-08-05 23:22
Group 1
- Huawei announced the full open-sourcing of CANN, the Ascend hardware enablement layer, along with the Mind series application enablement suite and toolchain, aimed at accelerating developer innovation [1]
- CANN is positioned against NVIDIA's CUDA, providing a high-performance operator library and multiple development methods, while the open-source MindSpore framework supports efficient development [1]
- The "2+1+X" model of MindX is designed to lower industry development barriers, promoting Ascend computing as comprehensive AI infrastructure [1]

Group 2
- Ecosystem construction is seen as an inevitable trend, with a need to improve compatibility efficiency with CUDA and reduce performance loss [1]
- Deep collaboration between domestic frameworks such as MindSpore and domestic chips is essential to form a self-sustaining "chip-framework-application" loop [1]
- Ascend's strong order fulfillment capability is expected to drive continuous expansion of the industry chain, with the Ascend 384 showcasing system-level performance advantages [1]

Group 3
- Advanced Communication has officially launched the Ascend A800I A2 large model integrated machine [2]
- Tuo Wei Information has established a comprehensive strategic partnership with Huawei, focusing on "Kunpeng/Ascend AI + industry large models + Harmony" [3]
Benchmarked Against NVIDIA's CUDA, Huawei Announces Open-Sourcing of CANN
Xin Lang Cai Jing· 2025-08-05 14:29
Core Insights
- Huawei announced the full open-sourcing of CANN, its Ascend hardware enablement layer, and the Mind series application toolkits, aiming to accelerate developer innovation and improve usability [1]
- The core of Huawei's AI strategy is computing power, with a focus on monetizing Ascend hardware [1]
- CANN serves as the bridge between AI training frameworks and Ascend chips, much as NVIDIA's CUDA does for NVIDIA GPUs, and CUDA is a critical component of NVIDIA's competitive advantage [1][3]

Group 1: CANN and Its Ecosystem
- CANN has been upgraded to version 8.0, introducing over 200 optimized basic operators, 80 fusion operators, and 100 Ascend C APIs, cutting the typical operator development cycle from 2 person-months to 1.5 person-weeks [4]
- CANN is gradually becoming compatible with more AI frameworks, currently supporting PyTorch, MindSpore, TensorFlow, PaddlePaddle, ONNX, and others, easing migration of models and applications [5]
- Huawei is committed to layered deep openness for CANN, providing more SDKs for application development to improve deployment convenience and efficiency in model training and inference [5]

Group 2: Competitive Landscape
- Despite these advances, CANN's usability and ecosystem richness still lag behind NVIDIA's CUDA ecosystem, built over 18 years, indicating a long road ahead for Huawei [7]
- Huawei has adopted strategies similar to NVIDIA's early promotion of CUDA, sending engineering teams to help major clients adapt to the CANN environment [7]
- Open-sourcing CANN is a strategic move to expand the ecosystem rapidly, with industry leaders and institutions collaborating to build an open-source Ascend ecosystem [7]

Group 3: Market Position
- Huawei's self-developed AI framework MindSpore reached a 30.26% share of China's AI framework market, ranking first in 2024 and reflecting the company's commitment to open-source initiatives [8]
- The company has previously open-sourced other foundational software, countering claims of a closed and monopolistic approach to technology development [8]
H20 Ban Lifted: The US-China Closed-Loop AI Race Begins
Hu Xiu· 2025-07-16 01:51
Group 1
- The H20 chip, previously banned by the US government, is crucial for AI model training in China and is now set to return to the market, indicating a shift in US-China tech relations [3][5][14]
- Nvidia's revenue from the H20 chip in 2024 is projected at $12 billion to $15 billion, roughly 85% of its revenue from China [7]
- After the ban, Nvidia lost about $2.5 billion in sales in the first quarter, with an estimated total loss of $13.5 billion over two quarters [9][10]

Group 2
- The H20's return signifies a tactical compromise in US-China relations, with both sides adjusting their strategies rather than fully decoupling [16][17][25]
- Chinese companies have accelerated development of domestic chips, with firms like Huawei and Alibaba investing in their own technologies to reduce reliance on foreign products [11][22][34]
- The Chinese AI market did not stall under the H20 ban; instead the ban spurred faster domestic alternatives, potentially threatening Nvidia's market dominance in the future [14][19][51]

Group 3
- The H20's return is expected to restore supply chains and reduce costs for companies reliant on Nvidia, allowing AI projects to progress more rapidly [29][30]
- The Chinese government is encouraging the use of domestic chips in new data centers, further supporting local technology development [34]
- Despite the H20's return, some companies may still prefer Nvidia products for their established reputation and compatibility, pointing to a potential divide in corporate strategies [36][37]

Group 4
- Nvidia is likely to focus on deepening partnerships with leading Chinese AI companies and adapting its offerings to local regulatory requirements [43][46]
- The competition between US and Chinese tech ecosystems is evolving, with both sides potentially developing parallel AI worlds [52][55]
- The establishment of a self-sufficient Chinese AI ecosystem could significantly shift global tech dynamics, reducing dependence on Western technologies [60][61]
US-China AI Competition Report: Can China's AI Industrial Policy Break Through the US Blockade?
36Ke· 2025-07-01 07:53
Group 1
- The core objective of China's AI policy is to build a $100 billion AI industry by 2030, generating over $1 trillion in added value across various sectors [2]
- China's AI policies focus on enhancing economic development and national strength, contrasting with the more abstract "general AI race" narrative in the U.S. [2]
- The Chinese government is deploying a comprehensive set of policy tools, including an $8.2 billion fund for AI startups and the establishment of national AI laboratories and experimental zones [3]

Group 2
- Geopolitical tensions, particularly with the U.S., have shifted China's AI policy toward self-reliance and strategic competition, emphasizing the need for an independent AI ecosystem [6]
- U.S. export controls have restricted China's access to the advanced computing chips crucial for AI development, prompting Chinese companies to pursue alternative strategies [7]
- Despite these challenges, the Chinese AI industry is likely to continue progressing, potentially fostering the development of homegrown semiconductor and software solutions [8]

Group 3
- The effectiveness of China's AI policies remains uncertain, but government support is crucial for addressing key bottlenecks such as domestic chip development and talent shortages [9]
- Data center energy demand is projected to triple by 2030, a demand China is likely to meet given its faster pace of new power plant construction compared to the U.S. [9]
- The private sector, particularly innovative tech companies, is expected to drive advances in AI, and government policies must align with private sector needs to be deemed effective [11]
Exclusive: How Huawei Turns Ten Thousand AI Servers into a "Super Brain" in Seconds
Di Yi Cai Jing· 2025-06-09 09:01
Core Viewpoint
- The article discusses advances in AI computing power clusters, highlighting how innovative technologies and fault tolerance mechanisms enable the training and inference of large AI models [1][24]

Group 1: Supernode High Availability
- AI training and inference require continuous operation; each computer in the cluster has a backup to ensure seamless task execution during failures [3][4]
- Huawei's CloudMatrix 384 supernode employs a fault tolerance strategy spanning system-level, business-level, and operational-level fault tolerance to maintain high efficiency [3][4]

Group 2: Cluster Linearity
- The ideal for computing power clusters is linear scalability: 100 computers should provide 100 times the power of one [6]
- Huawei's task distribution algorithms keep each computer operating efficiently, akin to an orchestra, preventing chaos during large-scale model training [6][8]

Group 3: Rapid Recovery for Large-Scale Training
- The system automatically records training progress, allowing quick recovery from faults without starting over and significantly reducing downtime [10][11]
- Innovations such as process-level rescheduling and online recovery techniques reduce recovery times to under 3 minutes [11][15]

Group 4: Fault Management and Diagnostic Capabilities
- A real-time monitoring system continuously checks the health of each computer in the cluster, enabling quick identification and resolution of issues [17]
- Huawei's comprehensive fault management solution includes error detection, isolation, and recovery capabilities, enhancing overall reliability [17][18]

Group 5: Simulation and Modeling
- Before actual training, the computing cluster can simulate scenarios in a "digital wind tunnel" to identify potential bottlenecks and optimize performance [19][20]
- The Markov modeling simulation platform supports multi-dimensional analysis and performance tuning, ensuring efficient resource allocation [19][20]

Group 6: Framework Migration
- Huawei's MindSpore framework supports seamless migration from other frameworks, covering over 90% of PyTorch interfaces and improving developer accessibility [22]
- The framework also enables quick deployment of large models, improving inference performance through integration with mainstream ecosystems [22]

Group 7: Summary and Outlook
- Huawei's innovations span high availability, linearity, rapid recovery, fault tolerance, diagnostic capabilities, simulation, and framework migration [24]
- Computing infrastructure is expected to evolve through a collaborative cycle of application demand, hardware innovation, and engineering feedback, leading to specialized computing solutions [24]
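The article does not describe the checkpointing mechanics behind rapid recovery, but the idea in Group 3 can be sketched generically: training periodically persists its step and state, and after a fault the job resumes from the last saved step instead of step zero. Everything below (file name, interval, state layout) is an illustrative assumption, not Huawei's implementation:

```python
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "toy_ckpt.json")  # hypothetical path

def save_checkpoint(step, state):
    # Write to a temp file and rename atomically, so a crash mid-save
    # cannot leave a half-written checkpoint behind.
    tmp = CKPT + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, CKPT)

def load_checkpoint():
    if not os.path.exists(CKPT):
        return 0, {"loss": None}        # fresh run: start from step 0
    with open(CKPT) as f:
        data = json.load(f)
    return data["step"], data["state"]

def train(total_steps, ckpt_every=10, crash_at=None):
    step, state = load_checkpoint()     # resume instead of restarting
    while step < total_steps:
        step += 1
        state["loss"] = 1.0 / step      # placeholder for a real training step
        if step % ckpt_every == 0:
            save_checkpoint(step, state)
        if step == crash_at:
            raise RuntimeError("simulated hardware fault")
    return step, state

if os.path.exists(CKPT):
    os.remove(CKPT)
try:
    train(100, crash_at=57)             # fault at step 57...
except RuntimeError:
    pass
resumed_step, _ = load_checkpoint()
print(resumed_step)                      # ...last checkpoint was step 50, not 0
final_step, _ = train(100)
print(final_step)                        # 100
```

The checkpoint interval is the knob: at most `ckpt_every` steps of work are ever lost, at the cost of more frequent writes. Production systems layer the faster techniques the article mentions (process-level rescheduling, online recovery) on top of this baseline.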
Inside Huawei's Ascend 10,000-Card Cluster: How to Tame the AI Computing "Beast"?
Ji Qi Zhi Xin· 2025-06-09 04:33
Core Viewpoint
- The article discusses advances in AI computing power clusters, highlighting their critical role in supporting large-scale AI models and ensuring high availability, fault tolerance, and efficient resource management [2][4][39]

Group 1: High Availability of Supernodes
- AI training and inference require continuous operation, much like a hospital emergency system; each computer in the cluster has a backup ready to take over on failure, keeping tasks uninterrupted [6][5]
- Huawei's CloudMatrix 384 supernode employs a fault tolerance scheme spanning system-level, business-level, and operational-level fault tolerance, turning faults into manageable issues [7][8]

Group 2: Cluster Linearity
- The ideal for computing power clusters is linear scalability: the total power of 100 computers should be 100 times that of one, achieved through precise task allocation algorithms [10]
- Huawei's team has developed key technologies to enhance training linearity for large models, achieving a linearity of 96% for the Pangu Ultra 135B model on 4K cards [11][13]

Group 3: Rapid Recovery in Large-Scale Training
- When training on thousands of computing units, the system automatically saves progress, allowing quick recovery from faults without starting over and significantly reducing downtime [14][15]
- Innovations such as process-level rescheduling and online recovery techniques cut recovery times to under 3 minutes, and to 30 seconds for specific faults [16][20]

Group 4: Fault Management and Diagnosis
- A real-time monitoring system continuously checks the health of each computer in the cluster, enabling issues to be identified and resolved before they escalate [24][26]
- Huawei has developed a comprehensive fault management framework covering error detection, isolation, and recovery, enhancing the reliability of the computing infrastructure [24][28]

Group 5: Simulation and Modeling
- Before deploying complex AI models, the computing cluster can simulate scenarios in a virtual environment to identify potential bottlenecks and optimize resource allocation [29][30]
- A Markov modeling simulation platform enables multi-dimensional analysis and performance prediction, improving resource efficiency and system stability [30][31]

Group 6: Framework Migration
- Huawei's MindSpore framework has evolved rapidly since its open-source launch, providing tools for seamless migration from other frameworks and enhancing performance during training and inference [37][38]
- The framework supports a wide range of applications, enabling quick deployment of large models and improving inference capabilities [38][39]
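The 96% linearity figure cited above follows the standard definition of scaling efficiency: the measured speedup on N accelerators divided by the ideal speedup N (equivalently, per-card throughput at scale versus on a single card). A small sketch with made-up throughput numbers, since the article reports only the final ratio:

```python
def linearity(throughput_n, throughput_1, n):
    """Scaling efficiency: achieved speedup over the ideal N-fold speedup."""
    return (throughput_n / throughput_1) / n

# Hypothetical illustration: one card processes 100 samples/s; 4096 cards
# together process 393,216 samples/s instead of the ideal 409,600.
eff = linearity(throughput_n=393_216, throughput_1=100, n=4096)
print(f"{eff:.0%}")  # 96%, the kind of figure quoted for Pangu Ultra 135B
```

The gap to 100% is the cost of communication, stragglers, and imbalanced task allocation, which is what the "precise task allocation algorithms" above are working to minimize.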