Ascend Atlas 800T A2
Domestic AI Tops the Global Charts! Zhipu and Huawei Join Forces
Ke Ji Ri Bao· 2026-01-17 00:19
Core Insights
- GLM-Image, a multimodal image generation model jointly developed by Zhipu and Huawei, has topped the Trending chart on Hugging Face, breaking the long-standing dominance of foreign models in the open-source space [2]
- The model is the first state-of-the-art (SOTA) multimodal model trained entirely on domestic chips, a significant breakthrough for the domestic AI industry chain [2][5]

Group 1: Model Architecture and Performance
- GLM-Image employs a self-developed "autoregressive + diffusion decoder" hybrid architecture that couples image generation with language modeling, an important exploration of the new generation of "cognitive generation" technology (see the conceptual sketch after this summary) [3]
- The model excels at text-heavy content, ranking first on the CVTG-2K and LongText-Bench benchmarks and showing superior accuracy when generating multiple text regions within an image and rendering long text [3][6]

Group 2: Cost and Efficiency
- The model is highly cost-effective: an API call to generate one image costs only 0.1 yuan, and a speed-optimized version is set to be released soon [4]

Group 3: Domestic Chip Utilization
- GLM-Image is a deep exploration and validation of the domestic computing ecosystem, with every stage from data preprocessing to large-scale pre-training run on Huawei's Ascend Atlas 800T A2 devices [5]
- Building the model on domestic hardware and frameworks addresses the critical issue of dependence on foreign chips, validating the feasibility of training cutting-edge models on a fully domestic computing stack [5][6]

Group 4: Industry Implications
- The success of GLM-Image is seen as the product of a collaborative domestic AI industry chain, one that can give small and medium-sized Chinese enterprises access to AI tools at lower cost and promote domestic AI technology globally [6]
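The "autoregressive + diffusion decoder" hybrid can be pictured as a two-stage pipeline: an autoregressive model first plans the image as a compact sequence of semantic tokens, and a diffusion decoder then iteratively denoises those tokens into pixels. The toy sketch below only illustrates that control flow under assumed module names and sizes (ToyARPlanner, ToyDiffusionDecoder); it is not Zhipu's released architecture or code.

```python
# Conceptual two-stage "autoregressive + diffusion decoder" pipeline at toy scale.
# All names and sizes are illustrative assumptions, not GLM-Image's real design.
import torch
import torch.nn as nn

class ToyARPlanner(nn.Module):
    """Autoregressively emits a short sequence of latent 'image tokens' from a prompt."""
    def __init__(self, vocab=1000, dim=64, steps=16):
        super().__init__()
        self.steps = steps
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, prompt_ids):
        tokens = prompt_ids
        for _ in range(self.steps):                        # greedy autoregressive decoding
            out, _ = self.rnn(self.embed(tokens))
            next_tok = self.head(out[:, -1:]).argmax(-1)   # most likely next image token
            tokens = torch.cat([tokens, next_tok], dim=1)
        return tokens[:, -self.steps:]                     # keep only the generated tokens

class ToyDiffusionDecoder(nn.Module):
    """Iteratively denoises a latent image conditioned on the planner's tokens."""
    def __init__(self, vocab=1000, dim=64, image_hw=32, denoise_steps=8):
        super().__init__()
        self.image_hw, self.denoise_steps = image_hw, denoise_steps
        self.cond = nn.Embedding(vocab, dim)
        self.net = nn.Sequential(nn.Linear(dim + image_hw * image_hw, 256),
                                 nn.GELU(),
                                 nn.Linear(256, image_hw * image_hw))

    def forward(self, image_tokens):
        cond = self.cond(image_tokens).mean(dim=1)          # pooled conditioning vector
        x = torch.randn(cond.shape[0], self.image_hw ** 2)  # start from pure noise
        for _ in range(self.denoise_steps):                 # crude iterative denoising
            x = x - 0.1 * self.net(torch.cat([cond, x], dim=-1))
        return x.view(-1, self.image_hw, self.image_hw)

prompt = torch.randint(0, 1000, (1, 12))   # stand-in for a tokenized text prompt
with torch.no_grad():
    image = ToyDiffusionDecoder()(ToyARPlanner()(prompt))
print(image.shape)                          # torch.Size([1, 32, 32])
```

The split mirrors the claimed benefit of such hybrids: the autoregressive stage carries language-model-style "cognition" about layout and text content, while the diffusion stage handles pixel fidelity.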
A First! A Domestic Model Trained on Domestic Chips Ranks No. 1 in the World
Zhi Tong Cai Jing Wang· 2026-01-16 00:33
Core Viewpoint
- The collaboration between Zhipu (02513) and Huawei has produced the GLM-Image model, the first state-of-the-art (SOTA) multimodal model trained entirely on domestic chips, marking a significant breakthrough for China's AI model development on the international stage [1][3].

Group 1: Model Development and Performance
- GLM-Image was trained on Huawei's Ascend Atlas 800T A2 devices with the MindSpore AI framework, achieving full-process training and inference adaptation [5].
- The model reached the top of the Hugging Face global AI open-source community leaderboard within 24 hours of release, reflecting its SOTA performance and innovative structure [1][3].
- GLM-Image employs a novel "autoregressive + diffusion decoder" hybrid architecture that excels in knowledge-intensive scenarios such as posters and educational graphics, particularly when rendering Chinese characters [4].

Group 2: Technological Significance
- The model showcases China's independent AI research and development capabilities on the international stage as the first SOTA model trained fully on domestic chips [3].
- The collaboration highlights a complete domestic AI technology stack, combining Zhipu's leading model architecture, Huawei's high-performance AI chips, and the self-developed MindSpore computing framework, a comprehensive breakthrough across core model, hardware, and computing framework [5].
Hong Kong Stock Movers | Zhipu Opens Over 7% Higher, Joins Huawei to Open-Source the First Multimodal SOTA Model Trained on Domestic Chips
Ge Long Hui· 2026-01-14 17:31
Core Viewpoint
- Zhipu (2513.HK) opened 7.1% higher at HKD 194.7 after announcing a collaboration with Huawei to launch the new-generation image generation model GLM-Image, the first SOTA multimodal model fully trained on domestic chips [1]

Group 1: Product Development
- GLM-Image is built on the Ascend Atlas 800T A2 device and the MindSpore AI framework, covering the entire process from data to training [1]
- The model employs an innovative "autoregressive + diffusion decoder" hybrid architecture, combining image generation with language modeling [1]

Group 2: Technological Significance
- The development marks an important exploration by Zhipu of the new-generation "cognitive generation" technology paradigm, exemplified by the Nano Banana Pro [1]
The Day After NVIDIA's H200 "Ban Lift", Zhipu and Huawei Release a Fully Domestic Open-Source Multimodal Model!
Guan Cha Zhe Wang· 2026-01-14 09:34
Core Viewpoint
- The launch of the GLM-Image model by Zhipu in collaboration with Huawei marks a significant advance for the domestic AI landscape, demonstrating that top-tier model training no longer has to rely on imported high-end computing power [1][16].

Group 1: Model Development and Performance
- GLM-Image is the first state-of-the-art (SOTA) multimodal model trained entirely on domestic chips, showing the feasibility of training cutting-edge models on a fully domestic computing stack [1][12].
- The model employs a hybrid "autoregressive + diffusion decoder" architecture, combining image generation with language modeling [1][13].
- In performance benchmarks, GLM-Image outperforms competitors such as Qwen-Image and Z-Image, achieving top scores across metrics, including a Word Accuracy of 0.9116 and an NED of 0.9557 (see the metric sketch after this summary) [6][7][8].

Group 2: Economic Impact and Market Response
- Following the announcement, Zhipu's stock surged by 18%, nearly doubling from its initial public offering price of HKD 116.2, with its market capitalization exceeding HKD 100 billion [5].
- The ability to generate commercial-grade images at a cost of only 0.1 yuan per image demonstrates that domestic computing power is economically viable against international standards [15].

Group 3: Technological Innovation and Training Process
- The GLM-Image training process is optimized through a custom-built training suite on Huawei's Ascend Atlas 800T A2 devices and the MindSpore AI framework, with end-to-end optimization from data preprocessing to large-scale pre-training [10][12].
- The architecture supports flexible image sizes without post-processing, accommodating formats such as social media covers and movie posters [13].

Group 4: Industry Context and Future Implications
- The GLM-Image launch coincides with the U.S. lifting export restrictions on NVIDIA's H200, signaling a shift in the competitive landscape in which domestic solutions are now viable alternatives [16].
- The development signals a potential turning point for China's AI industry, moving from imitation to innovation, as domestic models begin to lead in complex Chinese language and visual generation tasks [17].
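The Word Accuracy and NED figures cited above come from text-rendering benchmarks. NED is commonly reported as 1 minus the normalized Levenshtein (edit) distance between the text recognized in the generated image and the ground-truth string, so higher is better; the minimal sketch below assumes that convention, which may differ in detail from the benchmark's official implementation.

```python
# Normalized edit distance (NED) score, assuming the common convention
# NED = 1 - levenshtein(pred, ref) / max(len(pred), len(ref)); higher is better.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ned_score(pred: str, ref: str) -> float:
    if not pred and not ref:
        return 1.0
    return 1.0 - levenshtein(pred, ref) / max(len(pred), len(ref))

# One missing character out of eight gives a score of 0.875.
print(ned_score("开源多模态模型", "开源多模态大模型"))
```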
Hong Kong AI Application Stocks Rebound; Zhipu Opens Over 7% Higher, Joins Huawei to Open-Source the First Multimodal SOTA Model Trained on Domestic Chips
Xin Lang Cai Jing· 2026-01-14 01:31
Core Viewpoint
- The Hong Kong stock market's AI application sector is rebounding, with notable gains across several companies, including Zhixing Technology and Zhipu, both of which opened over 7% higher [1][5].

Group 1: Stock Performance
- Zhixing Technology (01274) rose 7.60% to HKD 7.080 [2][6].
- Zhipu (02513) rose 7.10% to HKD 194.700 [2][6].
- MINIMAX (00100) rose 2.74% to HKD 375.000 [2][6].
- Alibaba (09988) rose 2.44% to HKD 163.800 [2][6].
- Other companies such as Kuaishou (01024) and Weimeng Group (02013) also gained close to 2% [1][5].

Group 2: Technological Developments
- Zhipu has collaborated with Huawei to launch the new-generation image generation model GLM-Image, the first state-of-the-art multimodal model trained entirely on domestic chips [2][6].
- The model is built on the Ascend Atlas 800T A2 device and the MindSpore AI framework, covering the entire process from data to training (a minimal MindSpore device-targeting sketch follows this summary) [2][6].
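MindSpore selects its execution backend through a context setting, which is how training jobs target Ascend devices. The fragment below is a minimal, generic sketch of pointing a tiny MindSpore network at an Ascend device; it is not taken from the GLM-Image training code, and on a machine without Ascend hardware the device target can simply be switched to "CPU".

```python
# Minimal MindSpore sketch: target an Ascend device and run a tiny network.
# Generic illustration only, not GLM-Image training code.
import numpy as np
import mindspore as ms
from mindspore import nn, Tensor

ms.set_context(mode=ms.GRAPH_MODE, device_target="Ascend")  # use "CPU" without Ascend hardware

class TinyNet(nn.Cell):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Dense(16, 32)
        self.act = nn.ReLU()
        self.fc2 = nn.Dense(32, 4)

    def construct(self, x):
        return self.fc2(self.act(self.fc1(x)))

net = TinyNet()
x = Tensor(np.random.randn(2, 16).astype(np.float32))
print(net(x).shape)  # (2, 4)
```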
How Is Huawei's Near-Trillion-Parameter Model Trained?
Hu Xiu APP· 2025-05-30 10:18
Core Viewpoint
- The article discusses Huawei's advances in AI training systems, focusing on the MoE (Mixture of Experts) architecture and its optimization through the MoGE (Mixture of Grouped Experts) framework, which improves efficiency and reduces the cost of AI model training [1][2].

Summary by Sections

Introduction to MoE and Huawei's Innovations
- The MoE model, initially proposed by Canadian scholars, has evolved significantly, and Huawei is now optimizing the architecture to address its inefficiencies and cost issues [1].
- Huawei's MoGE architecture aims to create a more balanced and efficient training setup for AI models, contributing to the ongoing AI competition (a conceptual grouped-routing sketch follows this summary) [1].

Performance Metrics and Achievements
- Huawei's training system, built on the "Ascend + Pangu Ultra MoE" combination, has achieved notable performance: a 41% MFU (Model FLOPs Utilization) during pre-training and a throughput of 35K tokens/s during post-training on the CloudMatrix 384 super node [2][26][27].

Challenges in MoE Training
- Six main challenges in MoE training are identified: difficult parallel strategy configuration, All-to-All communication bottlenecks, uneven system load distribution, excessive operator scheduling overhead, complex training process management, and limits to large-scale expansion [3][4].

Solutions and Innovations
- **First Strategy: Enhancing Training Cluster Utilization** - Huawei implemented intelligent parallel strategy selection and global dynamic load balancing to improve overall training efficiency [6][11].
- A modeling and simulation framework was developed to automate the selection of optimal parallel configurations for the Pangu Ultra MoE model [7].
- **Second Strategy: Releasing the Computing Power of Single Nodes** - The focus shifted to operator computation efficiency, achieving a twofold increase in micro-batch size (MBS) and reducing host-bound overhead to below 2% [15][16][17].
- **Third Strategy: High-Performance Scalable RL Post-Training Technologies** - RL Fusion technology allows flexible deployment modes and significantly improves resource utilization during post-training [19][21].
- The system's design enables a 50% increase in overall training throughput while maintaining model accuracy [21].

Technical Specifications of Pangu Ultra MoE
- The Pangu Ultra MoE model has 718 billion parameters in a 61-layer Transformer architecture, delivering high performance and scalability [26].
- Training used a large-scale cluster of 6K - 10K cards, demonstrating strong generalization capabilities and efficient scaling potential [26][27].
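As Huawei has described it for the Pangu MoE line, MoGE's core idea is to partition experts into groups (for example, one group per device) and have every token activate the same number of experts in each group, so no device is systematically overloaded. The sketch below is a conceptual illustration of that grouped top-k routing under assumed sizes and a random gating network, not Huawei's implementation.

```python
# Conceptual grouped top-k routing (MoGE-style): experts are split into groups
# and each token picks the same k experts per group, balancing load across groups.
# Sizes and the gating network are illustrative assumptions.
import torch

num_tokens, dim = 8, 32
num_groups, experts_per_group, k_per_group = 4, 4, 1   # 16 experts total, 4 active per token

gate = torch.nn.Linear(dim, num_groups * experts_per_group, bias=False)
tokens = torch.randn(num_tokens, dim)

logits = gate(tokens).view(num_tokens, num_groups, experts_per_group)
weights = logits.softmax(dim=-1)                       # normalize within each group
topw, topi = weights.topk(k_per_group, dim=-1)         # same k chosen in every group

# Count how many assignments each expert receives: every group gets exactly
# num_tokens * k_per_group of them, so per-group (per-device) load is balanced by design.
flat_expert = topi + torch.arange(num_groups).view(1, -1, 1) * experts_per_group
load = torch.bincount(flat_expert.flatten(), minlength=num_groups * experts_per_group)
print(load.view(num_groups, experts_per_group).sum(dim=1))  # equal load per group
```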
Cracking an Advanced Math Problem Every 2 Seconds! Huawei Finally Reveals the Full Workflow of Its Near-Trillion-Parameter MoE Ascend Training System
Hua Er Jie Jian Wen· 2025-05-30 09:38
Core Viewpoint
- Huawei has made significant advances in large-model training with its "Ascend + Pangu Ultra MoE" system, demonstrating a fully domestic, GPU-free training process that improves computational efficiency and model performance [3][4][38].

Group 1: Technical Innovations
- Huawei's training system reached a model FLOPs utilization (MFU) of 41% during the pre-training phase on the Ascend Atlas 800T A2 cluster (a back-of-the-envelope MFU calculation follows this summary) [4][38].
- The Pangu Ultra MoE model has 718 billion parameters in a 61-layer architecture, 58 of them MoE layers, designed for high performance and scalability [38][39].
- The system sustains a throughput of 35K tokens/s during the reinforcement learning (RL) post-training phase, showing its capability to process complex tasks rapidly [39].

Group 2: Challenges Addressed
- The report identifies six key challenges in current MoE pre-training and RL post-training, including difficult parallel strategy configuration, communication bottlenecks, and uneven system load distribution [7][10][12][13].
- Huawei has developed a comprehensive end-to-end solution to these challenges, focusing on training cluster utilization and communication efficiency [14][16][25].

Group 3: Specific Solutions
- The first strategy improves training cluster utilization through intelligent parallel strategy selection and global dynamic load balancing, significantly raising overall training efficiency [16][23].
- The second strategy releases computational power at the single-node level by optimizing training operators and memory management, achieving a twofold increase in micro-batch size [26][30].
- The third strategy introduces high-performance, scalable RL post-training technologies, allowing flexible deployment modes and doubling the utilization of RL post-training clusters [33][34].
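MFU (Model FLOPs Utilization) is the ratio of FLOPs actually spent on the model to the cluster's peak FLOPs. The back-of-the-envelope sketch below uses the common approximation of roughly 6 training FLOPs per activated parameter per token; every constant in it (activated parameters, throughput, card count, per-card peak) is a placeholder assumption rather than a published Ascend or Pangu Ultra MoE figure, so only the formula, not the printed number, should be read as meaningful.

```python
# Back-of-the-envelope MFU: achieved model FLOPs / peak hardware FLOPs.
# All constants below are illustrative placeholders, not published figures.
active_params = 39e9            # assumed activated parameters per token for a sparse MoE
tokens_per_sec_cluster = 4.0e6  # assumed cluster-wide training throughput (tokens/s)
num_cards = 8000                # within the reported 6K - 10K card range
peak_flops_per_card = 300e12    # assumed per-card peak throughput (placeholder)

achieved = 6 * active_params * tokens_per_sec_cluster  # ~6 FLOPs/param/token rule of thumb
peak = num_cards * peak_flops_per_card
print(f"MFU ≈ {achieved / peak:.1%}")                  # ≈ 39% with these placeholders
```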
Huawei's AI Strength! No GPUs, and the Large Model Cracks an Advanced Math Problem Every 2 Seconds!
Di Yi Cai Jing· 2025-05-30 09:32
Core Viewpoint
- Huawei has made significant advances in large-model training with its "Ascend + Pangu Ultra MoE" combination, enabling a fully controllable training process without GPUs and showing industry-leading cluster training performance [2][3].

Group 1: Technical Innovations
- Huawei's training system markedly improves training efficiency, with a pre-training model FLOPs utilization (MFU) of 41% and a post-training throughput of 35K tokens/s on the CloudMatrix 384 super node [3][34].
- The company introduced a series of solutions to the challenges of MoE pre-training and reinforcement learning (RL) post-training, including intelligent parallel strategy selection and global dynamic load balancing [11][17].
- The training system uses a hierarchical All-to-All communication architecture that reduces exposed communication overhead to nearly zero, improving the efficiency of expert-parallel communication [14][15].

Group 2: Training Process Optimization
- Training cluster utilization is optimized through a simulation-driven intelligent parallel optimization framework that automates the selection of optimal deployment configurations [12][13].
- A memory optimization framework saves over 70% of activation memory, keeping long-running training reliable even under increased memory pressure (the generic recomputation sketch after this summary illustrates the idea) [25].
- RL Fusion technology allows flexible deployment modes, significantly improving resource scheduling during the inference phase and doubling utilization in RL post-training [27][28].

Group 3: Model Specifications
- The Pangu Ultra MoE model has 718 billion parameters in a 61-layer Transformer architecture designed for high sparsity and performance [32].
- Training used a cluster of 6K - 10K Ascend Atlas 800T A2 cards, achieving a high model utilization rate during the pre-training phase [32].
- The architecture scales efficiently to larger models and clusters, with an MFU above 50% expected in future iterations [32].
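Activation-memory savings of the kind mentioned above are commonly obtained with recomputation: rather than storing every intermediate activation for the backward pass, selected blocks are recomputed during backpropagation. The sketch below shows that generic idea with PyTorch's gradient checkpointing, purely to illustrate the trade-off; it is not Huawei's memory-optimization framework, which targets Ascend and MindSpore.

```python
# Generic activation recomputation (gradient checkpointing): trade extra compute
# for lower stored activation memory. Illustrative only; not Huawei's framework.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        return x + self.ff(x)

blocks = nn.ModuleList(Block() for _ in range(8))
x = torch.randn(16, 1024, requires_grad=True)

h = x
for blk in blocks:
    # Activations inside each block are recomputed during backward instead of stored.
    h = checkpoint(blk, h, use_reentrant=False)
h.sum().backward()
print(x.grad.shape)  # gradients still flow; peak activation memory is reduced
```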