Zhipu and Huawei Open-Source the Image Generation Model GLM-Image: the First SOTA Model Fully Trained on Domestic Chips
IPO早知道·2026-01-14 01:57

Core Viewpoint
- The article discusses the development and open-sourcing of the GLM-Image model by Zhipu and Huawei, marking a significant advance in training state-of-the-art (SOTA) models on domestic chips, specifically the Ascend Atlas 800T A2 device and the MindSpore AI framework [4][10].

Group 1: Model Development and Architecture
- GLM-Image is the first SOTA model trained entirely on domestic chips, using a hybrid "autoregression + diffusion decoder" architecture to combine global instruction understanding with local detail rendering [4][5].
- The model addresses challenges in generating knowledge-intensive scenes such as posters and PPTs, a step toward a new generation of "knowledge + reasoning" cognitive generative models [5][10].
- The architecture integrates a 9B autoregressive model with a 7B DiT diffusion decoder, enabling GLM-Image to understand complex instructions and render text accurately [7].

Group 2: Performance Metrics
- On the CVTG-2K benchmark, GLM-Image achieved top performance with a Word Accuracy of 0.9116 and a Normalized Edit Distance (NED) of 0.9557, indicating high accuracy in generating text within images [6][8].
- On the LongText-Bench benchmark, GLM-Image scored 0.952 in English and 0.979 in Chinese, leading open-source models in rendering long and multi-line text [8].

Group 3: Technical Innovations
- The model adaptively handles a range of resolutions, generating images from 1024x1024 to 2048x2048 without retraining [7].
- The training process was optimized with techniques such as dynamic-graph multi-level pipeline dispatch and high-performance fused operators, improving both training stability and performance [10].
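The two text-rendering metrics reported above, Word Accuracy and Normalized Edit Distance (NED), can be sketched as follows. The exact CVTG-2K evaluation protocol is not described in the article, so this is a minimal illustration assuming word-level exact match for accuracy and Levenshtein-based normalization for NED; the function names are illustrative, not from the GLM-Image repository.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via a single-row dynamic program."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

def word_accuracy(target: str, ocr_output: str) -> float:
    """Fraction of target words reproduced exactly, position by position
    (one plausible definition; benchmarks may align words differently)."""
    tgt, out = target.split(), ocr_output.split()
    return sum(w == o for w, o in zip(tgt, out)) / len(tgt)

def ned(target: str, ocr_output: str) -> float:
    """Normalized edit-distance similarity: 1.0 means a perfect match."""
    if not target and not ocr_output:
        return 1.0
    return 1 - edit_distance(target, ocr_output) / max(len(target), len(ocr_output))
```

Under these definitions, a reported NED of 0.9557 means the OCR-recovered text of a generated image differs from the target by under 5% of its characters on average.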
Group 4: Market Implications
- The introduction of GLM-Image signifies a deep exploration of the domestic computing ecosystem, showcasing the potential of domestic computing power for training high-performance multimodal generative models [10].
- The model's open-source release aims to share the technological pathway and practical insights with the open-source community, potentially driving further advances in the field [5][10].
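The two-stage "autoregression + diffusion decoder" design described in Group 1 can be sketched as a pipeline: the autoregressive stage produces a global semantic plan for the whole image, and the diffusion stage iteratively refines local pixel detail conditioned on that plan. All class and function names below are illustrative stand-ins, not the GLM-Image API; the real 9B/7B models are replaced with deterministic toy computations so the sketch runs.

```python
from dataclasses import dataclass

@dataclass
class SemanticPlan:
    """Output of the autoregressive stage: a global layout/instruction
    plan encoded as discrete tokens (illustrative only)."""
    tokens: list

def autoregressive_stage(prompt: str, seq_len: int = 16) -> SemanticPlan:
    """Stand-in for the 9B autoregressive model: the real system would
    decode semantic tokens conditioned on the prompt."""
    data = prompt.encode("utf-8")          # toy deterministic "tokenizer"
    return SemanticPlan(tokens=[data[i % len(data)] for i in range(seq_len)])

def diffusion_decoder_stage(plan: SemanticPlan, height: int, width: int,
                            steps: int = 4) -> list:
    """Stand-in for the 7B DiT decoder: iterative refinement conditioned
    on the plan, with updates that shrink as denoising progresses."""
    latent = [[0.0] * width for _ in range(height)]
    n = len(plan.tokens)
    for step in range(1, steps + 1):
        for y in range(height):
            for x in range(width):
                cond = plan.tokens[(y * width + x) % n]
                latent[y][x] += cond / (step * 255.0)  # shrinking update
    return latent

def generate(prompt: str, height: int = 4, width: int = 4) -> list:
    """Two-stage pipeline: global plan first, local detail second."""
    return diffusion_decoder_stage(autoregressive_stage(prompt), height, width)
```

The division of labor mirrors the article's claim: global instruction understanding lives in the autoregressive stage, while the diffusion decoder handles local detail depiction.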