Core Viewpoint
- The collaboration between Zhipu and Huawei has produced GLM-Image, a new-generation open-source image generation model and the first SOTA multimodal model trained entirely on domestic chips [2][9].

Group 1: Model Development and Features
- GLM-Image was trained end to end, from data to model, on the Ascend Atlas 800T A2 device with the MindSpore AI framework [2][8].
- The model employs an innovative architecture that pairs a 9B autoregressive model with a 7B DiT diffusion decoder, enhancing its ability to understand complex instructions and render text accurately [5][8].
- GLM-Image natively supports image generation at resolutions from 1024x1024 to 2048x2048 without retraining [5][8].

Group 2: Performance and Comparisons
- In authoritative text-rendering rankings, GLM-Image has reached open-source SOTA, outperforming models such as Seedream and Nano Banana Pro [6].
- The model excels at generating illustrations that involve complex logical processes and textual explanations, maintaining stylistic consistency and accurate text rendering across formats [7].

Group 3: Economic Aspects
- Generating an image with GLM-Image via API costs only 0.1 yuan, underscoring its affordability and accessibility [8].
- The successful training of GLM-Image on domestic chips validates the feasibility of building high-performance multimodal generation models on a fully domestic computing stack [9].
Zhipu shares rise over 16% after jointly open-sourcing a new model with Huawei