Core Viewpoint
- The article highlights the launch of GLM-Image, a new open-source image generation model developed by Zhipu in collaboration with Huawei, which is the first SOTA multimodal model fully trained on domestic chips [1][9].

Group 1: Model Features and Innovations
- GLM-Image was trained end to end, from data to model, on Ascend Atlas 800T A2 hardware with the MindSpore AI framework [1][9].
- The model employs a hybrid architecture combining a 9-billion-parameter autoregressive model with a 7-billion-parameter DiT diffusion decoder, enhancing its ability to understand complex instructions and accurately render text [2][12].
- GLM-Image adaptively handles multiple resolutions, natively supporting image generation from 1024x1024 to 2048x2048 without retraining [3][12].

Group 2: Performance and Applications
- The model has achieved open-source SOTA levels in text rendering, demonstrated by generating complex illustrations and diagrams that contain logical flows and textual explanations [6][15].
- When generating e-commerce images and multi-panel comics, GLM-Image maintains consistency of style and subject while keeping text generation highly accurate [8][17].
- Generating an image with GLM-Image via the API costs only 0.1 yuan [17].

Group 3: Technological Significance
- GLM-Image represents a significant exploration of the "cognitive generation" paradigm, marking a shift from traditional image generation toward models that integrate world knowledge and reasoning capabilities [2][11].
- The model's development validates the feasibility of training high-performance multimodal generation models on a fully domestic computing stack [17].
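The hybrid design described above pairs an autoregressive model that "plans" the image with a diffusion decoder that renders pixels. The toy sketch below only illustrates that two-stage flow; every name and the simplistic denoising rule are hypothetical stand-ins, not GLM-Image's actual implementation.

```python
import random

class ToyAutoregressivePlanner:
    """Stand-in for the 9B AR model: emits a sequence of 'layout
    tokens' one step at a time, each conditioned on the prompt and
    on all previously emitted tokens (hypothetical)."""
    def plan(self, prompt, steps=4):
        tokens = []
        for i in range(steps):
            tokens.append(f"token_{i}_{len(prompt) % 7}")
        return tokens

class ToyDiffusionDecoder:
    """Stand-in for the 7B DiT decoder: starts from noise and
    iteratively denoises toward pixel values implied by the plan."""
    def decode(self, tokens, size=8, steps=10):
        random.seed(len(tokens))
        pixels = [random.random() for _ in range(size * size)]
        target = 0.5  # toy 'clean image' value implied by the plan
        for _ in range(steps):
            # each step moves the noisy pixels a fraction of the
            # way toward the conditioned target
            pixels = [p + 0.3 * (target - p) for p in pixels]
        return pixels

def generate(prompt, size=8):
    planner, decoder = ToyAutoregressivePlanner(), ToyDiffusionDecoder()
    return decoder.decode(planner.plan(prompt), size=size)

img = generate("a storefront sign reading 'OPEN'")
print(len(img))  # 64 pixel values for an 8x8 toy image
```

The split mirrors why such hybrids help with text rendering: the autoregressive stage can reason about layout and wording sequentially, while the diffusion stage only has to render what the plan specifies.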
Source headline: New open-source model released jointly with Huawei; Zhipu shares rise over 16%