A New Global Benchmark for Multimodal Reasoning: Zhipu's Visual Reasoning Model GLM-4.5V Officially Launched and Open-Sourced
Securities Daily Online (Zheng Quan Ri Bao Wang) · 2025-08-12 08:46

Group 1
- Beijing Zhipu Huazhang Technology Co., Ltd. (Zhipu AI) launched GLM-4.5V, a 100B-class open-source visual reasoning model with 106 billion total parameters and 12 billion active parameters [1][2]
- GLM-4.5V marks a significant step toward Artificial General Intelligence (AGI), achieving state-of-the-art (SOTA) performance on 41 public multimodal benchmarks covering image, video, and document understanding as well as GUI agent tasks [2][5]
- The model offers a "thinking mode" switch that lets users choose between quick responses and deep reasoning, balancing efficiency and effectiveness [5][6]

Group 2
- GLM-4.5V consists of a visual encoder, an MLP adapter, and a language decoder; it supports a 64K-token multimodal context and improves video processing efficiency through 3D convolution [6]
- Training follows a three-stage strategy: pre-training, supervised fine-tuning (SFT), and reinforcement learning (RL), which together strengthen the model's complex multimodal understanding and reasoning [6][7]
- API pricing is set at 2 yuan per million input tokens and 6 yuan per million output tokens, a cost-effective option for enterprises and developers [5]
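The encoder-adapter-decoder pipeline described above can be sketched in a few lines. This is a minimal illustrative skeleton only: all shapes, dimensions, and weights below are made-up assumptions, not the actual GLM-4.5V architecture.

```python
import numpy as np

# Minimal sketch of a visual encoder -> MLP adapter -> language-decoder
# input pipeline, as described for GLM-4.5V. Every dimension and weight
# here is an illustrative assumption, not the real model.

rng = np.random.default_rng(0)

N_PATCHES, PATCH_DIM, VISION_DIM, LM_DIM = 16, 256, 64, 128

# Random stand-in weights for each stage.
W_enc = rng.standard_normal((PATCH_DIM, VISION_DIM)) * 0.02
W_ad1 = rng.standard_normal((VISION_DIM, LM_DIM)) * 0.02
W_ad2 = rng.standard_normal((LM_DIM, LM_DIM)) * 0.02

def visual_encoder(image: np.ndarray) -> np.ndarray:
    """Split a 64x64 'image' into 16 patches and embed each one."""
    patches = image.reshape(N_PATCHES, PATCH_DIM)
    return patches @ W_enc                           # (16, 64) vision tokens

def mlp_adapter(vision_tokens: np.ndarray) -> np.ndarray:
    """Project vision tokens into the language decoder's embedding space."""
    hidden = np.maximum(vision_tokens @ W_ad1, 0.0)  # ReLU MLP
    return hidden @ W_ad2                            # (16, 128) LM-space tokens

image = rng.standard_normal((64, 64))
lm_tokens = mlp_adapter(visual_encoder(image))
print(lm_tokens.shape)  # (16, 128)
```

The adapter's job in this design is simply to bridge the vision embedding space and the language model's token space; the resulting tokens would be interleaved with text tokens in the decoder's 64K context.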
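The "thinking mode" switch is, from the caller's perspective, just a request option. The sketch below shows what such a request payload might look like; the field name `thinking` and its values are assumptions based on common API patterns, not the confirmed GLM-4.5V request format, so consult the official API documentation before use.

```python
import json

# Hypothetical request-payload sketch for the "thinking mode" switch.
# The "thinking" field and its values are assumptions, not the
# documented GLM-4.5V API schema.

def build_request(prompt: str, deep_thinking: bool) -> str:
    payload = {
        "model": "glm-4.5v",
        "messages": [{"role": "user", "content": prompt}],
        # Toggle between deep reasoning and quick responses (assumed field).
        "thinking": {"type": "enabled" if deep_thinking else "disabled"},
    }
    return json.dumps(payload)

req = json.loads(build_request("Describe this chart.", deep_thinking=True))
print(req["thinking"]["type"])  # enabled
```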
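At the quoted rates (2 yuan per million input tokens, 6 yuan per million output tokens), per-request cost is easy to estimate. The token counts in the example are invented for illustration.

```python
# Cost estimate at the API prices quoted in the article.
INPUT_YUAN_PER_M = 2.0    # yuan per 1,000,000 input tokens
OUTPUT_YUAN_PER_M = 6.0   # yuan per 1,000,000 output tokens

def api_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in yuan of one API call at the listed rates."""
    return (input_tokens * INPUT_YUAN_PER_M
            + output_tokens * OUTPUT_YUAN_PER_M) / 1_000_000

# Example: 50,000 input tokens and 10,000 output tokens (made-up numbers).
print(api_cost_yuan(50_000, 10_000))  # 0.16
```

So a fairly large multimodal request costs well under one yuan, which is the basis for the article's "cost-effective" claim.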