Zhipu Announces Visual Reasoning Model GLM-4.5V Is Officially Live and Open-Sourced
Feng Huang Wang·2025-08-11 14:14

Core Insights
- The article covers the launch of GLM-4.5V, an open-source visual reasoning model from Zhipu AI with 106 billion total parameters and 12 billion active parameters [1]
- The model is positioned as the best-performing open-source model in its class, achieving state-of-the-art (SOTA) results across 41 public multimodal benchmarks [1]
- API pricing is set at 2 yuan per million input tokens and 6 yuan per million output tokens, making the model competitively priced [1]

Company Overview
- Zhipu AI built GLM-4.5V on its flagship text model GLM-4.5-Air, continuing the technical trajectory established by GLM-4.1V-Thinking [1]
- The model is designed to handle a range of tasks, including image, video, and document understanding as well as GUI agent functionality [1]

Industry Context
- Multimodal reasoning is identified as a crucial capability on the path to artificial general intelligence (AGI), allowing AI to perceive, understand, and make decisions the way humans do [1]
- Vision-Language Models (VLMs) are highlighted as the core foundation for enabling multimodal reasoning [1]
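The per-token prices quoted above imply a straightforward cost model. A minimal sketch of that arithmetic, using only the two published rates (the helper function and the example token counts are illustrative, not part of the article):

```python
# Published GLM-4.5V API prices from the article:
# 2 yuan per million input tokens, 6 yuan per million output tokens.
INPUT_YUAN_PER_MTOK = 2.0
OUTPUT_YUAN_PER_MTOK = 6.0

def api_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in yuan for one call with the given token counts."""
    return (input_tokens * INPUT_YUAN_PER_MTOK
            + output_tokens * OUTPUT_YUAN_PER_MTOK) / 1_000_000

# Hypothetical example: 50k tokens in, 10k tokens out.
print(round(api_cost_yuan(50_000, 10_000), 4))  # 0.16
```

At these rates, even long-context multimodal calls stay in the fractions-of-a-yuan range, which is the basis of the article's "competitively priced" claim.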