Just now, Zhipu open-sourced their strongest multimodal model, GLM-4.5V.
数字生命卡兹克·2025-08-11 14:20

Core Viewpoint
- The article highlights the release of GLM-4.5 and its successor GLM-4.5V, emphasizing their advanced multimodal processing capabilities and strong benchmark performance [1][2][6].

Model Release and Specifications
- GLM-4.5V is a multimodal model with 106 billion total parameters and 12 billion active parameters, making it one of the largest open-source multimodal models available [3].
- The model achieved state-of-the-art (SOTA) results on 41 of 42 evaluation benchmarks, underscoring its strong performance [4][6].

Benchmark Performance
- A detailed comparison against other models shows GLM-4.5V leading across a range of tasks, including visual question answering and reasoning [5].
- In the MMBench v1.1 benchmark, for instance, GLM-4.5V scored 88.2, outperforming models such as Qwen2.5-VL and GLM-4.1V [5].

Open Source and Accessibility
- GLM-4.5V is available for download on multiple platforms, including GitHub and Hugging Face, although its size makes local deployment on consumer-grade hardware challenging [7][8]; a minimal loading sketch appears at the end of this piece.
- For those who prefer not to handle deployment themselves, the model can be accessed through the z.ai platform [8][9]; an API-call sketch also follows below.

Testing and Capabilities
- Initial tests showed GLM-4.5V accurately solving complex visual reasoning tasks, indicating advanced cognitive capabilities [10][14][23].
- The model also exhibits impressive video understanding, analyzing and summarizing video content effectively, a significant advance in multimodal AI [48][54][66].

Pricing and Economic Viability
- API pricing for GLM-4.5V is competitive, at 2 yuan per million input tokens and 6 yuan per million output tokens, making it an attractive option in the multimodal model market [83]; a back-of-the-envelope cost calculation closes out this piece.

Conclusion
- The continuous development and open-source approach of companies like Zhipu AI signal a shift in the AI landscape, promoting accessibility and innovation in the field [86][90][94].
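
As a practical footnote to the accessibility section, the sketch below shows one way the open weights could be loaded locally with Hugging Face transformers. The repository id "zai-org/GLM-4.5V", the Auto* classes, and the multimodal chat-message shape are assumptions based on how earlier GLM-4V-family checkpoints are typically packaged, not details confirmed by the article; check the actual model card before use, and note that a 106B-total-parameter model realistically needs multi-GPU hardware.

```python
# Minimal local-loading sketch. Assumptions (not from the article):
# the Hugging Face repo id, the Auto* classes, and the chat format.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

MODEL_ID = "zai-org/GLM-4.5V"  # assumed repo id; verify on Hugging Face

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # 106B total params: expect multi-GPU
    device_map="auto",
    trust_remote_code=True,
)

# One image-grounded question, using the generic multimodal chat format.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/benchmark_chart.png"},
        {"type": "text", "text": "Summarize what this chart shows."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```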
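
For those using the hosted route instead, Zhipu's open platform exposes an OpenAI-style SDK. The sketch below carries over the image_url content shape from earlier GLM-4V API documentation; the model identifier "glm-4.5v" is an assumption, so confirm it against the current docs.

```python
# Hosted-API sketch using Zhipu's OpenAI-style zhipuai SDK. The model
# name "glm-4.5v" and the image_url content shape are assumptions
# carried over from earlier GLM-4V API docs; confirm before use.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="YOUR_API_KEY")  # key from the Zhipu open platform

response = client.chat.completions.create(
    model="glm-4.5v",  # assumed identifier for GLM-4.5V
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/frame.jpg"}},
            {"type": "text",
             "text": "Describe the scene and any text in the image."},
        ],
    }],
)
print(response.choices[0].message.content)
```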
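
Finally, the quoted token prices translate into per-request costs directly. The short calculation below uses the article's quoted rates; the example token counts are invented purely for illustration.

```python
# Cost per request at the article's quoted GLM-4.5V API rates:
# 2 yuan per million input tokens, 6 yuan per million output tokens.
INPUT_YUAN_PER_MTOK = 2.0
OUTPUT_YUAN_PER_MTOK = 6.0

def request_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Cost in yuan for one request at the quoted rates."""
    return (input_tokens * INPUT_YUAN_PER_MTOK +
            output_tokens * OUTPUT_YUAN_PER_MTOK) / 1_000_000

# Hypothetical vision request: ~3,000 tokens in (image + prompt),
# ~800 tokens out.
print(f"{request_cost_yuan(3_000, 800):.4f} yuan")  # ≈ 0.0108 yuan per call
```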