Zhipu Open-Sources GLM-OCR: Only 0.9B Parameters, SOTA Results on Multiple Benchmarks

Core Viewpoint
Zhipu has officially released and open-sourced GLM-OCR, a model with only 0.9 billion parameters that achieves state-of-the-art (SOTA) results on mainstream benchmarks for formula recognition, table recognition, and information extraction [1]

Group 1: Model Features
- GLM-OCR is optimized for scenarios including handwriting, complex tables, code documents, seal recognition, and mixed multilingual typesetting [1]
- The model uses an encoder-decoder architecture built around a self-developed CogViT visual encoder, and follows a two-stage pipeline of "layout analysis → parallel recognition" [1]

Group 2: Performance and Efficiency
- The model processes PDF documents at a throughput of 1.86 pages per second [1]
- API calls are priced at 0.2 yuan per million tokens [1]

Group 3: Deployment and Tools
- GLM-OCR supports deployment on vLLM, SGLang, and Ollama [1]
- The complete SDK and inference toolchain have been open-sourced, making the model suitable for high-concurrency and edge-computing scenarios [1]
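Since vLLM exposes an OpenAI-compatible HTTP endpoint, a locally served GLM-OCR instance could be queried with the standard chat-completions request format. Below is a minimal sketch of building such a request payload; the model identifier `glm-ocr`, the prompt wording, and the local endpoint are illustrative assumptions, not details confirmed by the announcement:

```python
import base64
import json

def build_ocr_request(image_bytes: bytes, model: str = "glm-ocr") -> dict:
    """Build an OpenAI-compatible chat-completions payload asking the
    model to transcribe a document image.
    (Model name and prompt text are illustrative assumptions.)"""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    # Image is passed inline as a base64 data URL
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                    {"type": "text",
                     "text": "Recognize this document and output Markdown."},
                ],
            }
        ],
    }

# A vLLM server started for this model would typically accept this body at
# POST http://localhost:8000/v1/chat/completions (assumed local deployment).
payload = build_ocr_request(b"\x89PNG\r\n\x1a\n")
print(json.dumps(payload)[:80])
```

The same payload shape should work against any of the OpenAI-compatible serving options mentioned above, with only the base URL and model name changing per backend.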
