Core Viewpoint
- Tencent has open-sourced HunyuanOCR, a 1-billion-parameter OCR model that achieves state-of-the-art results across a range of OCR application benchmarks [1]

Group 1: Model Features
- HunyuanOCR is built on a native multimodal architecture and uses an end-to-end training and inference paradigm, so multiple tasks can be completed in a single forward pass. This makes it more efficient than traditional cascaded pipelines [1]
- The architecture consists of three components: a native-resolution visual encoder, an adaptive visual adapter, and a lightweight language model [1]

Group 2: Performance Metrics
- On OmniDocBench, a benchmark for complex document parsing, HunyuanOCR scored 94.1, surpassing models such as Google's Gemini 3 Pro [1]
- On a text detection and recognition test set covering nine scenarios, including documents, street scenes, and handwriting, the model outperformed both open-source and commercial models [1]

Group 3: Application and Recognition
- HunyuanOCR supports translation for 14 less widely spoken languages and took first place in the small-model track of the ICDAR 2025 document translation competition [1]
- The model is already used in scenarios such as invoice field extraction, video subtitle recognition, and photo translation, and it has been officially open-sourced [1]
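The contrast between a single end-to-end pass and a traditional cascade can be illustrated with a toy sketch. Everything below is hypothetical: the function names, the string standing in for an image, and the small lexicon standing in for translation are assumptions made for illustration, not HunyuanOCR's code or API. The sketch only counts model invocations; it does not model compute, and a real language model still decodes its output token by token within its one invocation.

```python
"""Toy contrast between a cascaded OCR pipeline and one end-to-end model call.

All names are hypothetical stand-ins that model data flow only; none of them
belong to HunyuanOCR's released code.
"""
from typing import List, Tuple

# Tiny lookup table standing in for a translation capability (hypothetical).
TOY_LEXICON = {"bonjour": "hello", "monde": "world"}


def detect_regions(image: str) -> List[str]:
    """Cascade stage 1 (hypothetical detector): split the 'image' into text regions."""
    return [region for region in image.split("|") if region.strip()]


def recognize(region: str) -> str:
    """Cascade stage 2 (hypothetical recognizer): read a single region."""
    return region.strip()


def translate(text: str) -> str:
    """Cascade stage 3 (hypothetical translator): word-by-word lexicon lookup."""
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def cascaded_ocr(image: str) -> Tuple[str, int]:
    """Run detect -> recognize (once per region) -> translate, counting model calls."""
    regions = detect_regions(image)
    calls = 1  # the detector ran once
    lines = []
    for region in regions:
        lines.append(recognize(region))
        calls += 1  # one recognizer call per detected region
    result = translate(" ".join(lines))
    calls += 1  # the separate translation model ran once
    return result, calls


def end_to_end_ocr(image: str, task: str) -> Tuple[str, int]:
    """One model invocation: encoder -> adapter -> language model, task chosen by prompt."""
    visual_tokens = image.split("|")                           # stand-in: visual encoder
    aligned = [t.strip() for t in visual_tokens if t.strip()]  # stand-in: visual adapter
    text = " ".join(aligned)                                   # stand-in: language model output
    if task == "translate":
        text = " ".join(TOY_LEXICON.get(w, w) for w in text.split())
    return text, 1  # a single invocation regardless of region count or task


print(cascaded_ocr(" bonjour | monde "))                  # ('hello world', 4)
print(end_to_end_ocr(" bonjour | monde ", "translate"))   # ('hello world', 1)
```

In this toy, the cascade's call count grows with the number of detected regions and with every extra stage, while the end-to-end path selects the task through its prompt and makes one invocation either way.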
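[1] 腾讯混元OCR模型宣布开源:参数量1B 支持14种小语种翻译 (Tencent's Hunyuan OCR model goes open source: 1B parameters, translation for 14 less widely spoken languages)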