DeepSeek-OCR 2 Released: Letting AI "Read" Complex Documents Like a Human

Core Insights

- The DeepSeek team released the paper "DeepSeek-OCR 2: Visual Causal Flow" and open-sourced the DeepSeek-OCR 2 model. The model features a new DeepEncoder V2 structure that dynamically adjusts the order in which visual information is processed, based on image semantics [1][2]
- The new model aims to align machine processing more closely with the logic of human visual reading, addressing a limitation of traditional visual language models, which process images in a fixed grid order [1]

Model Performance

- DeepSeek-OCR 2 achieved an overall score of 91.09% on the OmniDocBench v1.5 benchmark, a 3.73% improvement over its predecessor [2]
- The model also read documents in a more accurate order: the reading-order edit distance decreased from 0.085 to 0.057, indicating a better understanding of document content structure [2]
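
To make the reading-order figure more concrete, here is a minimal sketch of a normalized edit distance between a predicted block order and a reference order. It is only an illustration: the source does not describe how OmniDocBench computes its metric, so the function names, the block labels, and the choice to normalize by the longer sequence are all assumptions.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(predicted, reference):
    """Edit distance divided by the longer length (assumed normalization), in [0, 1]."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 0.0
    return levenshtein(predicted, reference) / longest


# Hypothetical page: the model swaps two adjacent blocks.
reference = ["title", "abstract", "column_1", "column_2", "footer"]
predicted = ["title", "column_1", "abstract", "column_2", "footer"]
score = normalized_edit_distance(predicted, reference)  # 2 edits / 5 blocks = 0.4
```

Under this definition, 0 means the predicted order matches the reference exactly and values closer to 1 mean more reordering, which is why the drop from 0.085 to 0.057 reads as an improvement.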