Multimodal domain tasks - filings, earnings calls, financial reports, news



Unknown institution: Guosheng Computer: DeepSeekOCR2 mimics human reading habits, reordering the reading sequence to achieve O-20260128

Unknown institution · 2026-01-28 02:00

Summary of Key Points from the Conference Call

Company and Industry

- The note, attributed in its title to **Guosheng Computer** (国盛计算机), discusses **DeepSeek-OCR2**, a DeepSeek model aimed at long-standing problems in Optical Character Recognition (OCR).

Core Insights and Arguments

- **Traditional OCR Challenges**: Existing OCR systems struggle with documents that mix text and images, producing reading-order errors and jumbled output [1]
- **DeepSeek-OCR2 Innovation**: The model moves the handling of reading order and logic from the decoder (the LLM) to the encoder, which the note credits with improving OCR performance [2]
- **VisualCausalFlow Concept**: DeepSeek introduces the concept of VisualCausalFlow: before decoding, the encoder semantically rearranges the document into a logical reading order, converting 2D document content into a 1D causal flow [2]
- **Performance Improvement**: DeepSeek-OCR2 scored **91.09** on OmniDocBench v1.5, which the note describes as a significant improvement over the previous generation of DeepSeek OCR [2]

Additional Important Content

- **Two-Level Understanding**: DeepSeek-OCR2 breaks 2D understanding into two levels of "one-dimensional causal reasoning": the encoder constructs the reading flow, and the decoder generates and infers. The note suggests this may have implications beyond OCR and could benefit other multimodal tasks in the future [2]
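To make the reading-order problem concrete, here is a minimal sketch of what "converting 2D content into a 1D sequence" means for a two-column page. It is a hand-written geometric heuristic for illustration only, not the learned encoder the note describes; the names `Block`, `raster_order`, and `column_order` are hypothetical, and the assumption of equal-width columns is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A layout block on a page (hypothetical type, for illustration)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def raster_order(blocks):
    """Naive order: top-to-bottom, then left-to-right, ignoring columns."""
    return sorted(blocks, key=lambda b: (b.y, b.x))


def column_order(blocks, page_width, n_columns=2):
    """Toy reading order: assign each block to an equal-width column by its
    horizontal centre, then read column by column, top to bottom."""
    col_w = page_width / n_columns

    def key(b):
        centre = b.x + b.w / 2
        col = min(int(centre // col_w), n_columns - 1)
        return (col, b.y, b.x)

    return sorted(blocks, key=key)


# A two-column page, 100 units wide, with two paragraphs per column.
PAGE = [
    Block("B1", x=55, y=0, w=45, h=10),
    Block("A2", x=0, y=12, w=45, h=10),
    Block("A1", x=0, y=0, w=45, h=10),
    Block("B2", x=55, y=12, w=45, h=10),
]

naive = [b.text for b in raster_order(PAGE)]
logical = [b.text for b in column_order(PAGE, page_width=100)]
print(naive)    # interleaves the two columns
print(logical)  # reads the left column fully, then the right column
```

The naive raster scan interleaves the columns, which is the kind of jumbled output the note attributes to traditional OCR. A fixed rule like `column_order` only works when the layout is known in advance; the approach described above instead has the encoder produce the ordering from the document's content.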

VisualCausalFlow (visual causal flow)

Multimodal domain tasks

Artificial Intelligence

DeepSeek OCR2
