Visual Compression of Long-Text Context
DeepSeek Team Releases New Visual Compression Model DeepSeek-OCR
智通财经网 · 2025-10-20 11:37
Core Insights
- The DeepSeek-AI team has released DeepSeek-OCR, a new research result that compresses long text context into vision tokens, substantially reducing the number of tokens needed for processing [1]
- The system consists of two main components: DeepEncoder, designed to handle high-resolution input with low computational activation, and the decoder DeepSeek3B-MoE-A570M [1]
- Experiments show that when the number of text tokens is at most ten times the number of vision tokens (a compression ratio below 10x), the model reaches 97% OCR accuracy; even at a 20x compression ratio, accuracy remains around 60% [1]

Performance Metrics
- On the OmniDocBench benchmark, DeepSeek-OCR surpassed GOT-OCR2.0 while using only 100 vision tokens per page, versus 256 for GOT-OCR2.0 [2]
- Using fewer than 800 vision tokens, it also outperformed MinerU2.0, which averages more than 6,000 tokens per page [2]
- On a single A100-40G GPU, the system can generate over 200,000 pages of training data per day for large language models and vision-language models [2]
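The compression-ratio figures above reduce to simple arithmetic. A minimal sketch of that calculation, with the token counts taken from the article; the function name and the 1,000-token sample page are illustrative, not from DeepSeek's paper:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Hypothetical helper: ratio of text tokens replaced per vision token."""
    return text_tokens / vision_tokens

# A page whose raw text would cost ~1,000 tokens (assumed figure),
# encoded as 100 vision tokens, gives the 10x ratio the article cites
# as the regime with ~97% OCR accuracy.
ratio = compression_ratio(1000, 100)
print(f"compression ratio: {ratio:.0f}x")

# Per-page token budgets reported for OmniDocBench in the article:
per_page = {"DeepSeek-OCR": 100, "GOT-OCR2.0": 256, "MinerU2.0": 6000}
for model, tokens in sorted(per_page.items(), key=lambda kv: kv[1]):
    print(f"{model:12s} {tokens:>5d} tokens/page")
```

At a 20x ratio (the same page squeezed into 50 vision tokens) the article reports accuracy dropping to roughly 60%, which is the trade-off the token budgets above are balancing.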