Breaking Through the Context Length Limits of Large Language Models
DeepSeek Open-Sources a New Model That Compresses Everything Visually
Guan Cha Zhe Wang · 2025-10-20 10:47
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, with roughly 3 billion parameters, which aims to improve text recognition efficiency by mapping text into two-dimensional optical (visual) representations [1][3].

Model Architecture
- DeepSeek-OCR consists of two main components, the DeepEncoder and the DeepSeek3B-MoE-A570M decoder, designed for high-resolution input and efficient compression [3][7].
- DeepEncoder combines local perception with global understanding and applies a 16x downsampling mechanism that retains 97% of key information [7].

Performance Metrics
- The model achieves a decoding accuracy of 97% when the number of text tokens is within 10 times the number of visual tokens, and maintains approximately 60% accuracy at a compression ratio of 20x [3].
- In benchmark tests, DeepSeek-OCR outperformed GOT-OCR2.0 and MinerU2.0 while using significantly fewer visual tokens [4].

Practical Applications
- DeepSeek-OCR can generate over 200,000 pages of LLM/VLM training data per day on a single A100-40G GPU, indicating high operational efficiency [4][7]; see the worked arithmetic after this list.
- The model has potential applications in various sectors, including finance (digitizing financial reports), healthcare (archiving medical records), and publishing (digitizing ancient texts) [17].
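The figures above imply a simple relationship between the text-to-visual-token compression ratio, decoding accuracy, and daily throughput. The sketch below is illustrative arithmetic only, based on the numbers cited in this article; the function names and thresholds are assumptions for demonstration and are not part of DeepSeek-OCR's actual interface.

```python
# Illustrative arithmetic based on the figures cited in this article.
# Function names and thresholds are hypothetical, not DeepSeek-OCR's API.

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the vision tokens that replace them."""
    return text_tokens / vision_tokens

def expected_accuracy(ratio: float) -> str:
    """Accuracy regimes reported in the article: ~97% up to 10x, ~60% near 20x."""
    if ratio <= 10:
        return "~97% decoding accuracy"
    if ratio <= 20:
        return "~60% decoding accuracy"
    return "unreported regime (beyond 20x compression)"

# Example: a page holding 2,000 text tokens encoded into 200 vision tokens.
ratio = compression_ratio(text_tokens=2_000, vision_tokens=200)
print(f"{ratio:.0f}x compression -> {expected_accuracy(ratio)}")  # 10x -> ~97%

# Throughput implied by the article: 200,000+ pages/day on one A100-40G GPU.
pages_per_day = 200_000
print(f"~{pages_per_day / 86_400:.1f} pages per second")  # ~2.3 pages/second
```

In other words, a 200-visual-token rendering of a 2,000-token page sits at the 10x boundary where the article reports near-lossless decoding, and the stated daily output corresponds to roughly two pages processed per second.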