DeepSeek开源新模型：单张A100日处理可超20万页数据

Core Viewpoint - DeepSeek has released a new OCR model that utilizes visual modalities for efficient text compression, achieving significant reductions in token usage while maintaining high accuracy in text recognition [2][5][6]. Summary by Sections Model Overview - The new OCR model, named DeepSeek-OCR, was open-sourced on October 20 and is detailed in the paper "DeepSeek-OCR: Contexts Optical Compression" [2]. - The model addresses the computational challenges faced by large language models when processing lengthy text by compressing text into visual formats, achieving nearly 10 times lossless context compression while maintaining an OCR accuracy of over 97% [5][6]. Technical Specifications - The model can generate training data for large language models/visual language models at a rate of over 200,000 pages per day using a single A100-40G GPU [7]. - DeepSeek-OCR consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [7]. - The decoder employs a MoE (Mixture of Experts) design, activating 6 out of 64 experts, resulting in approximately 570 million active parameters, combining the expressive power of a 3 billion parameter model with the inference efficiency of a 500 million parameter model [7]. Experimental Results - Experimental data indicates that when the number of text tokens is within 10 times that of visual tokens (compression ratio less than 10), the model achieves an OCR accuracy of 97%. Even at a compression ratio of 20, the accuracy remains around 60% [7]. Future Directions - The team proposes a novel approach to simulate human memory decay through optical compression, gradually reducing the size of rendered images for older contexts to decrease token consumption, potentially leading to breakthroughs in handling ultra-long contexts [8]. Community Response - The release has garnered positive feedback, with over 1,400 stars on GitHub shortly after its launch, indicating strong interest in the model [9]. - The project was led by researchers with prior experience in developing advanced OCR systems, suggesting a solid foundation for the new model [9]. Market Position - There are concerns in the market regarding DeepSeek's pace of innovation, with some voices suggesting that the company may be focusing on internal development to prepare for future models [10].