Core Viewpoint
- DeepSeek has introduced a new model, DeepSeek-OCR, which stores information as images instead of relying solely on text tokens, significantly improving data compression and model efficiency [5][11][26].

Group 1: Model Functionality
- DeepSeek-OCR can render large amounts of text as images, which then serve as a memory carrier for AI and allow more efficient data storage [9][14].
- The model needs fewer visual tokens than traditional models, achieving better results with lower resource consumption [11][26].
- In tests, DeepSeek-OCR outperformed GOT-OCR 2.0 using only 100 visual tokens against that model's 256, and it needed fewer than 800 visual tokens where MinerU 2.0 used more than 6,000 [11][14].

Group 2: Data Collection and Utilization
- The model can extract data from two-dimensional content, such as graphs and images in academic papers, that traditional models could not interpret and that therefore went uncollected [22][24].
- DeepSeek-OCR can generate over 200,000 pages of training data per day on an A100 GPU, which points to its potential for enlarging the training datasets of future models [24].
- Because the model remembers where each image sits and what text surrounds it, it can build a more complete understanding of the data [18][22].

Group 3: Resource Efficiency
- By using image-based memory, DeepSeek-OCR can cut the number of tokens required to one-tenth of the original while maintaining an accuracy of 96.5% [26][27].
- The model's design adjusts token usage dynamically to the complexity of the document, optimizing resource allocation [14][15].
- The research indicates that even at 20-fold compression the model retains around 60% accuracy, which shows its robustness [27].
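The token and throughput figures reported above reduce to simple arithmetic, which the following minimal Python sketch works through. It is an illustration only, not DeepSeek's code: the function names and the 5,000-token example document are hypothetical, and the numbers are the ones quoted in this summary (the MinerU 2.0 comparison uses the stated bounds, so its ratio is a lower bound).

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def compression_ratio(baseline_tokens: int, visual_tokens: int) -> float:
    """Return how many baseline tokens each visual token replaces."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return baseline_tokens / visual_tokens


def visual_tokens_needed(text_tokens: int, ratio: float) -> int:
    """Visual tokens needed to carry text_tokens at a given compression ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return math.ceil(text_tokens / ratio)


def pages_per_second(pages_per_day: int) -> float:
    """Convert a daily page throughput into pages per second."""
    return pages_per_day / SECONDS_PER_DAY


# Accuracy figures as quoted in the summary: compression ratio -> accuracy.
REPORTED_ACCURACY = {10: 0.965, 20: 0.60}

if __name__ == "__main__":
    # Token counts quoted for the two comparison models.
    print("vs GOT-OCR 2.0:", compression_ratio(256, 100))
    print("vs MinerU 2.0 (at least):", compression_ratio(6000, 800))

    # Hypothetical document size, chosen only for illustration.
    doc_tokens = 5000
    for ratio, accuracy in REPORTED_ACCURACY.items():
        needed = visual_tokens_needed(doc_tokens, ratio)
        print(f"{ratio}x: {needed} visual tokens, reported accuracy {accuracy:.1%}")

    # 200,000 pages per day on one A100, expressed per second.
    print("pages/s:", round(pages_per_second(200_000), 2))
```

Read this way, the trade-off is explicit: halving the visual-token budget from the 10-fold to the 20-fold setting costs roughly 36 percentage points of reported accuracy.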
Group 4: Open Source Collaboration
- DeepSeek-OCR is an open-source project that builds on contributions from open-source communities around the world, using datasets and models from companies such as Huawei, Baidu, Meta, and OpenAI [32][34].
- This collaborative effort has produced a model capable of "thinking in images," which highlights the importance of community-driven innovation in AI development [34].
DeepSeek's newly open-sourced model is a little uncanny
创业邦·2025-10-25 10:14