DeepSeek's New Model, Open-Sourced Yesterday, Is a Bit Uncanny
36Ke · 2025-10-22 01:00
Core Insights
- DeepSeek has introduced a new model called DeepSeek-OCR, which can compress text information into images, achieving a significant reduction in token usage while maintaining high accuracy [5][31][39].

Group 1: Model Capabilities
- DeepSeek-OCR can store large amounts of text as images, allowing for a more efficient representation of information than traditional text-based models [9][10].
- The model uses only 100 visual tokens to outperform previous models that required 256 tokens, and it achieves its results with fewer than 800 visual tokens, compared with over 6000 tokens used by other models [14][31].
- DeepSeek-OCR supports various resolutions and compression modes, ranging from Tiny to Gundam, so it can adjust dynamically to documents of different complexity [17][18].

Group 2: Data Utilization
- The model can capture previously unutilized data from documents, such as graphs and images, which traditional models could not interpret effectively [24][26].
- DeepSeek-OCR can generate over 200,000 pages of training data in a day on an A100 GPU, indicating its potential to enlarge the training datasets for future models [29].
- By storing context as images, the model significantly reduces the computational load, so longer conversations can be processed without a proportional increase in resource consumption [31].

Group 3: Open Source Collaboration
- The development of DeepSeek-OCR is a collaborative effort that integrates various open-source resources, including Huawei's Wukong dataset and Meta's SAM for image feature extraction [38][39].
- The model's architecture reflects a collective achievement of the open-source community, showcasing the potential of collaborative innovation in AI development [39].
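
The token and throughput figures quoted under Model Capabilities and Data Utilization can be turned into rough ratios with a few lines of arithmetic. The Python sketch below is only an illustration: the helper names `reduction_factor` and `pages_per_second` are hypothetical and not part of DeepSeek-OCR. Because the summary says "fewer than 800" and "over 6000", the second ratio is a lower bound, not an exact measurement.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def reduction_factor(baseline_tokens: int, visual_tokens: int) -> float:
    """Return how many times fewer tokens the visual encoding uses.

    Hypothetical helper for illustration; not a DeepSeek-OCR API.
    """
    if baseline_tokens <= 0 or visual_tokens <= 0:
        raise ValueError("token counts must be positive")
    return baseline_tokens / visual_tokens


def pages_per_second(pages_per_day: int) -> float:
    """Convert a daily page count into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # (baseline tokens, visual tokens) as quoted in the summary.
    comparisons = {
        "256 tokens vs 100 visual tokens": (256, 100),
        "over 6000 tokens vs under 800 visual tokens": (6000, 800),
    }
    for label, (baseline, visual) in comparisons.items():
        factor = reduction_factor(baseline, visual)
        print(f"{label}: about {factor:.2f}x fewer tokens")

    # "Over 200,000 pages in a day" expressed as an average rate.
    print(f"throughput: about {pages_per_second(200_000):.2f} pages per second")
```

Under these assumptions the quoted figures correspond to roughly a 2.5x and at least a 7.5x reduction in tokens, and an average of a little over two pages per second on the GPU mentioned in the summary.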