AI Evolves Again: DeepSeek Launches Another "Blockbuster" Feature

Core Insights
- DeepSeek has released a new open-source model, DeepSeek-OCR, a 3-billion-parameter model that reads text from images, effectively compressing text into visual tokens [1][2][19].

Group 1: Model Functionality
- The model aims to replace text tokens with visual tokens, an approach called optical compression that significantly reduces the number of tokens that must be processed [2][5].
- For instance, content that originally required 1,000 text tokens can be represented with just 100 visual tokens, a 10x compression ratio, while maintaining 97% OCR accuracy [5][19].
- The model has two main components: DeepEncoder, which compresses the image into visual tokens, and DeepSeek3B-MoE, which decodes those visual tokens back into text [11][12].

Group 2: Training and Data Utilization
- DeepSeek trained the model on 30 million pages of PDF documents covering about 100 languages, with Chinese and English accounting for a large share [12][14].
- Training also included 3 million Word documents for specialized tasks such as formula recognition and HTML table extraction, broadening the data coverage [14][19].

Group 3: Performance and Efficiency
- In tests, DeepSeek-OCR outperformed existing models such as GOT-OCR2.0 and MinerU2.0 while using fewer visual tokens [16][19].
- Because the decoder is a mixture-of-experts (MoE) model, only a fraction of its parameters is activated for each input, which increases speed and reduces computational load [11][19].

Group 4: Philosophical Implications
- The model introduces a concept of selective memory: it simulates human-like forgetting by compressing older information more aggressively over time, which could make long-running interactions more efficient [16][18].
- This approach challenges the traditional notion of memory in AI: effective retention may depend less on accumulating everything than on keeping what is relevant and clear [18][22].
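
To make the compression arithmetic and the forgetting idea above concrete, here is a minimal Python sketch. Only the 1,000-to-100 token example comes from the summary. The function names, the `growth` factor, and the rule that the compression ratio doubles with each step of age are hypothetical assumptions chosen for illustration, not DeepSeek's published method.

```python
import math


def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Return how many text tokens each visual token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def vision_token_budget(text_tokens: int, age: int,
                        base_ratio: float = 10.0,
                        growth: float = 2.0) -> int:
    """Visual tokens allotted to a chunk of context of a given age.

    Hypothetical schedule (an assumption, not from the source): the
    compression ratio starts at base_ratio and is multiplied by
    `growth` for every step of age, so older context gets fewer tokens.
    """
    if age < 0:
        raise ValueError("age must be non-negative")
    ratio = base_ratio * growth ** age
    # Always keep at least one token so old context never vanishes entirely.
    return max(1, math.ceil(text_tokens / ratio))


# The example from the summary: 1,000 text tokens in 100 visual tokens.
print(compression_ratio(1000, 100))                        # 10.0

# Token budget for the same 1,000-token chunk as it ages (assumed schedule).
print([vision_token_budget(1000, age) for age in range(4)])  # [100, 50, 25, 13]
```

Under this assumed schedule, recent context keeps the 10x ratio cited above, while older context shrinks toward a single token. That mirrors the "selective memory" idea only qualitatively; the accuracy lost at higher ratios is not modeled here.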