Just Now: DeepSeek Releases OCR 2
程序员的那些事·2026-01-27 15:40

Core Viewpoint
- DeepSeek has launched a new model, DeepSeek-OCR 2, which utilizes the innovative DeepEncoder V2 method to dynamically rearrange image components based on their meaning, aligning more closely with human visual encoding logic [1][3].

Group 1: Differences in DeepSeek-OCR 2
- Unlike traditional OCR systems that uniformly scan and encode images, DeepSeek-OCR 2 introduces a semantic-driven dynamic encoding mechanism that assesses which visual areas are most important during the encoding phase [3].
- The previous version, DeepSeek-OCR 1, was recognized for treating OCR as a visual compression problem, focusing on compressing visual content into a format more understandable for language models [3].
- DeepSeek-OCR 2 advances this concept by allowing visual encoding to enter the "understanding phase" rather than merely serving as a preprocessing step [4].

Group 2: Open Source and Accessibility
- As with previous significant releases, DeepSeek-OCR 2 has been made open source, with the project, paper, and model weights all available simultaneously [4].
- The project can be accessed at the provided GitHub link, along with the technical report and model on Hugging Face [4].
