AI产业跟踪：DeepSeek开源DeepSeek-OCR，持续关注AI大模型技术路径演进与商业化进展

Investment Rating - The report maintains a "Positive" investment rating for the industry [8] Core Insights - On October 20, DeepSeek released the DeepSeek-OCR model with 3 billion parameters, designed for efficient visual-text compression, marking a preliminary exploration of "Contextual Optical Compression" [2][5] - The model utilizes visual modalities as an effective medium for compressing textual information, achieving state-of-the-art performance with minimal visual tokens during end-to-end model testing [7] - The report emphasizes the potential of the DeepSeek-OCR model to redefine context processing in large models and highlights the ongoing support for the domestic AI industry chain, recommending shovel stocks and major players with significant positioning advantages [7] Summary by Sections Event Description - DeepSeek's release of the DeepSeek-OCR model aims to achieve efficient visual-text compression through a novel approach of converting text to images, now available on Hugging Face [5] Event Commentary - The architecture introduces a new visual encoding structure called DeepEncoder, which efficiently extracts visual features at high resolutions while significantly reducing the number of visual tokens [7] - The model's core consists of DeepEncoder, SAM-base, and CLIP-large, which compresses input from approximately 4096 visual tokens to about 256 tokens, supporting multiple resolution modes [7] - The lightweight MoE decoder uses only about 570 million parameters during inference, enhancing efficiency compared to the full 3 billion parameter model [7] - The report notes that cost remains a core constraint on token consumption, with the OCR model expected to break computational limits and redefine context processing [7] Model Value and Performance - The DeepSeek-OCR model demonstrates that visual tokens can express information more efficiently, providing a new cost-reduction approach for long text context compression [10] - In OmniDocBench benchmark tests, the model outperformed existing models with significantly fewer visual tokens, showcasing its application potential in real-world environments [10] - The model's capabilities extend to deep parsing of various data types and support for nearly 100 languages, indicating strong generalization ability and broad application scenarios [10]