Optical Compression
AI evolves again: DeepSeek launches another "blockbuster" feature
36Ke· 2025-10-24 11:48
Core Insights
- DeepSeek has introduced a new open-source model called DeepSeek-OCR, which uses a 3-billion-parameter MoE architecture to read text through images, effectively compressing text into visual tokens [1][2][19].

Group 1: Model Functionality
- The model aims to replace traditional text tokens with visual tokens, achieving optical compression that significantly reduces the amount of data processed [2][5].
- For instance, content that originally required 1,000 tokens can be represented with just 100 visual tokens, a 10x compression ratio, while maintaining 97% OCR accuracy [5][19].
- The model consists of two main components: DeepEncoder for image compression and DeepSeek3B-MoE for decoding the visual tokens back into text [11][12].

Group 2: Training and Data Utilization
- DeepSeek trained the model on an extensive dataset of 30 million PDF pages spanning roughly 100 languages, with a significant portion in Chinese and English [12][14].
- Training also included 3 million Word documents for specialized tasks such as formula recognition and HTML table extraction, reflecting broad data coverage [14][19].

Group 3: Performance and Efficiency
- In tests, DeepSeek-OCR outperformed existing models such as GOT-OCR2.0 and MinerU2.0, delivering better results with fewer visual tokens [16][19].
- The architecture activates only a fraction of its parameters during processing, which improves speed and reduces computational load [11][19].

Group 4: Philosophical Implications
- The model introduces a notion of selective memory, simulating human-like forgetting by compressing older information over time, which could enable more efficient long-term interactions [16][18].
- This approach challenges traditional notions of memory in AI, suggesting that effective information retention may not always require accumulation but rather a focus on relevance and clarity [18][22].
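The compression figures above reduce to simple token arithmetic. A minimal sketch (the helper function is illustrative, not part of DeepSeek's API; the 10x ratio and 97% accuracy figures are the article's reported numbers):

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to the visual tokens replacing them."""
    return text_tokens / vision_tokens

# The article's example: 1,000 text tokens carried by 100 visual tokens.
print(compression_ratio(1000, 100))  # 10.0 -> the "10x" optical compression
# Per the reported results, OCR accuracy stays near 97% at this ratio.
```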
DeepSeek-OCR: A Technical Deep Dive into the Optical Compression Path for Long-Context Processing and Its Industrial Applications
Haitong Securities International· 2025-10-23 13:35
Research Report · 23 Oct 2025 · Electronics (Technology)
DeepSeek-OCR: A Technical Deep Dive into the Optical Compression Path for Long-Context Processing and Its Industrial Applications
Analysts: 姚书桥 Barney Yao, 吕小潼 Xiaotong Lyu

Comment: From "longer windows" to "compress first, decode later." Long-context processing is currently evolving along two distinct paths. The previous generation centered on extending the context window (Gemini 1.5 supports 2M tokens; OpenAI's GPT-4.1 offers 1M), while using techniques such as RAG and sparse attention to curb the compute overhead of the attention mechanism's quadratic complexity. That path raises the ceiling on a single input, but does not change the fact that inference cost still grows with text length. DeepSeek-OCR represents the new-generation "compressed storage" approach: it maps text into visual representations and applies high-ratio compression, so that a small number of visual tokens carry a long ...
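The "compress first, decode later" flow the report describes can be illustrated with a toy round trip: pack many text tokens into each coarse "visual token", then unpack. This is purely a stand-in to show the data flow and token savings; real optical compression renders text to an image and performs lossy-but-accurate OCR, and none of these names come from DeepSeek's code:

```python
# Toy stand-in for "compress first, decode later" (illustrative only).

def encode(text_tokens: list[str], pack: int = 10) -> list[tuple]:
    """Group `pack` text tokens into one coarse 'visual token'."""
    return [tuple(text_tokens[i:i + pack])
            for i in range(0, len(text_tokens), pack)]

def decode(visual_tokens: list[tuple]) -> list[str]:
    """Recover the original token stream from the packed representation."""
    return [tok for group in visual_tokens for tok in group]

words = [f"w{i}" for i in range(1000)]
packed = encode(words)           # 100 "visual tokens" for 1,000 text tokens
print(len(packed))               # 100
print(decode(packed) == words)   # True: the round trip recovers the text
```

The point of the toy is the shape of the pipeline, not the mechanism: downstream computation operates on the short packed sequence, which is where the cost savings come from.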
DeepSeek's ultimate ambition: remaking the basic language of LLMs into images
36Ke· 2025-10-21 12:52
Core Insights
- DeepSeek has open-sourced DeepSeek-OCR, an OCR model that achieves state-of-the-art results on benchmarks like OmniDocBench [1]
- The motivation behind entering the OCR field is to address the computational bottleneck of long-context processing in large language models (LLMs) [4][6]
- The paper proposes that text information can be efficiently compressed through optical 2D mapping, allowing vision-language models (VLMs) to decompress the original information from images [4][6]

Group 1: Long Context Processing
- The pursuit of longer context in LLMs has led to a competitive arms race, with token windows expanding from thousands to millions [7]
- The core limitation arises from the attention mechanism in the Transformer architecture, where computational complexity and memory usage grow quadratically with sequence length [7]
- DeepSeek-AI's engineers pose a fundamental question: can the number of tokens itself be compressed, rather than just optimizing the attention calculation? [7][10]

Group 2: Visual Tokens vs. Text Tokens
- Visual tokens are the basic units of information processed by visual models, while text tokens are used by LLMs [8]
- A 1024x1024 image can be divided into 4096 visual tokens, significantly reducing the number of tokens needed compared to text representation [9]
- The insight that the visual modality can serve as an efficient compression medium for text information led to the creation of DeepSeek-OCR [9]

Group 3: DeepEncoder and Compression Techniques
- DeepSeek-OCR is essentially a proof of concept for an "optical compression-decompression" system [10]
- The DeepEncoder, a key innovation, is designed to handle high-resolution inputs while producing minimal visual tokens [11][12]
- The architecture consists of three stages: a local detail processor, a compression module, and a global attention layer [14][16]

Group 4: Performance Metrics
- Experimental results show a 10.5x compression rate, with 64 visual tokens decoding 600-700 text tokens at an OCR accuracy of 96.5% [17][18]
- At a 20x compression rate, the model maintains around 60% accuracy while decoding over 1,200 text tokens [17][18]
- DeepSeek-OCR outperforms existing models like GOT-OCR2.0 and MinerU2.0 in both performance and token efficiency [19][20]

Group 5: Future Vision and Memory Simulation
- The team aims to simulate human memory's forgetting mechanism, which naturally prioritizes relevant information while compressing less important details [25][27]
- The multi-resolution design of DeepSeek-OCR provides a technical foundation for managing memory in a way that mimics human cognitive processes [29][30]
- The ultimate goal is a system that balances information retention and computational efficiency, potentially leading to a new paradigm in AI memory and input systems [32][35]
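The token counts in Groups 2 and 4 follow from patch arithmetic. A small sketch, assuming the common ViT convention of square non-overlapping 16x16-pixel patches (the patch size is an assumption; the 4096-token and 64-token/10.5x figures are the article's):

```python
def vit_token_count(height: int, width: int, patch: int = 16) -> int:
    """Patch tokens a ViT-style encoder produces for an image, assuming
    square non-overlapping patches (patch size is an assumption)."""
    return (height // patch) * (width // patch)

print(vit_token_count(1024, 1024))  # 4096 raw patch tokens, as in Group 2

# Group 4's reported result: 64 compressed visual tokens decode ~600-700
# text tokens. Using 672 as an illustrative mid-range value:
visual_tokens = 64
text_tokens = 672
print(round(text_tokens / visual_tokens, 1))  # 10.5 -> the 10.5x rate
```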
DeepSeek open-sources a new model: a single A100 can process over 200,000 pages per day
第一财经· 2025-10-20 14:58
2025.10.20 | Length: 1,556 characters, about a 3-minute read
By 刘晓洁 (Liu Xiaojie), Yicai (第一财经)

DeepSeek has released another new model, this time an OCR model. On October 20, DeepSeek open-sourced it on GitHub and published the accompanying paper, "DeepSeek-OCR: Contexts Optical Compression", explaining the work.

The decoder uses the DeepSeek-3B-MoE architecture. Although the model has only 3B parameters, its MoE (mixture-of-experts) design activates 6 of 64 experts plus 2 shared experts, for roughly 570 million activated parameters. This gives it the expressive power of a 3-billion-parameter model while retaining the inference efficiency of a 500-million-parameter model.

Experimental data show that when the number of text tokens is within 10 times the number of visual tokens (i.e., a compression ratio below 10x), the model's decoding (OCR) precision reaches 97%; even at a 20x compression ratio, OCR accuracy stays around 60%.

In the paper, the DeepSeek team also proposes an imaginative future: using optical compression to simulate the human forgetting mechanism. Human memory fades with time, and the older an event, the blurrier the recollection. Could AI work the same way? So the team designed a scheme in which older ...
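The activated-parameter figure above follows from MoE arithmetic. A back-of-envelope sketch (the per-expert and dense parameter sizes below are assumptions chosen to reproduce the article's ~570M active / ~3B total figures, not a published breakdown from DeepSeek):

```python
# MoE active-parameter arithmetic (parameter split is an assumption).
N_EXPERTS, N_ACTIVE, N_SHARED = 64, 6, 2   # routed experts, activated, shared

expert_params = 41.9e6    # assumed parameters per expert
dense_params  = 234.8e6   # assumed non-expert parameters (attention, embeddings)

# Total stores every expert; a forward pass touches only the activated ones.
total  = dense_params + (N_EXPERTS + N_SHARED) * expert_params
active = dense_params + (N_ACTIVE + N_SHARED) * expert_params

print(f"total  ~{total / 1e9:.1f}B params")   # ~3.0B
print(f"active ~{active / 1e6:.0f}M params")  # ~570M
```

This is why the article can say the model has 3B-scale capacity at roughly 500M-scale inference cost: only the activated experts contribute to per-token compute.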