DeepSeek
The newly open-sourced DeepSeek-OCR may be the most delightful surprise among recent models.
数字生命卡兹克· 2025-10-21 01:32
Core Insights
- The article discusses the introduction of DeepSeek-OCR, a new model that extends traditional Optical Character Recognition (OCR): it not only extracts text but also generates structured documents and compresses information effectively [1][3][5].

Group 1: Traditional OCR vs. DeepSeek-OCR
- Traditional OCR primarily converts images of text into editable digital text, which can be cumbersome for complex documents such as financial reports [3][5].
- DeepSeek-OCR goes beyond traditional OCR by generating Markdown documents that preserve the structure of the original content, including text, titles, and charts, making it more versatile [5][6].

Group 2: Contextual Compression
- DeepSeek-OCR introduces a novel approach called "Contextual Optical Compression," which lets the model process long texts more efficiently by converting them into images instead of tokenized text [18][19].
- This method significantly reduces the computational load of processing long texts, since the cost of token processing grows quadratically with text length [8][10][11].

Group 3: Performance Metrics
- The model achieves a compression ratio of up to 10 times while maintaining a recognition accuracy of 96.5% [23].
- The compression ratio is calculated by dividing the total number of original text tokens by the number of visual tokens after compression [24].

Group 4: Implications for AI and Memory
- The article suggests that DeepSeek-OCR's approach mirrors human memory: recent information is retained with high fidelity while older information gradually fades [39][40].
- This "forgetting" mechanism is presented as a potential advantage for AI, allowing it to prioritize important information and manage memory more like humans do [40][41].
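The compression-ratio definition in Group 3 is simple division; a minimal sketch (the example token counts below are illustrative, not taken from the paper):

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to compressed visual tokens."""
    return text_tokens / vision_tokens

# Illustrative numbers: a page of ~1,000 text tokens represented by 100 visual tokens
ratio = compression_ratio(1000, 100)
print(ratio)  # 10.0 -> the <=10x regime where the summaries quote ~96-97% accuracy
```

At 20x (e.g., 2,000 text tokens into the same 100 visual tokens), the summaries report accuracy dropping to roughly 60%.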
DeepSeek's new model has Silicon Valley raving! Compressing one-dimensional text with two-dimensional vision, it runs on a single GPU; "Google's core secret has been open-sourced"
Hua Er Jie Jian Wen· 2025-10-21 00:27
Core Insights
- DeepSeek has released an open-source model named DeepSeek-OCR, which is gaining significant attention in Silicon Valley for its innovative approach to processing long texts with visual compression techniques [1][4][21].
- The model is designed to tackle the computational challenges of large models handling lengthy text, achieving high accuracy even with greatly reduced token usage [1][4][5].

Model Performance
- DeepSeek-OCR has 3 billion parameters and decodes text with high accuracy: 97% at a compression ratio below 10x, and around 60% even at a 20x compression ratio [1][4][5].
- Benchmarked against existing models, it shows superior performance with significantly fewer visual tokens, for instance using only 100 visual tokens to outperform models that require 256 [7][8].

Data Generation Efficiency
- The model can generate over 200,000 pages of high-quality training data daily on a single A100-40G GPU [2][4].

Innovative Approach
- DeepSeek introduces "Contextual Optical Compression," which compresses textual information into visual form so the model interprets content through images rather than text [4][10].
- The architecture includes two main components: the DeepEncoder, which converts images into compressed visual tokens, and DeepSeek3B-MoE-A570M, which reconstructs text from those tokens [10][11].

Flexibility and Adaptability
- The DeepEncoder handles a range of input resolutions and token counts, adapting to different compression needs and application scenarios [11][12].
- The model supports complex image analyses, including financial reports and scientific diagrams, broadening its applicability across fields [12][14].

Future Implications
- The research suggests this unified approach to visual and textual processing could be a step toward Artificial General Intelligence (AGI) [4][21].
- The team behind DeepSeek-OCR is exploring simulating human memory mechanisms through optical compression, which could enable more efficient handling of long-term context in AI [20][21].
DeepSeek's new model has Silicon Valley raving! Compressing one-dimensional text with two-dimensional vision, it runs on a single GPU; "Google's core secret has been open-sourced"
量子位· 2025-10-20 23:34
Core Insights
- DeepSeek has released a groundbreaking open-source model named DeepSeek-OCR, which is gaining significant attention in Silicon Valley for its innovative, highly efficient approach to processing long texts [1][3][7].

Model Overview
- DeepSeek-OCR addresses the computational challenges of large models handling long texts by compressing textual information into visual tokens, reducing the number of tokens needed for processing [5][12][13].
- The model achieves a decoding accuracy of 97% when the compression ratio is below 10x, and around 60% even at a 20x compression ratio [6].

Performance Metrics
- DeepSeek-OCR achieves state-of-the-art (SOTA) results on the OmniDocBench benchmark with significantly fewer visual tokens than existing models [14][15].
- For instance, using only 100 visual tokens, DeepSeek-OCR outperforms GOT-OCR2.0, which uses 256 tokens, and matches other models while using far fewer tokens [17].

Technical Components
- The architecture consists of two main components: the DeepEncoder, which converts high-resolution images into highly compressed visual tokens, and the DeepSeek3B-MoE-A570M decoder, which reconstructs text from those tokens [20][22].
- The model supports various input modes, adapting its compression strength to the requirements of the task [24].

Innovative Concepts
- The research introduces "Contextual Optical Compression," which simulates human memory mechanisms by allocating computational resources according to how recent the processed information is [36][38].
- This approach aims to improve the model's handling of long conversations or documents, potentially leading to a more human-like memory structure in AI systems [39][41].
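The "various input modes" mentioned above trade resolution against visual-token count. A minimal sketch of how a caller might pick one, assuming a hypothetical mode table (the mode names, resolutions, and token counts here are illustrative placeholders, not the repo's actual values):

```python
# Illustrative resolution/token modes, smallest first (hypothetical values).
MODES = [
    ("tiny",   512,  64),
    ("small",  640, 100),
    ("base",  1024, 256),
    ("large", 1280, 400),
]

def pick_mode(token_budget: int):
    """Return the highest-resolution mode whose token cost fits the budget.

    Falls back to the smallest mode if even that exceeds the budget.
    """
    best = MODES[0]
    for name, resolution, tokens in MODES:
        if tokens <= token_budget:
            best = (name, resolution, tokens)
    return best

print(pick_mode(120))  # ('small', 640, 100)
```

A higher budget buys more resolution (weaker compression); a tight budget forces stronger compression at some accuracy cost, mirroring the 10x-vs-20x accuracy trade-off above.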
Just in: a major DeepSeek breakthrough shatters the context-window curse on large models
36Kr· 2025-10-20 23:22
Core Insights
- DeepSeek has opened a novel technology path in the large-language-model race by open-sourcing the DeepSeek-OCR model, which proposes "Contextual Optical Compression": efficient information compression through text-to-image conversion [1][8].

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves 97% decoding accuracy at a 10x compression ratio, indicating near-lossless compression, while maintaining approximately 60% accuracy at a 20x compression ratio [3][21].
- By converting text tokens into visual tokens, DeepSeek-OCR can express similar textual content with fewer tokens, offering a new way to address the high computational cost of processing long texts in large language models [6][11].
- In practical applications, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23].

Group 2: Technical Architecture
- The architecture consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18].
- DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding while significantly reducing the number of vision tokens generated from document images [14][18].

Group 3: Data and Training
- Training is relatively straightforward: DeepEncoder is trained independently, then the complete DeepSeek-OCR model is trained, using a large dataset for effective learning [20][21].
- The training data spans OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across document types [25][36].

Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from various document types, including financial reports and scientific literature [24][29].
- The research team plans to explore integrating digital and optical text pre-training methods and to evaluate optical compression in real long-text environments, a promising direction for future research [39].
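A toy sketch of the two-stage flow described in Group 2, using made-up function names and a trivial stand-in for the real attention-plus-compression stack (this is not the repo's API):

```python
from typing import List

def deep_encoder(image_patches: List[str], compress_factor: int = 16) -> List[str]:
    """Illustrative stand-in for DeepEncoder: local attention over patches,
    convolutional compression, then global attention over the survivors.
    Here we only model the token-count reduction."""
    return image_patches[::compress_factor]  # keep 1 of every 16 "patch tokens"

def moe_decoder(visual_tokens: List[str]) -> str:
    """Illustrative stand-in for the MoE language decoder, which would
    autoregressively reconstruct the page text from the visual tokens."""
    return " ".join(visual_tokens)

patches = [f"p{i}" for i in range(64)]
vision_tokens = deep_encoder(patches)
print(len(patches), "->", len(vision_tokens))  # 64 -> 4
```

The point of the split is that the expensive global attention only ever sees the small post-compression token set, which is what keeps high-resolution pages affordable.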
DeepSeek AI Returns 30% Crypto Profits in Just 3 Days Using Simple Prompts
Yahoo Finance· 2025-10-20 21:38
Alpha Arena, a new benchmark platform, sets out to measure how well AI models perform in live crypto markets. The test gave six leading AI models $10,000 each, access to real crypto perpetual markets, and one identical prompt, then let them trade autonomously. Within just three days, DeepSeek Chat V3.1 grew its portfolio by over 35%, outperforming both Bitcoin and every other AI trader in the field. This article explains how the experiment was structured, what prompts ...
X @Decrypt
Decrypt· 2025-10-20 20:25
AI Crypto Trading Showdown: DeepSeek and Grok Are Cashing In as Gemini Implodes ► https://t.co/lWu0Pq8q2h ...
DeepSeek open-sources a new model: a single A100 can process over 200,000 pages of data per day
第一财经· 2025-10-20 14:58
2025.10.20 | Word count: 1,556 | Reading time: about 3 minutes | Author: Liu Xiaojie, 第一财经

DeepSeek has released a new model again, and this time it is an OCR model. On October 20, DeepSeek open-sourced the model on GitHub and published the paper "DeepSeek-OCR: Contexts Optical Compression" explaining the work.

The decoder uses the DeepSeek-3B-MoE architecture. Although it has only 3B parameters, its MoE (Mixture of Experts) design activates 6 of 64 experts plus 2 shared experts, for roughly 570 million activated parameters. This gives the model the expressive power of a 3-billion-parameter model while retaining the inference efficiency of a 500-million-parameter model.

Experimental data show that when the number of text tokens is within 10 times the number of visual tokens (i.e., a compression ratio below 10x), the model's decoding (OCR) precision reaches 97%; even at a 20x compression ratio, OCR accuracy remains around 60%.

In the paper, the DeepSeek team also sketches an imaginative future: using optical compression to simulate the human forgetting mechanism. Human memory fades with time, and the more distant an event, the blurrier the memory, so could AI work the same way? To that end, the team designed a scheme in which older ...
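The activated-parameter figure above can be sanity-checked with back-of-the-envelope arithmetic (a rough illustration, not the paper's exact accounting):

```python
# Only a few experts run per token, so the "active" parameter count
# sits far below the 3B total.
total_params = 3_000_000_000   # full DeepSeek-3B-MoE parameter count
active_params = 570_000_000    # parameters actually exercised per token
routed_active, shared_active, total_experts = 6, 2, 64

fraction = active_params / total_params
print(f"{routed_active} routed + {shared_active} shared experts active "
      f"(of {total_experts} routed); ~{fraction:.0%} of parameters per token")
```

This is the standard MoE trade: the memory footprint of a 3B model, the per-token compute of a ~570M one.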
DeepSeek open-sources a new model! A single A100 can process over 200,000 pages of data per day
Di Yi Cai Jing· 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on contextual optical compression to address the challenges large language models face when processing long texts [1][4][6].

Group 1: Model Features
- DeepSeek-OCR uses the visual modality to compress text information efficiently, achieving nearly lossless ~10x context compression while maintaining OCR accuracy above 97% [4][5].
- The model consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [5].
- The architecture gives the model the expressive power of a 3-billion-parameter model while maintaining the inference efficiency of a 500-million-parameter model [5].

Group 2: Potential Applications
- The research points to applications in long-context compression and memory-forgetting mechanisms for large models, simulating human memory decay by gradually shrinking rendered images over time [5][6].
- This approach could represent a significant breakthrough in handling ultra-long contexts, trading fidelity for a theoretically unbounded context window [6].

Group 3: Industry Reception
- The release of DeepSeek-OCR has been received positively, gathering over 1,400 stars on GitHub shortly after launch [7].
- Opinions on DeepSeek's pace of innovation are mixed, with some suggesting the company is focusing on internal development for future models [8].
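One way to picture the memory-decay idea in Group 2 is a visual-token budget that shrinks with age. A minimal sketch, assuming a made-up halving schedule (the budgets and floor are illustrative, not from the paper):

```python
def tokens_for_age(age_in_turns: int, base_tokens: int = 400, floor: int = 64) -> int:
    """Halve the visual-token budget for each step back in time (illustrative),
    never dropping below a minimum-legibility floor."""
    budget = base_tokens >> age_in_turns  # integer halving per age step
    return max(budget, floor)

for age in range(5):
    print(f"context {age} turns old -> {tokens_for_age(age)} visual tokens")
```

Recent context keeps a large budget (high-fidelity rendering); older context is re-rendered smaller and blurrier, mimicking how human recollection fades.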
X @s4mmy
s4mmy· 2025-10-20 12:56
Interesting seeing a benchmark of model performance with their own trading portfolios. Curious how fine-tuned models for market segments could further outperform this. https://t.co/RfIGsugSd1

Jay A (@jay_azhang): DeepSeek and Grok seem to have better contextual awareness of market microstructure. Grok in particular has made money in 100% of the past 5 rounds. More coming in a technical writeup. https://t.co/b5MQsTzZUO ...
DeepSeek releases another new model, taking "small but beautiful" to a new level
Hu Xiu· 2025-10-20 12:41
Core Insights
- The article discusses the challenge current LLMs face in processing long texts: computational complexity grows quadratically with sequence length [1].
- DeepSeek-OCR presents a novel approach to this problem via "optical compression," converting text into images to reduce the number of tokens required for processing [5][34].
- The model compresses tokens by 7 to 20 times while maintaining high accuracy, showcasing its potential for efficient long-context processing [34][36].

Technical Overview
- DeepSeek-OCR achieves a compression rate of up to 10 times with an accuracy above 97% [4].
- The model uses a two-component architecture: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [16][18].
- DeepEncoder cleverly combines SAM-base and CLIP-large models with a convolutional compressor that significantly reduces the token count before the global attention layer [10][11].

Performance Metrics
- OmniDocBench benchmark results indicate that a single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data daily, while 20 nodes (160 A100 GPUs) can produce up to 33 million pages [7].
- DeepSeek-OCR outperforms existing models, requiring only 100 visual tokens to exceed the performance of GOT-OCR2.0, which uses 256 tokens per page [15].

Data Utilization
- The DeepSeek team collected 30 million pages of multilingual PDF data covering around 100 languages, with a focus on Chinese and English [21].
- The data is categorized into coarse and fine annotations, with high-quality labels generated by various models to enhance recognition capabilities [22].

Application Potential
- DeepSeek-OCR not only recognizes text but also possesses deep-parsing capabilities, making it suitable for STEM applications that require structured extraction from complex images [27].
- The model can extract structured data from financial reports, chemical structures, and geometric figures, and generate dense captions for natural images [28].

Future Directions
- The team proposes using "optical compression" to simulate human memory decay, enabling efficient processing of long contexts by reducing the fidelity of older information [30][31].
- Future plans include systematic evaluations and pre-training methods to further validate the effectiveness of this approach [35].
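The convolutional compressor's token savings reduce to patch arithmetic. A sketch assuming standard 16-pixel ViT patches and a 16x token-compression factor (check the paper for the exact figures):

```python
def vit_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of patch tokens a plain ViT would emit for an image."""
    return (height // patch) * (width // patch)

raw = vit_patch_tokens(1024, 1024)  # 4096 patch tokens before compression
compressed = raw // 16              # after 16x convolutional downsampling
print(raw, "->", compressed)        # 4096 -> 256 visual tokens
```

Without that compression step, global attention over 4,096 tokens per page would dominate the cost; after it, a full high-resolution page costs only a few hundred tokens, which is what makes the 200,000-pages-per-day throughput plausible.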