DeepSeek
The newly open-sourced DeepSeek-OCR may be the most pleasantly surprising model in recent memory.
数字生命卡兹克· 2025-10-21 01:32
Core Insights
- The article discusses the introduction of DeepSeek-OCR, a new model that enhances traditional Optical Character Recognition (OCR) capabilities by not only extracting text but also generating structured documents and compressing information effectively [1][3][5].

Group 1: Traditional OCR vs. DeepSeek-OCR
- Traditional OCR primarily converts images of text into editable digital text, which can be cumbersome for complex documents like financial reports [3][5].
- DeepSeek-OCR goes beyond traditional OCR by generating Markdown documents that preserve the structure of the original content, including text, titles, and charts, making it more versatile [5][6].

Group 2: Contextual Compression
- DeepSeek-OCR introduces a novel approach called "Contextual Optical Compression," which allows the model to process long texts more efficiently by converting them into image files instead of tokenized text [18][19].
- This method significantly reduces the computational load of processing long texts, as the cost of token processing increases quadratically with text length [8][10][11].

Group 3: Performance Metrics
- The model achieves a compression ratio of up to 10x while maintaining a recognition accuracy of 96.5% [23].
- The compression ratio is calculated by dividing the total number of original text tokens by the number of visual tokens after compression [24].

Group 4: Implications for AI and Memory
- The article suggests that DeepSeek-OCR's approach mirrors human memory, where recent information is retained with high fidelity while older information gradually fades [39][40].
- This "forgetting" mechanism is presented as a potential advantage for AI, allowing it to prioritize important information and manage memory more as humans do [40][41].
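The compression-ratio definition in Group 3 can be written out directly; a minimal sketch, where the token counts are illustrative examples rather than figures from the article:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Ratio of original text tokens to visual tokens after compression."""
    return text_tokens / vision_tokens

# Illustrative numbers: a page of ~1000 text tokens encoded as 100 visual tokens
print(compression_ratio(1000, 100))  # 10.0
```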
DeepSeek's new model has Silicon Valley raving! It compresses one-dimensional text with two-dimensional vision, runs on a single GPU, and some call it "Google's core secret, open-sourced"
Hua Er Jie Jian Wen· 2025-10-21 00:27
Core Insights
- DeepSeek has released an open-source model named DeepSeek-OCR, which is gaining significant attention in Silicon Valley for its innovative approach to processing long texts using visual compression techniques [1][4][21].
- The model is designed to tackle the computational challenges associated with large models handling lengthy text, achieving high accuracy rates even with reduced token usage [1][4][5].

Model Performance
- DeepSeek-OCR has 3 billion parameters and decodes text with high accuracy, achieving 97% accuracy at a compression ratio below 10x and maintaining 60% accuracy even at a 20x compression ratio [1][4][5].
- Benchmarked against existing models, it shows superior performance with significantly fewer visual tokens, for example using only 100 visual tokens to outperform models that require 256 tokens [7][8].

Data Generation Efficiency
- The model can generate over 200,000 pages of high-quality training data daily using a single A100-40G GPU [2][4].

Innovative Approach
- DeepSeek introduces a concept called "Contextual Optical Compression," which compresses textual information into visual form, allowing the model to interpret content through images rather than text [4][10].
- The architecture includes two main components: the DeepEncoder, which converts images into compressed visual tokens, and the DeepSeek3B-MoE-A570M decoder, which reconstructs text from these tokens [10][11].

Flexibility and Adaptability
- The DeepEncoder handles various input resolutions and token counts, allowing it to adapt to different compression needs and application scenarios [11][12].
- The model supports complex image analyses, including financial reports and scientific diagrams, enhancing its applicability across diverse fields [12][14].

Future Implications
- The research suggests that this unified approach to visual and textual processing could be a step towards Artificial General Intelligence (AGI) [4][21].
- The team behind DeepSeek-OCR is exploring the potential of simulating human memory mechanisms through optical compression, which could lead to more efficient handling of long-term contexts in AI [20][21].
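The flexible-resolution design described above amounts to choosing a visual-token budget per page. A toy sketch; the mode names and budgets here are assumptions for illustration, not the model's actual configuration:

```python
# Hypothetical mode table: each mode trades resolution for visual-token cost.
MODES = {          # mode -> visual-token budget (illustrative)
    "tiny": 64,
    "small": 100,
    "base": 256,
    "large": 400,
}

def pick_mode(text_tokens: int, max_ratio: float = 10.0) -> str:
    """Choose the cheapest mode whose compression ratio stays <= max_ratio."""
    for name, budget in sorted(MODES.items(), key=lambda kv: kv[1]):
        if text_tokens / budget <= max_ratio:
            return name
    return "large"  # fall back to the largest budget

print(pick_mode(600))   # "tiny": 600/64 ≈ 9.4, within the 10x budget
print(pick_mode(2000))  # "base": 2000/256 ≈ 7.8, within the 10x budget
```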
DeepSeek's new model has Silicon Valley raving! It compresses one-dimensional text with two-dimensional vision, runs on a single GPU, and some call it "Google's core secret, open-sourced"
量子位· 2025-10-20 23:34
Core Insights
- DeepSeek has released a groundbreaking open-source model named DeepSeek-OCR, which is gaining significant attention in Silicon Valley for its innovative approach to processing long texts with high efficiency [1][3][7].

Model Overview
- The DeepSeek-OCR model addresses the computational challenges of large models handling long texts by compressing textual information into visual tokens, thereby reducing the number of tokens needed for processing [5][12][13].
- The model achieves a decoding accuracy of 97% when the compression ratio is below 10x and around 60% even at a 20x compression ratio [6].

Performance Metrics
- DeepSeek-OCR has demonstrated state-of-the-art (SOTA) results on the OmniDocBench benchmark with significantly fewer visual tokens than existing models [14][15].
- For instance, using only 100 visual tokens, DeepSeek-OCR outperforms GOT-OCR2.0, which uses 256 tokens, and matches the performance of other models while using far fewer tokens [17].

Technical Components
- The architecture consists of two main components: the DeepEncoder, which converts high-resolution images into highly compressed visual tokens, and the DeepSeek3B-MoE-A570M decoder, which reconstructs text from these tokens [20][22].
- The model supports various input modes, allowing it to adapt its compression strength to the specific task requirements [24].

Innovative Concepts
- The research introduces the concept of "Contextual Optical Compression," which simulates human memory mechanisms by dynamically allocating computational resources based on the temporal context of the information being processed [36][38].
- This approach aims to enhance the model's ability to handle long conversations or documents, potentially leading to a more human-like memory structure in AI systems [39][41].
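The two accuracy figures quoted above define the model's reported operating points. A tiny helper expressing that trade-off; the step-function shape is an illustrative simplification of the reported numbers, not a measured curve:

```python
def expected_accuracy(ratio: float) -> float:
    """Rough reported decoding accuracy for a given compression ratio."""
    if ratio <= 10:
        return 0.97          # near-lossless regime (reported)
    if ratio <= 20:
        return 0.60          # heavy-compression regime (reported)
    return float("nan")      # beyond the reported range

print(expected_accuracy(8))   # 0.97
print(expected_accuracy(20))  # 0.6
```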
Just in: a major DeepSeek breakthrough shatters the context-length constraint on large models
36Kr· 2025-10-20 23:22
Core Insights
- DeepSeek has introduced a novel technology path in the competition of large language models by open-sourcing the DeepSeek-OCR model, which proposes the concept of "Contextual Optical Compression" for efficient information compression through text-to-image conversion [1][8].

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves a decoding accuracy of 97% at a 10x compression ratio, indicating near-lossless compression, while maintaining approximately 60% accuracy at a 20x compression ratio [3][21].
- DeepSeek-OCR can express similar textual content using fewer tokens by converting text tokens into visual tokens, providing a new approach to the high computational cost of processing long texts in large language models [6][11].
- In practical applications, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23].

Group 2: Technical Architecture
- The architecture of DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18].
- DeepEncoder employs a dual structure combining local and global attention to achieve high-fidelity visual understanding while significantly reducing the number of vision tokens generated from document images [14][18].

Group 3: Data and Training
- DeepSeek-OCR's training process is relatively straightforward, involving independent training of DeepEncoder followed by the complete DeepSeek-OCR model, utilizing a large dataset for effective learning [20][21].
- The model has been trained on a diverse dataset that includes OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across various document types [25][36].
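The mixed training data described in Group 3 can be thought of as a sampling mixture over data sources. A sketch; the weights below are placeholders, not the paper's actual proportions:

```python
import random

MIXTURE = {               # source -> sampling weight (placeholder values)
    "ocr_v1": 0.4,        # OCR 1.0 data (e.g. scanned documents)
    "ocr_v2": 0.3,        # OCR 2.0 data (e.g. charts, formulas)
    "general_visual": 0.2, # general vision data
    "pure_text": 0.1,     # plain text data
}

def sample_source(rng: random.Random) -> str:
    """Draw one training source according to the mixture weights."""
    sources, weights = zip(*MIXTURE.items())
    return rng.choices(sources, weights=weights, k=1)[0]

rng = random.Random(0)
print(sample_source(rng))
```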
Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from various document types, including financial reports and scientific literature [24][29].
- The research team plans to further explore the integration of digital and optical text pre-training methods and to evaluate the performance of optical compression in real long-text environments, indicating a promising direction for future research [39].
DeepSeek AI Returns 30% Crypto Profits in Just 3 Days Using Simple Prompts
Yahoo Finance· 2025-10-20 21:38
Core Insights
- Alpha Arena conducted an experiment to evaluate the performance of six AI models in live crypto markets, with DeepSeek Chat V3.1 achieving portfolio growth of over 35% in just three days, outperforming Bitcoin and the other AI traders [1][7].

Experiment Structure
- The experiment involved six leading AI models, each given $10,000 and access to real crypto perpetual markets, trading autonomously based on an identical prompt [1][2].
- The AI models were assessed on their ability to handle risk, timing, and decision-making in live markets [4].

Trading Framework
- Each AI was provided with a strict trading framework that required it to trade six major cryptocurrencies (BTC, ETH, SOL, XRP, DOGE, BNB) with specific conditions for each position, including take-profit and stop-loss targets [5][9].
- The models received market data at each tick and had to decide whether to open, close, or hold positions, with an emphasis on consistency, execution, and discipline [6].

Performance Results
- After three days, the standings were as follows [7]:
  - DeepSeek Chat V3.1: $13,502.62 (+35%)
  - Grok 4: $13,053.28 (+30%)
  - Claude Sonnet 4.5: $12,737.05 (+28%)
  - BTC Buy & Hold: $10,393.47 (+4%)
  - Qwen3 Max: $9,975.10 (-0.25%)
  - GPT-5: $7,264.75 (-27%)
  - Gemini 2.5 Pro: $6,650.36 (-33%)

Reasons for DeepSeek's Success
- DeepSeek's strategy involved holding a diversified portfolio of all six major crypto assets with moderate leverage (10x-20x), which helped spread risk and capitalize on the altcoin rally during the experiment period [8].
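The trading framework above, a fixed asset universe with per-position take-profit and stop-loss targets, can be sketched as a simple decision rule. The field names and prices are assumptions for illustration, not Alpha Arena's actual schema:

```python
from dataclasses import dataclass

# The six assets the prompt restricted trading to.
ASSETS = {"BTC", "ETH", "SOL", "XRP", "DOGE", "BNB"}

@dataclass
class Position:
    asset: str
    entry: float
    take_profit: float
    stop_loss: float

    def decide(self, price: float) -> str:
        """Close when either target is hit; otherwise hold."""
        if price >= self.take_profit or price <= self.stop_loss:
            return "close"
        return "hold"

pos = Position("ETH", entry=4000.0, take_profit=4400.0, stop_loss=3800.0)
print(pos.decide(4100.0))  # hold
print(pos.decide(4450.0))  # close
```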
X @Decrypt
Decrypt· 2025-10-20 20:25
AI Crypto Trading Showdown: DeepSeek and Grok Are Cashing In as Gemini Implodes ► https://t.co/lWu0Pq8q2h
DeepSeek open-sources a new model: a single A100 can process over 200,000 pages of data per day
第一财经· 2025-10-20 14:58
Core Viewpoint
- DeepSeek has released a new OCR model that utilizes visual modalities for efficient text compression, achieving significant reductions in token usage while maintaining high accuracy in text recognition [2][5][6].

Summary by Sections

Model Overview
- The new OCR model, named DeepSeek-OCR, was open-sourced on October 20 and is detailed in the paper "DeepSeek-OCR: Contexts Optical Compression" [2].
- The model addresses the computational challenges faced by large language models when processing lengthy text by compressing text into visual form, achieving nearly 10x near-lossless context compression while maintaining an OCR accuracy of over 97% [5][6].

Technical Specifications
- The model can generate training data for large language models/visual language models at a rate of over 200,000 pages per day using a single A100-40G GPU [7].
- DeepSeek-OCR consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [7].
- The decoder employs a MoE (Mixture of Experts) design, activating 6 of 64 experts for approximately 570 million active parameters, combining the expressive power of a 3 billion parameter model with the inference efficiency of a 500 million parameter model [7].

Experimental Results
- When the number of text tokens is within 10 times the number of visual tokens (compression ratio under 10), the model achieves an OCR accuracy of 97%; even at a compression ratio of 20, accuracy remains around 60% [7].

Future Directions
- The team proposes a novel approach to simulate human memory decay through optical compression, gradually shrinking the rendered images of older context to reduce token consumption, potentially leading to breakthroughs in handling ultra-long contexts [8].
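The stated data-generation throughput is easy to sanity-check: 200,000+ pages per day on a single A100-40G works out to a little over two pages per second.

```python
# Back-of-the-envelope check of the reported throughput figure.
pages_per_day = 200_000
seconds_per_day = 24 * 60 * 60          # 86,400

pages_per_second = pages_per_day / seconds_per_day
print(round(pages_per_second, 2))       # 2.31
```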
Community Response
- The release has garnered positive feedback, with over 1,400 stars on GitHub shortly after its launch, indicating strong interest in the model [9].
- The project was led by researchers with prior experience in developing advanced OCR systems, suggesting a solid foundation for the new model [9].

Market Position
- There are concerns in the market regarding DeepSeek's pace of innovation, with some voices suggesting that the company may be focusing on internal development to prepare for future models [10].
DeepSeek open-sources a new model! A single A100 can process over 200,000 pages of data per day
Di Yi Cai Jing· 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on contextual optical compression to address the challenges faced by large language models in processing long texts [1][4][6].

Group 1: Model Features
- The DeepSeek-OCR model utilizes visual modalities to efficiently compress text information, achieving nearly 10x near-lossless context compression while maintaining an OCR accuracy of over 97% [4][5].
- The model consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [5].
- The model's architecture gives it the expressive power of a 3 billion parameter model while maintaining the inference efficiency of a 500 million parameter model [5].

Group 2: Potential Applications
- The research indicates potential applications in long-context compression and the memory-forgetting mechanisms of large models, simulating human memory decay by gradually shrinking rendered images over time [5][6].
- This approach could represent a significant breakthrough in handling ultra-long contexts, balancing theoretically unbounded context information [6].

Group 3: Industry Reception
- The release of the DeepSeek-OCR model has garnered positive attention, receiving over 1,400 stars on GitHub shortly after its launch [7].
- Opinions in the market are mixed regarding DeepSeek's pace of innovation, with some suggesting that the company is focusing on internal development for future models [8].
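The memory-decay idea described above, rendering older context at progressively smaller sizes so it consumes fewer visual tokens, can be sketched as a resolution schedule. The halving policy and constants are assumptions for illustration, not the paper's actual mechanism:

```python
def render_scale(age: int, half_life: int = 4, min_scale: float = 0.125) -> float:
    """Scale factor for context `age` steps old (1.0 = full resolution).

    Resolution halves every `half_life` steps, floored at `min_scale`,
    so older context is rendered smaller and costs fewer visual tokens.
    """
    return max(min_scale, 0.5 ** (age // half_life))

for age in (0, 4, 8, 16):
    print(age, render_scale(age))
```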
X @s4mmy
s4mmy· 2025-10-20 12:56
Model Performance & Market Awareness
- DeepSeek and Grok demonstrate superior contextual awareness of market microstructure [1]
- Fine-tuned models for specific market segments could potentially outperform existing benchmarks [1]

Investment Performance
- Grok has reportedly generated profit in 100% of its last 5 rounds [1]