Breaking: a major DeepSeek breakthrough shatters the context-length constraint on large models
36Kr· 2025-10-20 23:22
Core Insights
- DeepSeek has introduced a novel technology path in the large language model competition by open-sourcing the DeepSeek-OCR model, which proposes "Contextual Optical Compression" for efficient information compression through text-to-image conversion [1][8]

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves a decoding accuracy of 97% at a 10x compression ratio, indicating near-lossless compression, and maintains approximately 60% accuracy at a 20x compression ratio [3][21]
- DeepSeek-OCR can express similar textual content using fewer tokens by converting text tokens into visual tokens, offering a new approach to the high computational cost of processing long texts in large language models [6][11]
- In practical tests, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23]

Group 2: Technical Architecture
- DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18]
- DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding while significantly reducing the number of vision tokens generated from document images [14][18]

Group 3: Data and Training
- DeepSeek-OCR's training process is relatively straightforward, involving independent training of DeepEncoder followed by training of the complete DeepSeek-OCR model on a large dataset [20][21]
- The model was trained on a diverse dataset including OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across various document types [25][36]

Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from various document types, including financial reports and scientific literature [24][29]
- The research team plans to further explore the integration of digital and optical text pre-training methods and to evaluate optical compression in real long-text environments, a promising direction for future research [39]
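The accuracy/compression trade-off reported above (near-lossless at 10x, roughly 60% at 20x) can be made concrete with a small sketch. This is illustrative arithmetic only, not DeepSeek's code; the 5,000-token document and the interpolation of the two reported operating points are assumptions.

```python
# Illustrative sketch: how many visual tokens an optical-compression scheme
# would need for a given text-token budget, using the compression-ratio /
# accuracy pairs reported in the article (97% at 10x, ~60% at 20x).
def visual_tokens_needed(text_tokens: int, compression_ratio: int) -> int:
    """Visual tokens = text tokens / compression ratio, rounded up."""
    return -(-text_tokens // compression_ratio)  # ceiling division

# Reported operating points from the article.
REPORTED_ACCURACY = {10: 0.97, 20: 0.60}

doc_tokens = 5000  # hypothetical page of text
for ratio, acc in REPORTED_ACCURACY.items():
    vt = visual_tokens_needed(doc_tokens, ratio)
    print(f"{ratio}x compression: {vt} visual tokens, ~{acc:.0%} reported accuracy")
```

The point of the sketch: a 5,000-token page shrinks to 500 visual tokens at the near-lossless 10x setting, and to 250 tokens if the ~60%-accuracy 20x setting is acceptable.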
DeepSeek AI Returns 30% Crypto Profits in Just 3 Days Using Simple Prompts
Yahoo Finance· 2025-10-20 21:38
Core Insights
- Alpha Arena conducted an experiment to evaluate six AI models in live crypto markets; DeepSeek Chat V3.1 achieved portfolio growth of over 35% in just three days, outperforming Bitcoin and the other AI traders [1][7]

Experiment Structure
- Six leading AI models were each given $10,000 and access to real crypto perpetual markets, trading autonomously based on an identical prompt [1][2]
- The models were assessed on their ability to handle risk, timing, and decision-making in live markets [4]

Trading Framework
- Each AI was given a strict trading framework requiring it to trade six major cryptocurrencies (BTC, ETH, SOL, XRP, DOGE, BNB) with specific conditions for each position, including take-profit and stop-loss targets [5][9]
- The models received market data at each tick and had to decide whether to open, close, or hold positions, with an emphasis on consistency, execution, and discipline [6]

Performance Results
- After three days, the standings were:
  - DeepSeek Chat V3.1: $13,502.62 (+35%)
  - Grok 4: $13,053.28 (+30%)
  - Claude Sonnet 4.5: $12,737.05 (+28%)
  - BTC Buy & Hold: $10,393.47 (+4%)
  - Qwen3 Max: $9,975.10 (-0.25%)
  - GPT-5: $7,264.75 (-27%)
  - Gemini 2.5 Pro: $6,650.36 (-33%) [7]

Reasons for DeepSeek's Success
- DeepSeek held a diversified portfolio of all six major crypto assets with moderate leverage (10x-20x), which spread risk and capitalized on the altcoin rally during the experiment period [8]
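The per-position discipline the article describes (every position carries take-profit and stop-loss targets, and the model decides open/close/hold on each tick) can be sketched as a tiny rule check. All names, prices, and thresholds here are invented for illustration; this is not Alpha Arena's actual framework or any model's real strategy.

```python
# Hypothetical sketch of the per-position rules described in the article:
# each position has a take-profit and stop-loss level, evaluated every tick.
from dataclasses import dataclass

@dataclass
class Position:
    symbol: str         # one of the six traded assets: BTC, ETH, SOL, XRP, DOGE, BNB
    entry: float        # entry price
    take_profit: float  # close when price rises to this level
    stop_loss: float    # close when price falls to this level

def on_tick(pos: Position, price: float) -> str:
    """Apply the take-profit/stop-loss discipline on each price update."""
    if price >= pos.take_profit:
        return "close:take_profit"
    if price <= pos.stop_loss:
        return "close:stop_loss"
    return "hold"

btc = Position("BTC", entry=60000.0, take_profit=66000.0, stop_loss=57000.0)
print(on_tick(btc, 61000.0))  # hold
print(on_tick(btc, 66500.0))  # close:take_profit
```

Encoding exits as fixed levels set at entry is what gives the "consistency, execution, and discipline" the experiment measured: the decision is mechanical once the position is open.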
X @Decrypt
Decrypt· 2025-10-20 20:25
AI Crypto Trading Showdown: DeepSeek and Grok Are Cashing In as Gemini Implodes https://t.co/lWu0Pq8q2h ...
DeepSeek open-sources a new model: a single A100 can process over 200,000 pages of data per day
Di Yi Cai Jing· 2025-10-20 14:58
Core Viewpoint
- DeepSeek has released a new OCR model that uses visual modalities for efficient text compression, achieving significant reductions in token usage while maintaining high accuracy in text recognition [2][5][6]

Model Overview
- The new model, DeepSeek-OCR, was open-sourced on October 20 and is detailed in the paper "DeepSeek-OCR: Contexts Optical Compression" [2]
- The model addresses the computational challenges large language models face when processing lengthy text by compressing it into visual form, achieving nearly 10x near-lossless context compression while maintaining OCR accuracy above 97% [5][6]

Technical Specifications
- The model can generate training data for large language models / visual language models at a rate of over 200,000 pages per day on a single A100-40G GPU [7]
- DeepSeek-OCR consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [7]
- The decoder uses a Mixture of Experts (MoE) design, activating 6 of 64 experts for approximately 570 million active parameters, combining the expressive power of a 3-billion-parameter model with the inference efficiency of a 500-million-parameter model [7]

Experimental Results
- When the number of text tokens is within 10 times the number of visual tokens (compression ratio below 10x), the model achieves 97% OCR accuracy; even at a 20x compression ratio, accuracy remains around 60% [7]

Future Directions
- The team proposes simulating human memory decay through optical compression, gradually shrinking the rendered images of older context to reduce token consumption, potentially leading to breakthroughs in handling ultra-long contexts [8]
Community Response
- The release drew positive feedback, earning over 1,400 stars on GitHub shortly after launch, indicating strong interest in the model [9]
- The project was led by researchers with prior experience developing advanced OCR systems, suggesting a solid foundation for the new model [9]

Market Position
- Some market voices remain concerned about DeepSeek's pace of innovation, suggesting the company may be focusing on internal development to prepare for future models [10]
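The "3B total parameters but ~570M active" claim above is just MoE routing arithmetic: only 6 of 64 experts run per token, plus whatever parameters are shared across all tokens. The shared/expert split below is invented for illustration (the article does not give it), so the result is an order-of-magnitude check, not the model's real breakdown.

```python
# Back-of-the-envelope MoE arithmetic: with 6 of 64 experts active per token
# in a ~3B-parameter decoder, only a fraction of weights fire per token.
# The 300M "shared" figure is a made-up assumption for illustration.
TOTAL_PARAMS   = 3_000_000_000
NUM_EXPERTS    = 64
ACTIVE_EXPERTS = 6

shared     = 300_000_000                             # hypothetical non-expert params
per_expert = (TOTAL_PARAMS - shared) // NUM_EXPERTS  # params in each expert
active     = shared + ACTIVE_EXPERTS * per_expert    # params touched per token
print(f"per expert: {per_expert/1e6:.0f}M, active per token: {active/1e6:.0f}M")
```

Under this assumed split the active count lands around 550M, the same ballpark as the ~570M the article reports, which is why the decoder can carry 3B-model capacity at roughly 500M-model inference cost.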
DeepSeek open-sources a new model! A single A100 can process over 200,000 pages of data per day
Di Yi Cai Jing· 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on context optical compression to address the challenges large language models face in processing long texts [1][4][6]

Group 1: Model Features
- DeepSeek-OCR uses visual modalities to compress text information efficiently, achieving nearly 10x near-lossless context compression while maintaining OCR accuracy above 97% [4][5]
- The model consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [5]
- The architecture gives the model the expressive power of a 3-billion-parameter model while retaining the inference efficiency of a 500-million-parameter model [5]

Group 2: Potential Applications
- The research points to applications in long-context compression and memory-forgetting mechanisms for large models, simulating human memory decay by gradually shrinking rendered images over time [5][6]
- This approach could represent a significant breakthrough in handling ultra-long contexts, trading the fidelity of older information for a theoretically unbounded context window [6]

Group 3: Industry Reception
- The release received positive attention, gathering over 1,400 stars on GitHub shortly after launch [7]
- Opinions on DeepSeek's pace of innovation are mixed, with some suggesting the company is focusing on internal development for future models [8]
X @s4mmy
s4mmy· 2025-10-20 12:56
Interesting seeing a benchmark of model performance with their own trading portfolios. Curious how fine-tuned models for market segments could further outperform this. https://t.co/RfIGsugSd1

Jay A (@jay_azhang): DeepSeek and Grok seem to have better contextual awareness of market microstructure. Grok in particular has made money in 100% of the past 5 rounds. More coming in technical writeup https://t.co/b5MQsTzZUO ...
DeepSeek releases another new model, taking "small but beautiful" to new heights
Hu Xiu· 2025-10-20 12:41
Core Insights
- Current LLMs struggle to process long texts because the computational complexity of attention grows quadratically with sequence length [1]
- DeepSeek-OCR addresses this with "optical compression," converting text into images to reduce the number of tokens required for processing [5][34]
- The model demonstrates token compression of 7 to 20 times while maintaining high accuracy, showing its potential for efficient long-context processing [34][36]

Technical Overview
- DeepSeek-OCR achieves a compression rate of up to 10x with accuracy above 97% [4]
- The model uses a two-component architecture: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [16][18]
- DeepEncoder cleverly combines SAM-base and CLIP-large models with a convolutional compressor that sharply reduces token counts before the global attention layer [10][11]

Performance Metrics
- OmniDocBench results indicate a single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data daily, while 20 nodes (160 A100 GPUs) can produce up to 33 million pages [7]
- DeepSeek-OCR outperforms existing models, needing only 100 visual tokens to exceed GOT-OCR2.0, which uses 256 tokens per page [15]

Data Utilization
- The DeepSeek team collected 30 million pages of multilingual PDF data covering around 100 languages, with a focus on Chinese and English [21]
- The data is split into coarse and fine annotations, with high-quality data generated by various models to strengthen recognition capabilities [22]

Application Potential
- DeepSeek-OCR not only recognizes text but also offers deep-parsing capabilities, making it suitable for STEM applications requiring structured extraction from complex images [27]
- The model can extract structured data from financial reports, chemical structures, and geometric figures, and generate dense captions for natural images [28]

Future Directions
- The team proposes using "optical compression" to simulate human memory decay, enabling efficient long-context processing by reducing the fidelity of older information [30][31]
- Future plans include systematic evaluations and pre-training methods to further validate this approach [35]
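The memory-decay idea above can be sketched as a token-budget schedule: render older context at progressively smaller image sizes so that old turns cost fewer visual tokens. The halving schedule and the 256-token base cost below are invented for illustration; the papers propose the direction but not a concrete schedule.

```python
# Sketch of "optical memory decay": older context is rendered at lower
# fidelity, so its visual-token cost shrinks with age. The halving schedule
# is a made-up example of such a decay curve.
def tokens_for_age(base_tokens: int, age: int, halve_every: int = 2) -> int:
    """Halve the visual-token budget every `halve_every` steps of age."""
    return max(1, base_tokens >> (age // halve_every))

# A 10-turn history where each turn would cost 256 visual tokens at full size:
history_cost = [tokens_for_age(256, age) for age in range(10)]  # newest first
print(history_cost)
print(f"decayed total: {sum(history_cost)} vs uncompressed: {10 * 256}")
```

Even this crude schedule cuts the 10-turn history from 2,560 tokens to under 1,000 while keeping recent turns at full fidelity, which is the trade the "human memory decay" analogy is after.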
DeepSeek open-sources a new model that compresses everything visually
Guan Cha Zhe Wang· 2025-10-20 10:47
Core Insights
- DeepSeek has released a new OCR model, DeepSeek-OCR, with 3 billion parameters, aiming to raise text-recognition efficiency through optical two-dimensional mapping [1][3]

Model Architecture
- DeepSeek-OCR consists of two main components, DeepEncoder and the DeepSeek3B-MoE-A570M decoder, designed for high-resolution input and efficient compression [3][7]
- DeepEncoder combines local perception with global understanding, and its 16x downsampling mechanism retains 97% of key information [7]

Performance Metrics
- The model reaches 97% decoding accuracy when the text-token count is within 10 times the visual-token count, and maintains roughly 60% accuracy at a 20x compression ratio [3]
- In benchmark tests, DeepSeek-OCR outperformed GOT-OCR2.0 and MinerU2.0 while using significantly fewer visual tokens [4]

Practical Applications
- DeepSeek-OCR can generate over 200,000 pages of LLM/VLM training data daily on a single A100-40G GPU, indicating high operational efficiency [4][7]
- Potential applications span finance (digitizing financial reports), healthcare (archiving medical records), and publishing (digitizing ancient texts) [17]
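The 16x downsampling mentioned above translates directly into visual-token counts: a ViT-style encoder first splits the page image into patches, then the compressor cuts the token count 16x before the global attention layer. The 16-pixel patch size and the example resolutions below are assumptions for illustration, not confirmed DeepEncoder settings.

```python
# Illustrative token arithmetic for a patchify-then-16x-downsample encoder.
# Patch size 16 is an assumption; the 16x downsample factor is from the article.
def vision_tokens(height: int, width: int, patch: int = 16, downsample: int = 16) -> int:
    patches = (height // patch) * (width // patch)  # tokens before compression
    return patches // downsample                     # tokens after 16x downsampling

# A 1024x1024 page: 4096 patches before compression, 256 tokens after.
print(vision_tokens(1024, 1024))
# A 640x640 input would land at 100 tokens under the same assumptions.
print(vision_tokens(640, 640))
```

This is why the compression step matters: without it, a single high-resolution page would flood the decoder with thousands of visual tokens instead of a few hundred.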
Incredible! DeepSeek just open-sourced a new model that compresses everything visually
Ji Qi Zhi Xin· 2025-10-20 09:15
Core Insights
- DeepSeek has released a new OCR model, DeepSeek-OCR, which demonstrates nearly 10x lossless contextual compression through text-to-image methods [1][3]
- The model has 3 billion parameters and saw over 100 downloads shortly after its release [1]
- The research team includes Haoran Wei, Yaofeng Sun, and Yukun Li; Wei previously developed the GOT-OCR2.0 system [1]

Model Architecture
- DeepSeek-OCR consists of two main components: DeepEncoder and the DeepSeek3B-MoE-A570M decoder [3][10]
- DeepEncoder is designed to keep activations low under high-resolution inputs while achieving high compression ratios, generating a moderate number of visual tokens [3][14]
- The model achieves 97% OCR accuracy when text tokens number within 10 times the visual tokens, and about 60% accuracy at a 20x compression ratio [3][28]

Performance and Practical Applications
- On the OmniDocBench benchmark, DeepSeek-OCR outperformed GOT-OCR2.0 using only 100 visual tokens versus GOT-OCR2.0's 256 [5]
- The model can generate over 200,000 pages of LLM/VLM training data daily on a single A100-40G GPU [5]
- DeepSeek-OCR outperforms existing models such as MinerU2.0 while using significantly fewer visual tokens [30][32]

Training and Data
- Training proceeds in two main phases, using a variety of OCR datasets and general visual data [21][24]
- The model was trained on 20 nodes, each with 8 A100-40G GPUs, at a global batch size of 640 [25]
- Training speed reached 90 billion tokens per day on pure text data and 70 billion tokens per day on multimodal data [25]

Compression and Recognition Capabilities
- Using visual modalities as an efficient compression medium allows significantly higher compression rates than traditional text representations [9][10]
- The model supports nearly 100 languages, demonstrating versatility across diverse document types [42]
- It can parse complex layouts and extract structured data from charts, which is crucial for financial and scientific documents [35][40]
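The throughput figures quoted across these articles are internally consistent, as a quick check shows: scaling the ~200,000 pages/day of a single A100-40G to the 20-node, 160-GPU cluster lands in the tens-of-millions range the Hu Xiu piece quotes (~33 million pages/day, presumably with some overhead or rounding).

```python
# Sanity check on the quoted data-generation throughput (simple scaling,
# ignoring any per-node overhead or batching effects).
PAGES_PER_GPU_PER_DAY = 200_000
GPUS = 20 * 8  # 20 nodes x 8 A100-40G GPUs each

total = PAGES_PER_GPU_PER_DAY * GPUS
print(f"{GPUS} GPUs -> {total / 1e6:.0f}M pages/day")  # ~32M, in line with the ~33M quoted
```

Linear scaling gives 32 million pages/day, matching the order of magnitude the articles report for full-cluster training-data generation.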
X @CZ 🔶 BNB
CZ 🔶 BNB· 2025-10-20 05:54
AI Trading Performance
- DeepSeek stood out in AI trading, beating the other competitors [1][2]
- The market reaction to DeepSeek's performance was strong, with many seeing it as AI and crypto finally finding the right combination [2]

Trading Strategy & Market Impact
- A distinctive trading strategy was key to its success: avoid buying and selling at the same time as everyone else [1]
- If enough people use the same AI, its collective buying power could push prices up [1]
- Every one of DeepSeek's strategies carries take-profit/stop-loss levels or conditions [2]

Future Trend
- More people are expected to study AI trading, and trading volume is likely to increase [2]