Major Release: DeepSeek Open-Sources Again. Vision as Compression, 100 Tokens Take Down 7,000
36Kr · 2025-10-21 01:35
Core Insights
- The DeepSeek-OCR model explores the boundaries of visual-text compression, demonstrating the ability to decode over ten times as much text information from a limited number of visual tokens, thus addressing long-context issues in large language models (LLMs) [1][16].

Group 1: Model Performance
- In the OmniDocBench benchmark, DeepSeek-OCR surpasses GOT-OCR2.0 while using only 100 visual tokens versus GOT-OCR2.0's 256 tokens per page [2][44].
- The model shows practical value in OCR tasks, achieving compression ratios of 10.5× to 19.7× while maintaining high decoding accuracy, with precision around 97% at compression ratios below 10× (see the token arithmetic sketched below) [37][41].

Group 2: Technical Architecture
- DeepSeek-OCR employs an end-to-end vision-language model (VLM) architecture consisting of an encoder (DeepEncoder) and a decoder (DeepSeek-3B-MoE) [21][34].
- The encoder, DeepEncoder, uses a novel architecture with approximately 380 million parameters, combining SAM-base and CLIP-large for feature extraction and tokenization [23][24].

Group 3: Compression Capabilities
- The model achieves compression ratios of 7 to 20 times on older (historical) context, providing a feasible direction for addressing long-context issues in LLMs [16].
- DeepSeek-OCR can generate training data for LLMs/VLMs at a rate of 33 million pages daily using 20 computing nodes, each equipped with 8 A100-40G GPUs [39].

Group 4: Multilingual and Application Scope
- DeepSeek-OCR can process nearly 100 languages, enhancing its applicability in global contexts [43].
- The model can also interpret charts, chemical equations, simple geometric shapes, and natural images, showcasing its versatility across document types [43][44].
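To make the compression claims above concrete, here is a minimal sketch of the underlying token arithmetic; the function name is ours, and the example numbers simply restate the figures quoted above:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many original text tokens each page's vision tokens stand in for."""
    return text_tokens / vision_tokens

# Per the figures above: a page that would need ~1,000 text tokens is
# decoded from ~100 vision tokens, i.e. a 10x ratio, where precision is
# still around 97%; pushing toward 20x trades away accuracy.
print(compression_ratio(1000, 100))  # 10.0
print(compression_ratio(2000, 100))  # 20.0
```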
DeepSeek's New Model Has Silicon Valley Raving: Compressing One-Dimensional Text with Two-Dimensional Vision, Runs on a Single GPU; "Google's Core Secret Has Been Open-Sourced"
Hua Er Jie Jian Wen · 2025-10-21 00:27
Core Insights
- DeepSeek has released an open-source model named DeepSeek-OCR, which is gaining significant attention in Silicon Valley for its innovative approach to processing long texts using visual compression techniques [1][4][21]
- The model is designed to tackle the computational challenges associated with large models handling lengthy text, achieving high accuracy even with sharply reduced token usage [1][4][5]

Model Performance
- DeepSeek-OCR has 3 billion parameters and decodes text with high accuracy, achieving 97% accuracy at compression ratios below 10 times and maintaining 60% accuracy even at a 20 times compression ratio [1][4][5]
- Benchmarked against existing models, it shows superior performance with significantly fewer visual tokens, for example using only 100 visual tokens to outperform models that require 256 [7][8]

Data Generation Efficiency
- The model can generate over 200,000 pages of high-quality training data daily on a single A100-40G GPU [2][4]

Innovative Approach
- DeepSeek introduces a concept called "Contextual Optical Compression," which compresses textual information into visual form, allowing the model to interpret content through images rather than text [4][10]
- The architecture includes two main components: the DeepEncoder, which converts images into compressed visual tokens, and DeepSeek3B-MoE-A570M, which reconstructs text from these tokens (outlined in the sketch below) [10][11]

Flexibility and Adaptability
- The DeepEncoder handles a range of input resolutions and token counts, adapting to different compression needs and application scenarios [11][12]
- The model supports complex image analyses, including financial reports and scientific diagrams, broadening its applicability across fields [12][14]

Future Implications
- The research suggests that this unified approach to visual and textual processing could be a step toward Artificial General Intelligence (AGI) [4][21]
- The team behind DeepSeek-OCR is exploring the potential of simulating human memory mechanisms through optical compression, which could enable more efficient handling of long-term context in AI [20][21]
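As a structural outline of how the two components above fit together, the following sketch uses hypothetical encode/decode callables standing in for DeepEncoder and the MoE decoder; it is not the actual API:

```python
from typing import Callable, List

def ocr_pipeline(
    page_image: bytes,
    encode: Callable[[bytes], List[int]],  # DeepEncoder: image -> compressed vision tokens
    decode: Callable[[List[int]], str],    # DeepSeek3B-MoE-A570M: vision tokens -> text
) -> str:
    """Two-stage flow described above: compress a document page into a
    small set of vision tokens, then reconstruct the text from them."""
    vision_tokens = encode(page_image)  # e.g. ~100 tokens for a dense page
    return decode(vision_tokens)
```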
October 21 Morning Briefing | Apple Hits a Record High; DeepSeek Publishes a New Paper
Xuan Gu Bao · 2025-10-21 00:04
Group 1: Market Overview
- Trade tensions have eased, leading to a rise in US stock markets, with the S&P 500 up 1.07%, Dow Jones up 1.12%, and Nasdaq up 1.37% [1]
- Apple shares increased nearly 4%, reaching a historical high for the first time this year, while Nvidia fell 0.3% [1]
- Key mineral stocks surged following a cooperation agreement between the US and Australia, with NioCorp Developments rising nearly 20% and USA Rare Earth up nearly 14% [1]

Group 2: Chinese Market Performance
- The Chinese concept stocks index rose over 2%, with Alibaba up nearly 4% and Kuke Music surging 49% [2]

Group 3: Commodity and Bond Market
- Gold prices rebounded, with futures nearing $4,400, up over 4%, while silver also rose over 3% [3]
- Crude oil prices fell but did not break away from five-month lows, with US oil dropping over 2% before recovering most losses [4]
- The yield on ten-year US Treasury bonds decreased, approaching a six-month low, while the US dollar index rose for a second consecutive day [4]

Group 4: Technology and Innovation
- Amazon's AWS services experienced a large-scale outage affecting thousands of users, including Coinbase and Robinhood [5]
- DeepSeek introduced a new OCR model that significantly reduces computational and storage costs while maintaining high accuracy [10]

Group 5: Industry Developments
- The Ministry of Industry and Information Technology held a meeting on the cement industry, emphasizing the need to balance supply and demand and to eliminate outdated capacity [11]
- Hubei Province launched a carbon trading platform, achieving a cumulative transaction volume exceeding 10 billion yuan and enhancing efficiency in resource allocation [11]

Group 6: Corporate Earnings
- Ningde Times reported a net profit of 18.55 billion yuan for Q3, a year-on-year increase of 41.21%, and a total net profit of 49.03 billion yuan for the first three quarters, up 36.20% [16]
- Yonghe shares reported a Q3 net profit of 198 million yuan, up 485.77%, and a total of 470 million yuan for the first three quarters, up 220.39% [16]
- China Shipbuilding expects a net profit between 5.55 billion and 6.15 billion yuan for the first three quarters, a year-on-year increase of 104.30% to 126.39% [16]
DeepSeek Publishes a Paper on Using OCR to Cut Computation and Storage Overhead
Xuan Gu Bao · 2025-10-20 23:31
Core Insights
- DeepSeek published a paper titled "DeepSeek-OCR: Contexts Optical Compression," highlighting a method to compress text information by rendering long text into images, significantly reducing computational and storage costs [1]
- Experimental results showed that at a 10-times compression ratio, OCR accuracy reached 97%, and at 20-times compression it maintained 60% accuracy, indicating the model's effectiveness in handling long documents [1]
- The integration of OCR technology with artificial intelligence has become a new trend, with deep learning and neural networks significantly enhancing recognition accuracy in complex scenarios [1]

Industry Overview
- The global AI-driven OCR market is projected to reach approximately 8.17 billion yuan in 2024 and nearly 13.69 billion yuan by 2031, indicating substantial growth potential [2]
- Companies like Hehe Information have established benchmark products in the industry, with an average character recognition rate of 81.9% in complex scenarios, outperforming competitors such as Baidu (70.0%), Tencent (65.0%), and Alibaba (66.9%) [2]
- Hanwang Technology has notable advantages in OCR technology, having received a national science and technology progress award, particularly for handwriting recognition and complex-scene recognition [2]
Just In: DeepSeek's Major Breakthrough Breaks the Context Shackles of Large Models
36Kr · 2025-10-20 23:22
Core Insights
- DeepSeek has opened a novel technical path in the large language model race by open-sourcing the DeepSeek-OCR model, which proposes the concept of "Contextual Optical Compression" for efficient information compression through text-to-image conversion [1][8].

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves 97% decoding accuracy at a 10x compression ratio, indicating near-lossless compression, while maintaining approximately 60% accuracy at a 20x compression ratio [3][21].
- DeepSeek-OCR can express similar textual content using fewer tokens by converting text tokens into visual tokens, offering a new approach to the high computational cost of processing long texts in large language models [6][11].
- In practical tests, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23].

Group 2: Technical Architecture
- The architecture of DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression on high-resolution documents, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18].
- DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding while significantly reducing the number of vision tokens generated from document images (see the sketch following this summary) [14][18].

Group 3: Data and Training
- The training process is relatively straightforward, involving independent training of DeepEncoder followed by the complete DeepSeek-OCR model, using a large dataset for effective learning [20][21].
- The model was trained on a diverse dataset including OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across various document types [25][36].

Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from various document types, including financial reports and scientific literature [24][29].
- The research team plans to further explore the integration of digital and optical text pre-training methods and to evaluate optical compression in real long-text environments, indicating a promising direction for future research [39].
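The local-attention, then compress, then global-attention ordering described above can be sketched as follows. This is an illustrative stand-in for the convolutional compressor only, assuming two stride-2 convolutions for the 16x token reduction and an illustrative channel width; it is not DeepSeek's actual implementation:

```python
import torch
import torch.nn as nn

class ConvCompressor(nn.Module):
    """Sketch of the token compressor sitting between a window-attention
    (SAM-style) stage and a global-attention (CLIP-style) stage. Two
    stride-2 convolutions each halve the grid per side, i.e. 4x fewer
    tokens per layer and 16x fewer overall."""

    def __init__(self, dim: int = 768):  # channel width is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, H, W) feature map from the local-attention stage
        return self.net(x)

# A 64x64 patch grid (4,096 tokens) shrinks to 16x16 (256 tokens)
# before the costly global-attention layers ever see it.
feats = torch.randn(1, 768, 64, 64)
print(ConvCompressor()(feats).shape)  # torch.Size([1, 768, 16, 16])
```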
Jianhu Rural Commercial Bank Launches a DeepSeek Intelligent Assistant
Jiang Nan Shi Bao · 2025-10-20 23:15
Core Insights
- Jianhu Rural Commercial Bank has developed and launched the "DeepSeek Intelligent Assistant," a smart data model covering comprehensive business knowledge and regulatory documents, enhancing the bank's digital transformation and intelligent support capabilities [1]

Group 1: Intelligent Data Model Development
- The "DeepSeek Intelligent Assistant" uses a localized deployment of the DeepSeek-R1 model, together with technologies such as Ollama, Dify, and Text Embedding, ensuring data security while providing robust support across banking functions (a minimal API-level sketch follows below) [1]
- The model aims to support high-quality development toward digitalization and intelligence within the bank [1]

Group 2: Credit Assistance
- For credit assistance, the assistant integrates credit policies and documentation and leverages knowledge graphs to instantly understand and accurately answer credit-related inquiries [1]
- It improves customer managers' efficiency by precisely locating the source of each answer [1]

Group 3: Compliance Management
- For compliance management, the assistant has built a compliance knowledge base incorporating all institutional regulations, enabling the generation of compliance question banks [1]
- This shift from manual to intelligent question generation promotes employee learning and adherence to compliance standards [1]
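For a rough picture of what such a localized deployment looks like at the API level, here is a minimal sketch that queries a locally hosted model through Ollama's standard REST endpoint; the model tag and prompt format are assumptions, and the retrieval step (e.g. via Dify and a text-embedding index, as described above) is elided:

```python
import json
import urllib.request

def ask_local_assistant(question: str, context: str) -> str:
    """Query a locally deployed model via Ollama's /api/generate
    endpoint (default port 11434). How `context` is retrieved from the
    bank's knowledge base is out of scope for this sketch."""
    payload = {
        "model": "deepseek-r1",  # hypothetical local model tag
        "prompt": f"Context:\n{context}\n\nQuestion: {question}",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```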
DeepSeek Open-Sources a New Model! A Single A100 Can Process Over 200,000 Pages a Day
Di Yi Cai Jing · 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on context optical compression to address the challenges large language models face in processing long texts [1][4][6]

Group 1: Model Features
- The DeepSeek-OCR model uses the visual modality to compress text information efficiently, achieving nearly 10-times near-lossless context compression while maintaining OCR accuracy above 97% [4][5]
- The model consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from the compressed visual tokens [5]
- The mixture-of-experts design gives the model the expressive power of a 3-billion-parameter model while maintaining the inference efficiency of a roughly 500-million-parameter model (arithmetic sketched below) [5]

Group 2: Potential Applications
- The research points to applications in long-context compression and in memory-forgetting mechanisms for large models, simulating human memory decay by gradually shrinking rendered images of older context [5][6]
- This approach could represent a significant breakthrough in handling ultra-long contexts with theoretically unlimited context information [6]

Group 3: Industry Reception
- The release of the DeepSeek-OCR model has garnered positive attention, receiving over 1,400 stars on GitHub shortly after its launch [7]
- Opinions in the market are mixed regarding DeepSeek's pace of innovation, with some suggesting the company is focusing internal development on future models [8]
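The "large-model expressiveness at small-model cost" point follows from mixture-of-experts arithmetic: per-token compute tracks activated parameters, not total parameters. A back-of-the-envelope sketch using only the figures implied by the model's name:

```python
# Mixture-of-experts arithmetic: per-token inference cost scales with
# ACTIVATED parameters, while capacity scales with TOTAL parameters.
total_params = 3_000_000_000   # "3B" in DeepSeek3B-MoE
active_params = 570_000_000    # "A570M": parameters activated per token

# Only a fraction of the model does work on any given token.
print(f"active fraction: {active_params / total_params:.1%}")  # 19.0%
```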
DeepSeek Releases Another New Model, Taking "Small but Beautiful" to a New Level
Hu Xiu · 2025-10-20 12:41
Core Insights
- The article discusses the challenge current LLMs face in processing long texts: computational complexity grows quadratically with sequence length [1]
- DeepSeek-OCR presents a novel approach to this problem through "optical compression," converting text into images to reduce the number of tokens required for processing [5][34]
- The model demonstrates token compression of 7 to 20 times while maintaining high accuracy, showcasing its potential for efficient long-context processing [34][36]

Technical Overview
- DeepSeek-OCR achieves a compression rate of up to 10 times with accuracy above 97% [4]
- The model uses a two-component architecture: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from the compressed visual tokens [16][18]
- DeepEncoder combines SAM-base and CLIP-large models with a convolutional compressor that significantly reduces token counts before the global attention layers [10][11]

Performance Metrics
- A single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data daily, while 20 nodes (160 A100 GPUs) can produce up to 33 million pages [7]
- On the OmniDocBench benchmark, DeepSeek-OCR outperforms existing models, requiring only 100 visual tokens to exceed the performance of GOT-OCR2.0, which uses 256 tokens per page [15]

Data Utilization
- The DeepSeek team collected 30 million pages of multilingual PDF data covering around 100 languages, with a focus on Chinese and English [21]
- The data is categorized into coarse and fine annotations, with high-quality data generated by various models to enhance recognition capabilities [22]

Application Potential
- DeepSeek-OCR not only recognizes text but also possesses deep-parsing capabilities, making it suitable for STEM applications that require structured extraction from complex images [27]
- The model can extract structured data from financial reports, chemical structures, and geometric figures, and can generate dense captions for natural images [28]

Future Directions
- The team proposes using "optical compression" to simulate human memory decay, allowing efficient processing of long contexts by reducing the fidelity of older information (see the sketch below) [30][31]
- Future plans include systematic evaluations and pre-training methods to further validate the effectiveness of this approach [35]
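A minimal sketch of that memory-decay idea: re-render older context pages at progressively lower resolution, so their vision-token cost shrinks with age. The halving schedule and size floor here are assumptions for illustration, not details from the paper:

```python
from PIL import Image

def decay_render(page: Image.Image, age: int, floor: int = 64) -> Image.Image:
    """Re-render an old context page at lower fidelity the older it is.
    Halving each side per age step quarters the pixel (and roughly the
    vision-token) budget; `floor` caps how small a page can get."""
    scale = 2 ** age
    w = max(page.width // scale, floor)
    h = max(page.height // scale, floor)
    return page.resize((w, h))

# A fresh page keeps full fidelity; a 3-turn-old page costs ~1/64 the pixels.
page = Image.new("RGB", (1024, 1024))
for age in range(4):
    print(age, decay_render(page, age).size)
```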
Breaking into a New Field: DeepSeek Releases the Text Recognition Model DeepSeek-OCR
Bei Ke Cai Jing · 2025-10-20 12:37
Core Insights
- DeepSeek has released a new model called DeepSeek-OCR on the open-source community platform Hugging Face, designed for Optical Character Recognition (OCR), i.e. extracting text from images [1][3]

Group 1: Model Description
- The accompanying paper describes DeepSeek-OCR as a preliminary study of the feasibility of compressing long contexts through optical two-dimensional mapping [3]
- The model achieves a decoding (OCR) accuracy of 97% when the number of text tokens is within 10 times the number of visual tokens (compression ratio < 10) [3]
- Even at a compression ratio of 20, OCR accuracy remains around 60%, indicating significant potential for research areas such as long-context compression and memory-forgetting mechanisms in large language models [3]
Six Top AIs Trade $10,000 of Real Money: DeepSeek Earns the Most, GPT-5 Racks Up Losses
Hu Xiu · 2025-10-20 11:49
Core Insights
- Jay Chou's recent troubles involve a Bitcoin account managed by his magician friend, Cai Weize, who claimed the account was locked a year ago, resulting in a loss of funds [1][2]
- The article discusses AI models competing in the cryptocurrency market, highlighting a competition called Alpha Arena in which six top AI models trade cryptocurrencies [3][4]

Group 1: AI Competition Overview
- The competition gives each of six AI models $10,000 to trade perpetual contracts on the Hyperliquid platform, with trading pairs including BTC, ETH, BNB, SOL, XRP, and DOGE [4][6]
- Performance is measured by risk-adjusted returns, accounting not only for profits but also for the risks taken (a minimal scoring sketch follows) [6][7]

Group 2: Performance of AI Models
- As of the latest update, DeepSeek Chat V3.1 leads with an account value of $14,310 and a return of +43.1%, via a strategy of high leverage and concentrated positions [11][12]
- Grok 4 follows with an account value of $13,921 and a return of +39.21%, employing a high-leverage long-only strategy [12][21]
- Claude Sonnet 4.5 has an account value of $12,528 and a return of +25.28%, taking a conservative trading approach [12][23]
- In contrast, GPT-5 and Gemini 2.5 Pro are underperforming, with returns of -24.78% and -27.74% respectively, reflecting poor trading strategies and high transaction costs [12][30]

Group 3: AI's Role in Investment
- The article argues that AI's greatest value in investment may lie in transparency: investors can see trading records and decision-making processes, unlike with human-managed accounts [40][41]
- The ambition behind the competition is to use financial markets as a training ground for AI, aiming for continuous learning and adaptation to market dynamics [34][35]
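Since the leaderboard is scored on risk-adjusted returns rather than raw profit, the sketch below shows the standard way such a score is computed (a Sharpe-style ratio); the article does not specify Alpha Arena's exact formula, and the sample returns are invented for illustration:

```python
import statistics

def sharpe_ratio(period_returns: list[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by return volatility: the usual
    risk-adjusted score. High leverage that inflates both the mean and
    the volatility does not automatically win on this metric."""
    excess = [r - risk_free for r in period_returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Illustrative only: a steady strategy out-scores a wilder one with the
# same average per-period gain.
print(sharpe_ratio([0.02, 0.015, 0.025, 0.02]))   # steady   -> ~4.9
print(sharpe_ratio([0.15, -0.10, 0.12, -0.09]))   # volatile -> ~0.15
```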