DeepSeek Publishes a Paper: Using OCR Technology to Cut Computation and Storage Overhead
Xuan Gu Bao· 2025-10-20 23:31
Core Insights
- DeepSeek published a paper titled "DeepSeek-OCR: Contexts Optical Compression," presenting a method that compresses text by rendering long passages into images, significantly reducing computational and storage costs [1]
- Experiments showed 97% OCR accuracy at a 10x compression ratio and roughly 60% accuracy at 20x, indicating the model's effectiveness on long documents [1]
- The integration of OCR technology with artificial intelligence has become a new trend, with deep learning and neural networks significantly enhancing recognition accuracy in complex scenarios [1]

Industry Overview
- The global AI-driven OCR market is projected to reach approximately 8.17 billion yuan in 2024 and nearly 13.69 billion yuan by 2031, indicating substantial growth potential [2]
- Companies like Hehe Information have established benchmark products in the industry, with an average character recognition rate of 81.9% in complex scenarios, outperforming Baidu (70.0%), Tencent (65.0%), and Alibaba (66.9%) [2]
- Hanwang Technology holds notable advantages in OCR, particularly in handwriting and complex-scene recognition, and has received a national science and technology progress award [2]
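The compression figures above reduce to a simple ratio. A minimal sketch, assuming (as the coverage implies) that the ratio is just original text tokens divided by the vision tokens the rendered image occupies:

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens

# 1,000 text tokens rendered into 100 vision tokens -> 10x compression,
# the regime where the paper reports ~97% OCR decoding accuracy.
print(compression_ratio(1000, 100))  # 10.0
```

At 20x (say, 2,000 text tokens in the same 100 vision tokens) accuracy reportedly drops to around 60%, so the ratio doubles as a rough fidelity dial.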
Just In: A Major DeepSeek Breakthrough Breaks the Context Shackles on Large Models
36Kr· 2025-10-20 23:22
Core Insights
- DeepSeek has introduced a novel technology path in the large-language-model race by open-sourcing the DeepSeek-OCR model, which proposes "Contextual Optical Compression": efficient information compression through text-to-image conversion [1][8]

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: decoding accuracy reaches 97% at a 10x compression ratio, indicating near-lossless compression, and stays around 60% at a 20x compression ratio [3][21]
- By converting text tokens into visual tokens, DeepSeek-OCR expresses similar textual content with fewer tokens, offering a new way to address the high computational cost of processing long texts in large language models [6][11]
- In practical applications, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23]

Group 2: Technical Architecture
- The architecture has two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18]
- DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding while sharply reducing the number of vision tokens generated from document images [14][18]

Group 3: Data and Training
- Training is relatively straightforward: DeepEncoder is trained independently, then the complete DeepSeek-OCR model, on a large dataset for effective learning [20][21]
- The training mix spans OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, ensuring robust performance across document types [25][36]

Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from document types including financial reports and scientific literature [24][29]
- The research team plans to further explore the integration of digital and optical text pre-training methods and to evaluate optical compression in real long-text environments, a promising direction for future research [39]
Jianhu Rural Commercial Bank Launches the "DeepSeek Intelligent Assistant"
Jiang Nan Shi Bao· 2025-10-20 23:15
Recently, Jianhu Rural Commercial Bank deployed the DeepSeek-R1 model locally and, combining technologies such as Ollama, Dify, and Text Embedding, successfully developed and launched the "DeepSeek Intelligent Assistant", an intelligent large model covering bank-wide business knowledge and policy documents, on the premise of data security. It provides strong intelligent support for account managers, counter staff, and compliance management, pushing the whole bank toward high-quality intelligent and digital development.

In credit assistance, the "DeepSeek Intelligent Assistant" ingests credit policies and credit materials and, backed by a knowledge graph, can instantly understand and accurately answer all kinds of credit questions while pinpointing the source of each answer, improving account managers' efficiency.

In compliance management, the "DeepSeek Intelligent Assistant" builds a compliance-policy knowledge base from the bank's full set of rules and uses the assistant to generate compliance question banks for staff learning, shifting question-setting from manual work to intelligent generation. ...
DeepSeek Open-Sources a New Model: A Single A100 Can Process Over 200,000 Pages of Data Per Day
Di Yi Cai Jing· 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on context optical compression to address the challenges large language models face when processing long texts [1][4][6]

Group 1: Model Features
- The model uses the visual modality to compress text information efficiently, achieving near-lossless context compression at roughly 10x while maintaining OCR accuracy above 97% [4][5]
- It consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [5]
- The mixture-of-experts design gives the model the expressive power of a 3-billion-parameter model with the inference efficiency of a model with roughly 570 million active parameters [5]

Group 2: Potential Applications
- The research points to applications in long-context compression and the memory-forgetting mechanisms of large models, simulating human memory decay by gradually shrinking rendered images over time [5][6]
- This approach could represent a significant breakthrough in handling ultra-long contexts, balancing theoretically unbounded context information against fidelity [6]

Group 3: Industry Reception
- The release of DeepSeek-OCR has drawn positive attention, collecting over 1,400 GitHub stars shortly after launch [7]
- Market opinion on DeepSeek's pace of innovation is mixed, with some suggesting the company is focusing on internal development of future models [8]
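The memory-decay idea above can be pictured as a vision-token budget that shrinks with the age of the context. A hypothetical schedule, purely for illustration; the halving factor and the function itself are assumptions, not the paper's actual mechanism:

```python
def vision_token_budget(base_tokens: int, age: int, decay: float = 0.5) -> int:
    """Re-render older context at lower resolution: each age step keeps a
    `decay` fraction of the previous vision-token budget (illustrative only)."""
    return max(1, int(base_tokens * decay ** age))

# Fresh context keeps its full budget; older spans fade toward a floor of 1 token.
print([vision_token_budget(256, age) for age in range(4)])  # [256, 128, 64, 32]
```

The point of such a schedule is that recent context stays near-lossless (the <10x regime) while distant context degrades gracefully rather than being truncated outright.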
DeepSeek Releases Another New Model, Taking "Small but Beautiful" to a New Level
Hu Xiu· 2025-10-20 12:41
Core Insights
- Current LLMs struggle with long texts because attention cost grows quadratically with sequence length [1]
- DeepSeek-OCR presents a novel approach to this problem via "optical compression": converting text into images to cut the number of tokens needed for processing [5][34]
- The model compresses tokens by 7 to 20 times while maintaining high accuracy, showing its potential for efficient long-context processing [34][36]

Technical Overview
- DeepSeek-OCR achieves a compression rate of up to 10x with accuracy above 97% [4]
- It uses a two-component architecture: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [16][18]
- DeepEncoder cleverly combines SAM-base and CLIP-large models with a convolutional compressor that sharply reduces token counts before the global attention layer [10][11]

Performance Metrics
- OmniDocBench results indicate that a single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data daily, while 20 nodes (160 A100 GPUs) can produce up to 33 million pages [7]
- DeepSeek-OCR needs only 100 visual tokens to exceed the performance of GOT-OCR2.0, which uses 256 tokens per page [15]

Data Utilization
- The DeepSeek team collected 30 million pages of multilingual PDF data covering around 100 languages, with a focus on Chinese and English [21]
- The data is categorized into coarse and fine annotations, with high-quality data generated through various models to enhance recognition capabilities [22]

Application Potential
- Beyond recognizing text, DeepSeek-OCR has deep-parsing capabilities suited to STEM applications that require structured extraction from complex images [27]
- It can extract structured data from financial reports, chemical structures, and geometric figures, and generate dense captions for natural images [28]

Future Directions
- The team proposes using "optical compression" to simulate human memory decay, processing long contexts efficiently by reducing the fidelity of older information [30][31]
- Future plans include systematic evaluations and pre-training methods to further validate the approach [35]
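The throughput numbers above scale linearly in the summary's framing. A quick sanity check; the per-GPU rate is the article's figure, and perfect linear scaling across GPUs is an assumption:

```python
def daily_pages(pages_per_gpu: int, gpus: int) -> int:
    """Aggregate daily page throughput, assuming perfect linear scaling."""
    return pages_per_gpu * gpus

single = daily_pages(200_000, 1)        # one A100-40G
cluster = daily_pages(200_000, 20 * 8)  # 20 nodes x 8 A100s = 160 GPUs
print(single, cluster)  # 200000 32000000
```

160 GPUs at 200k pages each gives 32 million pages, close to the article's "up to 33 million", which implies a per-GPU rate slightly above 200k.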
Breaking into a New Field: DeepSeek Releases Text Recognition Model DeepSeek-OCR
Bei Ke Cai Jing· 2025-10-20 12:37
Core Insights
- DeepSeek has released a new model called DeepSeek-OCR on the open-source community platform Hugging Face; it performs Optical Character Recognition (OCR), extracting text from images [1][3]

Group 1: Model Description
- The accompanying paper describes DeepSeek-OCR as a preliminary study on the feasibility of compressing long contexts through optical two-dimensional mapping [3]
- The model achieves a decoding (OCR) accuracy of 97% when the number of text tokens is within 10 times the number of visual tokens (compression ratio < 10) [3]
- Even at a compression ratio of 20, OCR accuracy remains around 60%, indicating significant potential for research areas such as long-context compression and memory-forgetting mechanisms in large language models [3]
Six Top AIs Trade $10,000 in Real Markets: DeepSeek Earns the Most While GPT-5 Racks Up Losses
Hu Xiu· 2025-10-20 11:49
Core Insights
- Jay Chou's recent troubles involve a Bitcoin account managed by his magician friend Cai Weize, who claimed the account was locked a year ago, resulting in a loss of funds [1][2]
- The article covers AI models competing in the cryptocurrency market through a contest called Alpha Arena, in which six top AI models trade cryptocurrencies [3][4]

Group 1: AI Competition Overview
- Each of the six AI models was given $10,000 to trade perpetual contracts on the Hyperliquid platform, with trading pairs including BTC, ETH, BNB, SOL, XRP, and DOGE [4][6]
- Performance is measured by risk-adjusted returns, weighing not only profits but also the risks taken [6][7]

Group 2: Performance of AI Models
- As of the latest update, DeepSeek Chat V3.1 leads with an account value of $14,310 and a return of +43.1%, running a strategy of high leverage and concentrated positions [11][12]
- Grok 4 follows with an account value of $13,921 and a return of +39.21%, employing a high-leverage long-only strategy [12][21]
- Claude Sonnet 4.5 sits at $12,528 and +25.28%, taking a conservative trading approach [12][23]
- In contrast, GPT-5 and Gemini 2.5 Pro are underperforming, with returns of -24.78% and -27.74% respectively, reflecting poor strategies and high transaction costs [12][30]

Group 3: AI's Role in Investment
- AI's greatest value in investment may lie in transparency: investors can see trading records and decision-making processes, unlike with human-managed accounts [40][41]
- The competition's ambition is to use financial markets as a training ground for AI, aiming for continuous learning and adaptation to market dynamics [34][35]
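The leaderboard returns follow directly from the account values on a $10,000 stake. A minimal check against the figures quoted here:

```python
def simple_return_pct(start: float, end: float) -> float:
    """Percent gain or loss on the initial stake."""
    return (end - start) / start * 100

# Figures quoted in the article (start = $10,000 for every model)
print(round(simple_return_pct(10_000, 14_310), 2))  # 43.1  (DeepSeek Chat V3.1)
print(round(simple_return_pct(10_000, 13_921), 2))  # 39.21 (Grok 4)
print(round(simple_return_pct(10_000, 12_528), 2))  # 25.28 (Claude Sonnet 4.5)
```

Note this is raw return, not the risk-adjusted metric the competition scores on; the article does not spell out that formula, so it is not reproduced here.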
DeepSeek Team Releases New Visual Compression Model DeepSeek-OCR
Zhi Tong Cai Jing· 2025-10-20 11:37
Core Insights
- The DeepSeek-AI team has launched a new research achievement called DeepSeek-OCR, which innovatively compresses long text context into visual tokens, significantly reducing the number of tokens needed for processing [1]
- The system has two main components: DeepEncoder, designed for high-resolution input with low computational activation, and the DeepSeek3B-MoE-A570M decoder [1]
- Experiments show 97% OCR accuracy when the number of text tokens does not exceed ten times the number of visual tokens (compression ratio below 10x), and around 60% accuracy even at a 20x compression ratio [1]

Performance Metrics
- In the OmniDocBench test, DeepSeek-OCR surpassed GOT-OCR2.0 using only 100 visual tokens, compared with 256 tokens per page for GOT-OCR2.0 [2]
- It also outperformed MinerU2.0, which averages more than 6,000 tokens per page, while using fewer than 800 visual tokens [2]
- The system can generate over 200,000 pages of training data for large language models / visual language models daily on a single A100-40G GPU [2]
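The benchmark comparisons above amount to large per-page token savings. A small sketch; the baseline figures are the ones quoted in this summary:

```python
def token_savings(baseline_tokens: int, ours_tokens: int) -> float:
    """Fraction of per-page tokens saved relative to a baseline system."""
    return 1 - ours_tokens / baseline_tokens

# vs GOT-OCR2.0: 100 vision tokens instead of 256 per page
print(round(token_savings(256, 100), 3))   # 0.609
# vs MinerU2.0: under 800 tokens instead of 6000+ per page
print(round(token_savings(6000, 800), 3))  # 0.867
```

Saving 60-87% of tokens per page is what makes the 200,000-pages-per-day throughput on one GPU plausible.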
DeepSeek Open-Sources a New Model That Compresses Everything Visually
Guan Cha Zhe Wang· 2025-10-20 10:47
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which features 3 billion parameters and aims to enhance text-recognition efficiency through optical two-dimensional mapping [1][3]

Model Architecture
- The model consists of two main components, DeepEncoder and the DeepSeek3B-MoE-A570M decoder, designed for high-resolution input and efficient compression [3][7]
- DeepEncoder combines local perception capabilities with global understanding, achieving a 16x downsampling mechanism that retains 97% of key information [7]

Performance Metrics
- The model achieves a decoding accuracy of 97% when the text-token count is within 10 times the visual-token count, and maintains approximately 60% accuracy at a 20x compression rate [3]
- In benchmark tests, DeepSeek-OCR outperformed GOT-OCR2.0 and MinerU2.0 using significantly fewer visual tokens [4]

Practical Applications
- DeepSeek-OCR can generate over 200,000 pages of LLM/VLM training data daily on a single A100-40G GPU, indicating high operational efficiency [4][7]
- Potential applications span finance (digitizing financial reports), healthcare (archiving medical records), and publishing (digitizing ancient texts) [17]
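The 16x downsampling above implies a large drop in vision tokens per page. A back-of-the-envelope sketch; the 1024-pixel input size and 16-pixel patch size are illustrative assumptions, not figures from the article:

```python
def vision_token_count(img_px: int, patch_px: int = 16, downsample: int = 16) -> int:
    """Patch-grid token count, then a 16x reduction in tokens
    (standing in for DeepEncoder's compression stage; illustrative)."""
    patches = (img_px // patch_px) ** 2
    return patches // downsample

# A 1024x1024 page -> 64x64 = 4096 patches -> 256 vision tokens after 16x reduction
print(vision_token_count(1024))  # 256
```

Under these assumed numbers, a few hundred vision tokens per page is consistent with the token counts the benchmarks above report.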
DeepSeek and the Rest of the "Six Little Dragons" Gather as the World Internet Conference Convenes in Early November
Xuan Gu Bao· 2025-10-20 08:22
Group 1
- The 2025 World Internet Conference will be held from November 6 to 9 in Wuzhen, Zhejiang, China, focusing on themes of open cooperation and the digital future [1]
- The conference will feature 24 sub-forums covering topics such as global development initiatives, the digital economy, data governance, and AI empowerment [1]
- A special dialogue session called the "Six Little Dragons Wuzhen Dialogue" will gather leaders of innovative tech companies to discuss AI technology trends [1]

Group 2
- Confirmed major-company attendees include Yang Jie of China Mobile, Wu Yongming of Alibaba, and Zhang Chaoyang of Sohu [4]
- Key partners of the conference include iFlytek, Anheng Information, ZTE, and 360 [4]

Group 3
- Deep Technology has provided cybersecurity support for major events including the G20, APEC, and the World Internet Conference [6]
- Green Alliance Technology has handled cybersecurity for significant events such as the G20 and the World Internet Conference [7]
- China Youth Travel Service has diversified its tourism operations and has been providing services for the World Internet Conference [7]
- Qi Anxin's AISOC won the "New Light" product award at the 2024 Wuzhen World Internet Conference [7]