Just In: DeepSeek Breakthrough Breaks the Context-Length Curse for Large Models
36Kr · 2025-10-20 23:22
Core Insights
- DeepSeek has opened a novel technology path in the large-language-model race by open-sourcing the DeepSeek-OCR model, which proposes "contextual optical compression": efficient information compression through text-to-image conversion [1][8]

Group 1: Model Performance and Capabilities
- The feasibility of DeepSeek-OCR has been validated: it achieves 97% decoding accuracy at a 10x compression ratio, indicating near-lossless compression, and maintains roughly 60% accuracy at a 20x compression ratio [3][21]
- DeepSeek-OCR can express similar textual content with fewer tokens by converting text tokens into visual tokens, offering a new way to address the high computational cost of processing long texts in large language models [6][11]
- In practical tests, DeepSeek-OCR surpassed GOT-OCR 2.0 using only 100 visual tokens and outperformed MinerU 2.0 with fewer than 800 visual tokens, demonstrating its efficiency [6][23]

Group 2: Technical Architecture
- DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight mixture-of-experts language decoder [12][18]
- DeepEncoder employs a dual-structure design combining local and global attention to achieve high-fidelity visual understanding while sharply reducing the number of vision tokens generated from document images [14][18]

Group 3: Data and Training
- DeepSeek-OCR's training process is relatively straightforward: DeepEncoder is trained independently, followed by the complete DeepSeek-OCR model, on a large dataset [20][21]
- The training corpus spans OCR 1.0 and OCR 2.0 data, general visual data, and pure text data, supporting robust performance across various document types [25][36]

Group 4: Application and Future Directions
- DeepSeek-OCR demonstrates deep-parsing capabilities, recognizing and extracting structured information from diverse document types, including financial reports and scientific literature [24][29]
- The research team plans to further explore integrating digital and optical text pre-training methods and to evaluate optical compression in real long-text environments, pointing to a promising direction for future research [39]
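As a rough illustration, the compression ratio discussed in these summaries is just the count of text tokens replaced per set of vision tokens; a minimal sketch using the figures reported above (the helper name is mine, not from the paper):

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each batch of vision tokens stands in for."""
    return text_tokens / vision_tokens

# Figures reported above: ~97% decoding accuracy near 10x, ~60% near 20x.
print(compression_ratio(1000, 100))  # 10.0 -> near-lossless regime
print(compression_ratio(2000, 100))  # 20.0 -> lossy regime (~60% accuracy)
```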
Jianhu Rural Commercial Bank Launches DeepSeek Intelligent Assistant
Jiang Nan Shi Bao· 2025-10-20 23:15
Core Insights
- Jianhu Rural Commercial Bank has developed and launched the "DeepSeek Intelligent Assistant," a smart data model covering comprehensive business knowledge and regulatory documents, strengthening the bank's digital transformation and intelligent-support capabilities [1]

Group 1: Intelligent Data Model Development
- The "DeepSeek Intelligent Assistant" runs on a locally deployed DeepSeek-R1 model, combined with technologies such as Ollama, Dify, and Text Embedding, ensuring data security while providing robust support across various banking functions [1]
- The model aims to drive high-quality development toward digitalization and intelligence within the bank [1]

Group 2: Credit Assistance
- For credit assistance, the assistant integrates credit policies and documentation and leverages knowledge graphs to instantly understand and accurately answer credit-related inquiries [1]
- This capability raises customer managers' efficiency by precisely locating the source of each answer [1]

Group 3: Compliance Management
- For compliance management, the assistant has built a compliance knowledge base from all institutional regulations, enabling compliance question banks to be generated automatically [1]
- The transition from manual to intelligent question generation promotes employee learning and adherence to compliance standards [1]
DeepSeek Open-Sources a New Model: A Single A100 Can Process Over 200,000 Pages of Data per Day
Di Yi Cai Jing· 2025-10-20 13:23
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, which focuses on contextual optical compression to address the challenges large language models face in processing long texts [1][4][6]

Group 1: Model Features
- The model uses the visual modality to compress text information efficiently, achieving nearly 10x near-lossless context compression while keeping OCR accuracy above 97% [4][5]
- It consists of two main components: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [5]
- The mixture-of-experts architecture gives it the expressive power of a 3-billion-parameter model while retaining the inference efficiency of a roughly 500-million-parameter model [5]

Group 2: Potential Applications
- The research points to applications in long-context compression and memory-forgetting mechanisms for large models, simulating human memory decay by gradually shrinking rendered images over time [5][6]
- This approach could represent a significant breakthrough for ultra-long contexts, trading a little fidelity for theoretically unbounded context [6]

Group 3: Industry Reception
- The release has drawn positive attention, collecting over 1,400 stars on GitHub shortly after launch [7]
- Market opinions on DeepSeek's pace of innovation are mixed, with some suggesting the company is concentrating on internal development of future models [8]
DeepSeek Releases Another Model, Taking "Small but Mighty" to a New Level
Hu Xiu· 2025-10-20 12:41
Core Insights
- Current LLMs struggle with long texts because self-attention's computational cost grows quadratically with sequence length [1]
- DeepSeek-OCR offers a novel answer: "optical compression," converting text into images to reduce the number of tokens required for processing [5][34]
- The model demonstrates 7x to 20x token compression while maintaining high accuracy, showcasing its potential for efficient long-context processing [34][36]

Technical Overview
- DeepSeek-OCR achieves a compression ratio of up to 10x with accuracy above 97% [4]
- It uses a two-component architecture: DeepEncoder for image feature extraction and compression, and DeepSeek3B-MoE for reconstructing text from compressed visual tokens [16][18]
- DeepEncoder chains SAM-base and CLIP-large with a convolutional compressor that sharply reduces token count before the global attention layer [10][11]

Performance Metrics
- OmniDocBench results indicate that a single A100-40G GPU can generate over 200,000 pages of LLM/VLM training data per day, while 20 nodes (160 A100 GPUs) can produce up to 33 million pages [7]
- DeepSeek-OCR outperforms existing models, needing only 100 visual tokens to exceed GOT-OCR2.0, which uses 256 tokens per page [15]

Data Utilization
- The DeepSeek team collected 30 million pages of multilingual PDF data covering around 100 languages, with Chinese and English dominant [21]
- The data is categorized into coarse and fine annotations, with high-quality labels generated by several models to enhance recognition capabilities [22]

Application Potential
- Beyond recognizing text, DeepSeek-OCR possesses deep-parsing capabilities, making it suitable for STEM applications that require structured extraction from complex images [27]
- It can extract structured data from financial reports, chemical structures, and geometric figures, and generate dense captions for natural images [28]

Future Directions
- The team proposes using "optical compression" to simulate human memory decay, processing long contexts efficiently by reducing the fidelity of older information [30][31]
- Future plans include systematic evaluations and pre-training methods to further validate the approach [35]
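The memory-decay idea above, re-rendering older context at lower fidelity so it costs fewer vision tokens, can be caricatured with a toy schedule. Everything here (the halving schedule, the floor, the function name) is a hypothetical illustration, not the paper's method:

```python
def vision_token_budget(base_tokens: int, age: int,
                        halve_every: int = 2, floor: int = 16) -> int:
    """Toy decay schedule: halve a page's vision-token budget every
    `halve_every` steps of age, never dropping below `floor`."""
    return max(base_tokens >> (age // halve_every), floor)

# A page that starts at 256 vision tokens fades as the conversation ages:
print([vision_token_budget(256, age) for age in range(10)])
# [256, 256, 128, 128, 64, 64, 32, 32, 16, 16]
```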
Breaking into New Territory: DeepSeek Releases Text Recognition Model DeepSeek-OCR
Bei Ke Cai Jing· 2025-10-20 12:37
Core Insights
- DeepSeek has released a new model called DeepSeek-OCR on the open-source community platform Hugging Face, designed for Optical Character Recognition (OCR) to extract text from images [1][3]

Group 1: Model Description
- The accompanying paper describes DeepSeek-OCR as a preliminary study of the feasibility of compressing long contexts through optical two-dimensional mapping [3]
- The model achieves 97% decoding (OCR) accuracy when the number of text tokens is within 10 times the number of visual tokens (compression ratio < 10) [3]
- Even at a compression ratio of 20, OCR accuracy remains around 60%, indicating significant potential for research on long-context compression and memory-forgetting mechanisms in large language models [3]
Six Top AIs Trade $10,000 for Real: DeepSeek Earns the Most, GPT-5 Bleeds Losses
Hu Xiu· 2025-10-20 11:49
Core Insights
- Jay Chou's recent troubles involve a Bitcoin account managed by his magician friend Cai Weize, who claimed the account was locked a year ago, resulting in a loss of funds [1][2]
- The article then turns to AI models competing in the cryptocurrency market, highlighting a competition called Alpha Arena in which six top AI models trade cryptocurrencies [3][4]

Group 1: AI Competition Overview
- Each of the six AI models is given $10,000 to trade perpetual contracts on the Hyperliquid platform, with trading pairs including BTC, ETH, BNB, SOL, XRP, and DOGE [4][6]
- Performance is measured by risk-adjusted returns, weighing not only profits but also the risks taken [6][7]

Group 2: Performance of AI Models
- As of the latest update, DeepSeek Chat V3.1 leads with an account value of $14,310 and a +43.1% return, built on high leverage and concentrated positions [11][12]
- Grok 4 follows at $13,921 (+39.21%), employing a high-leverage long-only strategy [12][21]
- Claude Sonnet 4.5 sits at $12,528 (+25.28%), taking a conservative trading approach [12][23]
- In contrast, GPT-5 and Gemini 2.5 Pro are underwater at -24.78% and -27.74% respectively, reflecting poor strategies and high transaction costs [12][30]

Group 3: AI's Role in Investment
- The article argues that AI's greatest value in investing may be transparency: trading records and decision-making processes are visible, unlike in human-managed accounts [40][41]
- The competition's larger ambition is to use financial markets as a training ground for AI, aiming for continuous learning and adaptation to market dynamics [34][35]
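The percentage returns quoted above follow directly from each model's account value against the common $10,000 stake; a quick check (the account values are from the article, the helper name is mine):

```python
def pct_return(start: float, end: float) -> float:
    """Simple percentage return on the initial stake."""
    return (end - start) / start * 100

START = 10_000
accounts = {
    "DeepSeek Chat V3.1": 14_310,
    "Grok 4": 13_921,
    "Claude Sonnet 4.5": 12_528,
}
for name, value in accounts.items():
    print(f"{name}: {pct_return(START, value):+.2f}%")
# DeepSeek Chat V3.1: +43.10%
# Grok 4: +39.21%
# Claude Sonnet 4.5: +25.28%
```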
DeepSeek Team Releases New Visual Compression Model DeepSeek-OCR
Zhitong Finance · 2025-10-20 11:37
Core Insights
- The DeepSeek-AI team has released a new research achievement, DeepSeek-OCR, which innovatively compresses long text context into visual tokens, significantly reducing the number of tokens needed for processing [1]
- The system consists of two main components: DeepEncoder, designed for high-resolution input with low activation cost, and the DeepSeek3B-MoE-A570M decoder [1]
- Experimental results show 97% OCR accuracy when text tokens number no more than ten times the visual tokens (compression ratio below 10x), and roughly 60% accuracy even at a 20x compression ratio [1]

Performance Metrics
- In the OmniDocBench test, DeepSeek-OCR surpassed GOT-OCR2.0 using only 100 visual tokens versus GOT-OCR2.0's 256 tokens per page [2]
- It also outperformed MinerU2.0, which averages over 6,000 tokens per page, while using fewer than 800 visual tokens [2]
- The system can generate over 200,000 pages of training data for large language models / visual language models per day on a single A100-40G GPU [2]
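The throughput numbers above scale almost linearly with GPU count; a back-of-the-envelope check (the per-GPU figure is from the article, and the 20-node x 8-GPU layout matches the 160-A100 configuration cited elsewhere in this digest):

```python
PAGES_PER_A100_DAY = 200_000   # reported single A100-40G daily throughput
NODES, GPUS_PER_NODE = 20, 8   # the 160-A100 cluster configuration

pages_per_day = PAGES_PER_A100_DAY * NODES * GPUS_PER_NODE
print(f"{pages_per_day:,} pages/day")  # 32,000,000 -- consistent with the reported ~33 million
```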
DeepSeek Open-Sources a New Model That Compresses Everything Visually
Guan Cha Zhe Wang· 2025-10-20 10:47
Core Insights
- DeepSeek has released a new OCR model named DeepSeek-OCR, a 3-billion-parameter model that aims to enhance text-recognition efficiency through optical two-dimensional mapping [1][3]

Model Architecture
- The model consists of two main components, DeepEncoder and the DeepSeek3B-MoE-A570M decoder, designed for high-resolution input and efficient compression [3][7]
- DeepEncoder combines local perception with global understanding, using a 16x downsampling mechanism that retains 97% of key information [7]

Performance Metrics
- Decoding accuracy reaches 97% when the text token count is within 10 times the visual token count, and stays around 60% at a 20x compression rate [3]
- In benchmark tests, DeepSeek-OCR outperformed GOT-OCR2.0 and MinerU2.0 while using significantly fewer visual tokens [4]

Practical Applications
- DeepSeek-OCR can generate over 200,000 pages of LLM/VLM training data per day on a single A100-40G GPU, indicating high operational efficiency [4][7]
- Potential applications span multiple sectors: finance (digitizing financial reports), healthcare (archiving medical records), and publishing (digitizing ancient texts) [17]
DeepSeek and the Rest of the "Six Little Dragons" Gather; World Internet Conference to Be Held in Early November
Xuan Gu Bao· 2025-10-20 08:22
Group 1
- The 2025 World Internet Conference will be held November 6-9 in Wuzhen, Zhejiang, China, focusing on themes of open cooperation and the digital future [1]
- The conference will feature 24 sub-forums covering topics such as global development initiatives, the digital economy, data governance, and AI empowerment [1]
- A special dialogue session, the "Six Little Dragons Wuzhen Dialogue," will bring together leaders of innovative tech companies to discuss AI technology trends [1]

Group 2
- Confirmed company representatives include Yang Jie of China Mobile, Wu Yongming of Alibaba, and Zhang Chaoyang of Sohu [4]
- Key partners of the conference include iFlytek, Anheng Information, ZTE, and 360 [4]

Group 3
- Deep Technology has provided cybersecurity support for major events including the G20, APEC, and the World Internet Conference [6]
- Green Alliance Technology has handled cybersecurity for significant events such as the G20 and the World Internet Conference [7]
- China Youth Travel Service has diversified its tourism operations and has provided services for the World Internet Conference [7]
- Qi Anxin's AISOC won the "New Light" product award at the 2024 Wuzhen World Internet Conference [7]
DeepSeek and High-Performance Carbon Fiber Composites Named Among the Top Ten Global Engineering Achievements
Zhong Guo Hua Gong Bao· 2025-10-15 06:45
Core Insights
- The 2025 World Engineering Organization General Assembly and Global Engineering Conference was held in Shanghai, themed "Engineering Shapes a Green Future" [1]
- The conference announced the "Top Ten Global Engineering Achievements of 2025," selected by "Engineering," the journal of the Chinese Academy of Engineering [1]

Group 1: Engineering Achievements
- The ten achievements are: antibody-drug conjugates, the Blackwell GPU architecture, the DeepSeek open-source large language model, the full-ocean-depth manned submersible, high-performance carbon fiber composites, humanoid robots, the Perseverance Mars rover, the Euclid space telescope, the South-to-North Water Diversion Project, and the Taklamakan Desert edge-locking project [1]
- They were selected from significant engineering technological innovations completed in the past five years with demonstrated global impact and effectiveness [1]

Group 2: Characteristics of Engineering Achievements
- The achievements reflect advanced technological levels and significant original breakthroughs in engineering science [2]
- They showcase systematic innovation through technology integration, system optimization, and resource collaboration toward overall goals [2]
- They highlight the direction of new productive forces, with potential to stimulate new industries and economic growth [2]
- They underscore engineering's critical role in overcoming global challenges, exemplified by high-performance carbon fiber composites, which are widely used in aerospace, new energy, and high-end equipment and drive innovation across the entire industry chain [2]