X @Bloomberg
Bloomberg· 2025-10-21 11:11
RT Bloomberg Live (@BloombergLive): "I think good technology development requires constraints... DeepSeek was a wakeup moment." @cohere's @aidangomez #BloombergTech @Lynnmdoan ⏯️ https://t.co/nnRX4STmti https://t.co/IocGU5kwgF ...
DeepSeek's New Model Has Silicon Valley Raving!
华尔街见闻· 2025-10-21 10:13
Core Viewpoint
- DeepSeek has introduced a groundbreaking model called DeepSeek-OCR, which uses a novel approach of "contextual optical compression" to process long texts efficiently by compressing textual information into visual tokens, significantly reducing computational costs while maintaining high accuracy in document parsing [5][13][14].

Summary by Sections

Model Overview
- DeepSeek-OCR is designed to tackle the computational challenges of processing long texts, achieving 97% accuracy when the compression ratio is below 10x and maintaining around 60% accuracy even at 20x compression [6][15].
- The model has gained significant attention, quickly accumulating 3.3K stars on GitHub and ranking second on HuggingFace's trending list [7].

Technical Innovations
- The model comprises two core components: the DeepEncoder, which converts images into highly compressed visual tokens, and the DeepSeek3B-MoE-A570M decoder, which reconstructs text from these tokens [19][20].
- The DeepEncoder employs a serial design that processes high-resolution images in three stages: local feature extraction, token compression, and global understanding, allowing it to produce a minimal number of high-density visual tokens [21][22].

Performance Metrics
- DeepSeek-OCR outperforms existing models, using only 100 visual tokens to exceed the performance of GOT-OCR2.0, which uses 256 tokens per page [18][19].
- The model supports multiple input modes, adapting its compression strength to the task, from "Tiny" (64 tokens) to "Gundam" (up to 800 tokens) [23][25].

Future Implications
- The research suggests that this unified approach to visual and textual processing may be a pathway toward Artificial General Intelligence (AGI) [11].
- The team has also proposed a concept of simulating human memory's forgetting mechanism through optical compression, potentially enabling models to allocate computational resources dynamically based on the context's temporal relevance [34][37][38].
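The compression ratios quoted above can be sanity-checked with simple arithmetic. A minimal sketch, assuming a dense document page corresponds to roughly 1000 text tokens (an illustrative figure, not a number taken from the DeepSeek-OCR paper):

```python
# Back-of-envelope compression ratios for the input modes named above.
# TEXT_TOKENS_PER_PAGE is an assumed dense-page budget for illustration.

TEXT_TOKENS_PER_PAGE = 1000  # assumed, not from the paper

# Visual-token budgets of the two modes quoted in the summary.
modes = {"Tiny": 64, "Gundam": 800}

for name, visual_tokens in sorted(modes.items(), key=lambda kv: kv[1]):
    ratio = TEXT_TOKENS_PER_PAGE / visual_tokens
    print(f"{name}: {visual_tokens} visual tokens -> ~{ratio:.1f}x compression")
```

Under this assumption, "Tiny" sits well above the 10x threshold at which the summaries report ~97% accuracy, while "Gundam" trades compression for fidelity on token-dense pages.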
Tech Core Assets Monthly Report: Pullback and Consolidation Do Not Change the Tech Trend Opportunity - 2025-10-21
Group 1: AI Industry Chain
- The AI industry chain has experienced a short-term adjustment, but the medium-term outlook remains positive, driven by significant model updates from major players like OpenAI and DeepSeek, which are expected to catalyze new applications and edge opportunities [9][10][15].
- OpenAI's recent DevDay introduced tools such as Apps SDK and AgentKit, which enhance the integration of third-party services and lower the technical barriers for developing AI agents, indicating a shift toward a more comprehensive application platform [12][11].
- Demand for high-bandwidth memory (HBM) is rising with the rapid development and deployment of AI, driving a notable increase in storage-chip prices: DRAM and NAND prices are up 227.6% and 42.7% respectively since the beginning of 2025 [19][13].

Group 2: High-end Manufacturing
- The high-end manufacturing sector is poised for a new wave of opportunities, particularly in robotics, with significant catalysts expected from Tesla's upcoming Q3 earnings call and shareholder meeting, which may provide insight into the progress of its humanoid robot, Optimus [33][34].
- The robotics industry is seeing increased investment and collaboration, such as the $1 billion strategic partnership between UBTECH and Infini Capital aimed at expanding the humanoid robot ecosystem [31][32].
- The military industry's uptrend has paused, but upcoming disclosures related to the "14th Five-Year Plan" and quarterly reports are expected to provide better investment opportunities [22][27].
DeepSeek OCR: The Real Goal Lies Elsewhere
Founder Park· 2025-10-21 07:46
Core Viewpoint
- DeepSeek-OCR is a new AI model that processes text in images by treating text as visual data, achieving 10x compression while maintaining a recognition accuracy of 96.5% [7][11].

Group 1: Model Performance and Innovation
- DeepSeek-OCR can compress a 1000-word article into just 100 visual tokens, showcasing its efficiency [7].
- The model offers multiple resolution options, requiring as few as 64 tokens for a 512 x 512 image and 256 tokens for a 1024 x 1024 image [13].
- Using visual tokens for text recognition is not entirely novel, but the model represents a significant step in productization and application [13][14].

Group 2: Industry Reactions and Future Directions
- Notable figures in the AI community, such as Karpathy, have expressed interest in the model, suggesting that future large language models (LLMs) might benefit from image-based inputs instead of traditional text [11][15].
- DeepSeek-OCR's potential to improve processing of mixed media (text, images, tables) is highlighted, as current visual models struggle with such tasks [15].
- The idea of simulating a forgetting mechanism through resolution adjustments is intriguing but raises questions about its applicability in digital systems compared to human cognition [15].
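The resolution-to-token figures above (64 tokens at 512 x 512, 256 at 1024 x 1024) are consistent with a simple patch-grid view. A minimal sketch, assuming one visual token per effective 64 x 64-pixel patch; the real DeepEncoder compresses tokens in stages rather than tiling the image directly, so this is only a mental model:

```python
# Patch-grid view reproducing the quoted token counts.
# EFFECTIVE_PATCH is an assumption, not a documented DeepEncoder parameter.

EFFECTIVE_PATCH = 64  # pixels per side (assumed)

def visual_tokens(width: int, height: int, patch: int = EFFECTIVE_PATCH) -> int:
    """Visual-token count under the one-token-per-patch view."""
    return (width // patch) * (height // patch)

print(visual_tokens(512, 512))    # matches the quoted 64 tokens
print(visual_tokens(1024, 1024))  # matches the quoted 256 tokens
```

The quadratic scaling with image side length explains why higher-resolution modes cost disproportionately more tokens.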
Text Is Dead, Long Live Vision: Karpathy Raves About DeepSeek's New Model, Ending the Tokenizer Era
36Kr· 2025-10-21 07:22
Core Insights
- DeepSeek has made a significant breakthrough with its new model, DeepSeek-OCR, which fundamentally changes the input paradigm from text to visual data, suggesting that visual inputs may become mainstream in AI applications [1][14][17].

Performance Metrics
- DeepSeek-OCR achieves approximately 2500 tokens per second on a single A100-40G card while maintaining 97% OCR accuracy. It can compress visual context to 1/20 of its original size, with typical usage at compression ratios under 1/10 [3][5].
- The model can compress an entire page of dense text into just 100 visual tokens, achieving up to 60x compression on the OmniDocBench benchmark [5][11].

Technical Advantages
- DeepSeek-OCR combines few parameters, high compression rates, fast processing, and support for about 100 languages, making it both theoretically valuable and highly practical [7][11].
- The model suggests that physical pages (such as microfilm and books) can be richer data sources for training AI models than low-quality internet text [11].

Industry Implications
- The shift from text to visual inputs could redefine how large language models process information, potentially eliminating the need for traditional tokenizers, which have long been criticized for their inefficiencies [16][19].
- Karpathy, a prominent figure in AI, argues that the future may see all inputs to AI models being images, improving efficiency and information flow [15][25].

Community Response
- The open-source project has gained significant traction, receiving 4.4K stars on GitHub overnight, indicating strong community interest and support [10][46].
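The two throughput figures circulating in these summaries (~2500 tokens/s on one A100, and ~200,000 pages/day elsewhere in this digest) can be cross-checked. A minimal sketch, assuming a dense page decodes to roughly 1000 text tokens (an illustrative figure only):

```python
# Cross-check: does ~2500 tokens/s imply ~200,000 pages/day?
# TEXT_TOKENS_PER_PAGE is an assumed figure for illustration.

TOKENS_PER_SECOND = 2500
TEXT_TOKENS_PER_PAGE = 1000  # assumed
SECONDS_PER_DAY = 86_400

pages_per_day = TOKENS_PER_SECOND * SECONDS_PER_DAY / TEXT_TOKENS_PER_PAGE
print(f"~{pages_per_day:,.0f} pages/day")  # in the ballpark of the 200k claim
```

The result (~216,000 pages/day) is close enough to the 200k figure that the two claims appear to describe the same measurement under slightly different rounding.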
Karpathy Praises DeepSeek-OCR for "Retiring" the Tokenizer! Hands-On: Using Claude Code to Run the New Model on an NVIDIA GPU
AI前线· 2025-10-21 04:54
Core Insights
- DeepSeek has released a new model, DeepSeek-OCR, a 6.6GB model fine-tuned specifically for OCR, achieving near-lossless compression at 10x and retaining 60% accuracy at 20x compression [2].
- The model introduces DeepEncoder to balance the trade-offs between high resolution, low memory, and few tokens, achieving state-of-the-art performance in practical scenarios with minimal token consumption [2][4].
- The model's architecture is lightweight, consisting of only 12 layers, which suits the pattern-recognition nature of OCR tasks [5].

Model Innovations
- DeepSeek-OCR allows original content to be rendered as images before input, leading to more efficient information compression and richer information flow [6].
- The approach eliminates the need for tokenizers, which have been criticized for their inefficiencies and historical baggage, enabling a more seamless end-to-end process [6].
- It employs a Mixture of Experts design, activating only about 570 million parameters during inference, allowing efficient processing of large datasets [7].

Market Position and Future Implications
- Alexander Doria, co-founder of Pleiasfr, views DeepSeek-OCR as a milestone achievement, suggesting it lays a foundation for future OCR systems [4][8].
- The training pipeline includes a significant amount of synthetic and simulated data; while the model balances inference efficiency and performance, further domain-specific customization will be necessary for large-scale real-world applications [8].

Developer Engagement
- The release has attracted many developers, with Simon Willison successfully running the model on NVIDIA Spark in about 40 minutes, showcasing its accessibility and ease of use [9][21].
- Willison emphasized the importance of providing a clear environment and task definition for a successful setup, highlighting the model's practical utility [24].
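The sparse-activation figures above can be made concrete: the decoder's name, DeepSeek3B-MoE-A570M, encodes ~3B total parameters with ~570M activated per token. A quick look at the active fraction, using the rounded figures from the model name:

```python
# Active-parameter fraction of the MoE decoder, from the rounded
# figures in the model name DeepSeek3B-MoE-A570M.

TOTAL_PARAMS = 3_000_000_000   # ~3B total
ACTIVE_PARAMS = 570_000_000    # ~570M active per token

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction: {active_fraction:.0%}")  # 19%
```

Only about a fifth of the decoder's weights participate in any single forward pass, which is what keeps inference cheap despite the 3B-parameter total.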
DeepSeek's New Model Is Wild: The Whole AI Community Is Studying the Visual Route, and Karpathy Isn't Holding Back Anymore
36Kr· 2025-10-21 04:12
Core Insights
- The introduction of DeepSeek-OCR has the potential to revolutionize the paradigm of large language models (LLMs) by suggesting that all inputs should be treated as images rather than text, which could lead to significant improvements in efficiency and context handling [1][3][8].

Group 1: Model Performance and Efficiency
- DeepSeek-OCR can compress a 1000-word article into 100 visual tokens, a compression efficiency roughly ten times better than traditional text tokenization, while maintaining a 97% accuracy rate [1][8].
- A single NVIDIA A100 GPU can process 200,000 pages of data daily using this model, indicating high throughput [1].
- Using visual tokens instead of text tokens could allow a more efficient representation of information, potentially expanding the effective context size of LLMs significantly [9][10].

Group 2: Community Reception and Validation
- The open-source release of DeepSeek-OCR garnered over 4000 stars on GitHub within a single night, reflecting strong interest and validation from the AI community [1].
- Notable figures in the AI field, such as Andrej Karpathy, have praised the model, indicating its potential impact and effectiveness [1][3].

Group 3: Theoretical Implications
- The model's ability to represent text as visual tokens raises questions about how this might affect the cognitive capabilities of LLMs, particularly in reasoning and language expression [9][10].
- The concept aligns with human cognition, where visual memory plays a significant role in recall, suggesting a more natural way for models to process and retrieve data [9].

Group 4: Historical Context and Comparisons
- While DeepSeek-OCR presents a novel approach, similar ideas were explored in the 2022 paper "Language Modelling with Pixels," which proposed a pixel-based language encoder [14][16].
- Ongoing work in this area includes various research papers building on the foundational ideas of visual tokenization and its applications in multi-modal learning [16].

Group 5: Criticism and Challenges
- Some researchers have criticized DeepSeek-OCR for lacking progressive development compared to human cognitive processes, suggesting the model may not fully replicate human-like understanding [19].
Overtaken by Doubao in MAU, DeepSeek Delivers Only an Incremental Update
36Kr· 2025-10-21 04:12
Core Insights
- DeepSeek has launched DeepSeek-OCR, an open-source model with approximately 3 billion parameters that improves scanning efficiency through a "visual-text compression" approach [1][2][3].
- DeepSeek has recently been surpassed by its competitor Doubao in monthly active users (MAU): Doubao reached approximately 157 million MAU, a 6.6% increase, versus DeepSeek's 143 million [1][9].
- The competition between DeepSeek and Doubao highlights a shift in the consumer AI market, with Doubao leveraging its multi-modal capabilities and integration with the Douyin ecosystem [1][2][9].

DeepSeek-OCR Model
- DeepSeek-OCR's "visual-text compression" method achieves superior performance with fewer visual tokens than traditional OCR systems [3][4].
- The model decodes with 97% accuracy at 10x compression and maintains 60% accuracy at 20x compression, significantly reducing computational costs [7][18].
- DeepSeek-OCR includes a "deep parsing mode" that converts financial charts into structured data, enabling the generation of editable analysis formats [6][18].

Competitive Landscape
- Doubao's success is attributed to its broad audience targeting and integration with ByteDance's social platforms, making it more accessible to general users than DeepSeek's more technical positioning [9][10][12].
- Doubao's branding and user experience are designed to appeal to a wide audience, in contrast with DeepSeek's more niche positioning [10][12].
- Despite being overtaken, DeepSeek maintains a significant user base and continues to focus on technical advancements, with its V3 series carrying a total parameter count of 671 billion [17][19].

Future Considerations
- DeepSeek's ability to leverage its large consumer user base and differentiate its ecosystem will be crucial for competing with Doubao [19].
- The release of DeepSeek-OCR may serve as a catalyst for model training and improve data-processing efficiency for future model iterations [18][19].
- Ongoing development of the R2 model has faced delays, affecting DeepSeek's competitive edge in a rapidly evolving AI landscape [8][15][19].
DeepSeek's New Model Is Wild: The Whole AI Community Is Studying the Visual Route, and Karpathy Isn't Holding Back Anymore
机器之心· 2025-10-21 03:43
机器之心 report. Editors: 泽南, Panda

"I really like the new DeepSeek-OCR paper... Perhaps, more sensibly, all inputs to LLMs should be images. Even when you happen to have pure-text input, you would be better off rendering it first and then feeding it in."

Overnight, the large-model paradigm seemed to be upended by DeepSeek's newly released model. Yesterday afternoon, the brand-new DeepSeek-OCR model was suddenly open-sourced. In its processing pipeline, a 1000-character article can be compressed into 100 visual tokens, with accuracy reaching 97% even at 10x compression, and a single NVIDIA A100 can process 200,000 pages of data per day.

This approach may solve the long-context efficiency problem that currently troubles the large-model field. More importantly, if "seeing" text rather than "reading" text is ultimately confirmed as the right direction, it would mean a major shift in the large-model paradigm.

On GitHub, the DeepSeek-OCR project picked up more than 4000 stars in a single night.

Because it is an open-source small model, DeepSeek-OCR was immediately put to the test by the entire AI community, and many leading figures shared their views after reading the paper, their excitement plain to see. Andrej Karpathy, a founding member of OpenAI and former director of Tesla Autopilot, said it is a good OCR ...
X @外汇交易员
外汇交易员· 2025-10-21 01:45
DeepSeek's newly open-sourced model DeepSeek-OCR has drawn attention from overseas developers. The model proposes the idea of "contextual optical compression": using a small number of visual tokens to represent content that would otherwise require a large number of text tokens, thereby reducing the computational overhead of large models. The encoder DeepEncoder converts images into highly compressed visual tokens, and the decoder DeepSeek3B-MoE-A570M reconstructs the text from those visual tokens, achieving more with less. Some developers have even exclaimed that the release of DeepSeek-OCR is "AI's JPEG moment." ...