Text is dead, long live vision: Karpathy raves about DeepSeek's new model, which ends the tokenizer era
36Kr· 2025-10-21 07:22
Core Insights
- DeepSeek has made a significant breakthrough with its new model, DeepSeek-OCR, which shifts the input paradigm from text to visual data and suggests that visual inputs may become mainstream in AI applications [1][14][17]

Performance Metrics
- DeepSeek-OCR processes approximately 2,500 tokens per second on a single A100-40G card. It can compress context to as little as 1/20 of its original size, while typical usage keeps the compression ratio below 10x, the range in which it maintains 97% OCR accuracy [3][5]
- The model can compress an entire page of dense text into just 100 visual tokens, achieving up to 60x compression on the OmniDocBench benchmark [5][11]

Technical Advantages
- DeepSeek-OCR offers fewer parameters, high compression rates, fast processing, and support for 100 languages, making it both theoretically valuable and highly practical [7][11]
- The model is presented as evidence that physical pages (such as microfilm and books) are better training data for AI models than low-quality internet text [11]

Industry Implications
- The shift from text to visual inputs could redefine how large language models process information, potentially eliminating traditional tokenizers, which have been criticized as inefficient [16][19]
- Karpathy, a prominent figure in AI, suggests that in the future all inputs to AI models may be images, improving efficiency and information flow [15][25]

Community Response
- The open-source project gained 4.4k GitHub stars overnight, indicating strong community interest and support [10][46]
DeepSeek's new model is wild: the whole AI community is studying the visual route, and Karpathy stops holding back
36Kr· 2025-10-21 04:12
Core Insights
- The introduction of DeepSeek-OCR could revolutionize the paradigm of large language models (LLMs) by suggesting that all inputs be treated as images rather than text, which could significantly improve efficiency and context handling [1][3][8]

Group 1: Model Performance and Efficiency
- DeepSeek-OCR can compress a 1,000-word article into 100 visual tokens, ten times more efficient than traditional text tokenization, while maintaining 97% accuracy [1][8]
- A single NVIDIA A100 GPU can process 200,000 pages of data per day with this model, indicating high throughput [1]
- Using visual tokens instead of text tokens could represent information more efficiently, potentially expanding the effective context size of LLMs significantly [9][10]

Group 2: Community Reception and Validation
- The open-source release of DeepSeek-OCR garnered over 4,000 GitHub stars within a single night, reflecting strong interest and validation from the AI community [1]
- Notable figures in the AI field, such as Andrej Karpathy, have praised the model, indicating its potential impact and effectiveness [1][3]

Group 3: Theoretical Implications
- Representing text as visual tokens raises questions about how this might affect the cognitive capabilities of LLMs, particularly reasoning and language expression [9][10]
- The concept aligns with human cognition, in which visual memory plays a significant role in recalling information, suggesting a more natural way for models to process and retrieve data [9]

Group 4: Historical Context and Comparisons
- While DeepSeek-OCR presents a novel approach, similar ideas were explored earlier in the 2022 paper "Language Modelling with Pixels," which proposed a pixel-based language encoder [14][16]
- Ongoing work in this area includes various research papers that build on the foundational ideas of visual tokenization and its applications in multimodal learning [16]

Group 5: Criticism and Challenges
- Some researchers have criticized DeepSeek-OCR for lacking progressive development comparable to human cognitive processes, suggesting the model may not fully replicate human-like understanding [19]
DeepSeek open-sources a new model; Apple iPhone 17 sales are hot
Group 1: Technology Developments
- The DeepSeek-AI team released DeepSeek-OCR, an open-source model that compresses long text contexts using the visual modality and can generate over 200,000 pages of training data per day on a single A100-40G GPU [2]
- IBM partnered with AI company Groq to enhance enterprise AI deployment, giving customers access to Groq's inference technology on watsonx Orchestrate [7]
- Baidu plans to officially unveil its Xiaodu AI glasses at the Baidu World 2025 conference in November, with a release planned for the end of the year [16]

Group 2: Market Performance
- Apple's iPhone 17 series saw strong early sales in China and the US, up 14% compared with the iPhone 16 series, pushing Apple's stock to a record high and its market cap to $3.89 trillion [3]
- JD.com reported that over 52,000 brands saw sales grow more than 300% during the preliminary "Double 11" shopping event, with significant growth in consumer electronics and AI-related products [4]
- Douyin e-commerce said that over 41,000 merchants increased live-stream sales by 500% during the first phase of "Double 11," and the number of merchants with over 100 million yuan in sales rose 900% [5]

Group 3: Financial Results
- CATL reported a Q3 2025 net profit of 18.55 billion yuan, up 41.21% year on year [11]
- China Mobile reported a net profit of 115.4 billion yuan for the first three quarters of 2025, up 4% year on year, on total operating revenue of 794.7 billion yuan [12]
- iFlytek reported a Q3 2025 net profit of 172 million yuan, up 202.40% year on year, on total revenue of 6.078 billion yuan [13]

Group 4: Industry Trends
- China's production of industrial and service robots grew significantly, and value added in the digital product manufacturing industry rose 9.7% in the first three quarters of the year [8]
- Samsung is accelerating development of HBM4, which it will unveil at Samsung Tech Day from October 27 to 31, while Micron's Chief Business Officer predicts the DRAM market will stay tight through 2026 [9][10]
- Ant Group's Future (Hainan) information technology company increased its registered capital to 3.5 billion yuan, a 34,900% increase [14]
Breaking into a new field: DeepSeek releases text-recognition model DeepSeek-OCR
Xin Jing Bao· 2025-10-21 03:11
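DeepSeek also uploaded a paper on the model, which describes DeepSeek-OCR as "a preliminary study on the feasibility of compressing long contexts via optical 2D mapping." Experiments show that when the number of text tokens is within 10 times the number of vision tokens (i.e., a compression ratio below 10x), the model achieves 97% decoding (OCR) precision. Even at a 20x compression ratio, OCR accuracy stays at around 60%. This shows considerable potential for research areas such as long-context compression and memory-forgetting mechanisms in large language models. (Source: The Beijing News)

The Beijing News Beike Finance (reporter Luo Yidan): On October 20, Beijing time, DeepSeek released a new model, DeepSeek-OCR, on the open-source community Hugging Face. OCR (Optical Character Recognition) models are a technology for extracting text from images. ...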
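As a rough illustration, the compression ratio discussed here is the number of text tokens the model must decode divided by the number of vision tokens it reads. The Python sketch below computes that ratio and checks it against the reported sub-10x threshold.

The function names are hypothetical. The only accuracy figures the sketch refers to are the ones the article states: about 97% below 10x and about 60% at 20x.

```python
def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """Number of text tokens the decoder must recover per vision token."""
    if vision_tokens <= 0:
        raise ValueError("vision_tokens must be positive")
    return text_tokens / vision_tokens


def in_high_precision_regime(text_tokens: int, vision_tokens: int) -> bool:
    """True when the ratio is below 10x, where ~97% OCR precision is reported."""
    return compression_ratio(text_tokens, vision_tokens) < 10


# A page whose ground-truth text is 800 tokens, encoded as 100 vision tokens.
print(compression_ratio(800, 100))         # 8.0
print(in_high_precision_regime(800, 100))  # True
# The same 100 vision tokens standing in for 2,000 text tokens (the ~60% regime).
print(compression_ratio(2000, 100))        # 20.0
```

The sketch does not interpolate accuracy between the two reported points, because the article gives no figures in between.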
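AI Morning Briefing | A US lab tests AI crypto trading, DeepSeek temporarily tops the leaderboard; the Netherlands seeks to resolve the Nexperia deadlock with China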
Guan Cha Zhe Wang· 2025-10-21 02:14
Group 1: AI Trading Competition
- nof1.ai's "Alpha Arena" platform is running a trading competition in which six AI models each trade with $10,000 of real, not simulated, money [1]
- The participating models are GPT-5, Claude Sonnet 4.5, DeepSeek Chat V3.1, Gemini 2.5 Pro, Grok 4, and Qwen3 Max, with DeepSeek currently in the lead [1][3]
- DeepSeek's total portfolio value is close to $14,000, a return of approximately 40%, making it the best-performing model so far [3]

Group 2: Semiconductor Industry
- The Netherlands is seeking to resolve the deadlock over Nexperia, which has implications for China-Netherlands trade relations and the global automotive chip supply chain [4]
- Dutch Economic Affairs Minister Vincent Karremans indicated that the Netherlands aims to prevent the transfer of Nexperia's business and intellectual property, a matter crucial for Chinese automakers [5]
- The deadlock originated with the U.S. "penetration rules" announced on September 29, which led to direct Dutch government intervention in Nexperia's internal affairs, affecting its global operations [5]

Group 3: AI Developments
- Alibaba's Quark is quietly advancing a project known as "C Plan," focused on conversational AI applications, with a new product expected soon [6]
- "C Plan" has been in development for a long time and is expected to require long-term investment and technological breakthroughs [6]

Group 4: Financial Performance
- iFlytek reported a Q3 net profit of 172 million yuan, up 202.4% year on year, on total revenue of 6.078 billion yuan, up 10.02% year on year [9]

Group 5: Technology Innovations
- Science Corporation, founded by Neuralink's former president, has developed a retinal microchip implant that restores vision for blind patients, allowing them to read and do activities such as crossword puzzles [10]
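AI Morning Briefing | DeepSeek is, for now, the king of AI crypto trading; the Netherlands seeks to resolve the Nexperia deadlock with China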
Guan Cha Zhe Wang· 2025-10-21 02:02
Group 1: AI Trading Competition
- AI research lab nof1.ai launched a trading competition in which six top AI models each trade $10,000 in real cryptocurrency, with identical prompts and market conditions to keep the environment fair [1]
- The six participating models are GPT-5, Claude Sonnet 4.5, DeepSeek Chat V3.1, Gemini 2.5 Pro, Grok 4, and Qwen3 Max [1]
- As of the latest update, DeepSeek leads with a total portfolio value close to $14,000, a return of approximately 40%, after peaking at nearly $15,000 [4]

Group 2: Semiconductor Industry
- The Netherlands is seeking to resolve the deadlock over Nexperia, which has implications for China-Netherlands economic relations and the global automotive chip supply chain [5]
- The Dutch Minister of Economic Affairs indicated that the intervention in Nexperia's affairs aimed to prevent the transfer of its business and intellectual property out of Europe, while emphasizing the need for cooperation with China [5]
- The deadlock originated with the U.S. "penetration rules" announced on September 29, which led to direct Dutch government intervention in Nexperia's internal matters [5]

Group 3: AI Developments by Alibaba
- Alibaba's Quark is quietly advancing a project known as "C Plan," focused on conversational AI applications, with a new product expected soon [6]
- The project has been in development for a long time and is expected to require long-term investment and technological breakthroughs [6]
- Speculation about the name "C Plan" suggests it may refer to "Chat" or signal a competitive move against ByteDance's Doubao [6]

Group 4: Financial Performance of Companies
- iFlytek reported a Q3 net profit of 172 million yuan, up 202.4% year on year, on total revenue of 6.078 billion yuan, up 10.02% [9]
- Apple reached a record market capitalization of $3.89 trillion, making it the second most valuable US company after Nvidia [8]

Group 5: Innovations in Medical Technology
- Science Corporation, founded by Neuralink's former president, has developed a retinal microchip implant that restores vision for blind patients, allowing them to read and do activities such as crossword puzzles [10]
- The implant bypasses damaged photoreceptors in the retina, using signals from a camera mounted on glasses to stimulate the retina [10]
At making money, DeepSeek comes first as expected: six top global AIs battle in live trading, each starting with $10,000
36Kr· 2025-10-21 01:35
Core Insights
- The Alpha Arena experiment, initiated by nof1.ai, pits six leading AI models against each other in a real trading environment, with DeepSeek V3.1 currently the most profitable [1][2][46]
- Each model started with $10,000 in capital and received identical market data and trading instructions [4][46]

Performance Summary
- **Top Performers**:
  - DeepSeek V3.1 earned a profit of $3,677, a return of +36.77% [6]
  - Grok 4 followed with a profit of $3,168, or +31.68% [6]
- **Underperformers**:
  - Gemini 2.5 Pro recorded the largest loss, $3,213, a return of -32.13% [6]
  - GPT-5 also performed poorly, losing $2,509, or -25.09% [6]

Trading Dynamics
- The models' trading strategies varied: DeepSeek and Grok followed similar trajectories, initially incurring losses before recovering to significant gains [32][40]
- Gemini 2.5 Pro and GPT-5 showed the opposite pattern, gaining initially before declining substantially [36][37]

Market Environment
- The experiment highlights how quickly financial markets change and the need for AI models to adapt quickly to market fluctuations [7][45]
- Alpha Arena serves as a new benchmark for AI performance, moving beyond traditional static assessments to a dynamic trading environment [44][48]

Model Strategies
- Each model made real-time decisions based on market indicators and account status, with varying degrees of success [9][48]
- The models are assessed not only on profitability but also on their ability to navigate uncertainty and market volatility [48]
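The return figures in this report follow directly from the $10,000 starting capital: return equals profit or loss divided by initial capital. Below is a minimal Python check that uses only the numbers reported in the article. The helper name `return_pct` is hypothetical.

```python
INITIAL_CAPITAL = 10_000.0  # starting balance for every model, per the article


def return_pct(pnl: float, initial: float = INITIAL_CAPITAL) -> float:
    """Percentage return for a given profit (positive) or loss (negative)."""
    return round(100.0 * pnl / initial, 2)


# Profit and loss in dollars as reported in the Performance Summary.
reported_pnl = {"DeepSeek V3.1": 3677, "Grok 4": 3168, "GPT-5": -2509, "Gemini 2.5 Pro": -3213}

# Print models from most to least profitable with implied account value and return.
for model, pnl in sorted(reported_pnl.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: ${INITIAL_CAPITAL + pnl:,.0f} ({return_pct(pnl):+.2f}%)")
```

For each model, it prints the implied account value (for example, $13,677 for DeepSeek V3.1) alongside the reported return. The same arithmetic links the other reports' "close to $14,000" portfolio value to their roughly 40% return.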
Big news: DeepSeek open-sources again. Vision is compression, and 100 tokens beat 7,000
36Kr· 2025-10-21 01:35
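A picture is worth a thousand words! DeepSeek-OCR boldly explores the limits of vision-text compression. By decoding more than 10 times as much text from a small number of vision tokens, this end-to-end VLM architecture not only crushes GOT-OCR2.0 on the OmniDocBench benchmark but also offers an efficient solution to LLMs' long-context problem.

DeepSeek has released another new model! On GitHub, DeepSeek created a new DeepSeek-OCR repository to explore the limits of vision-text compression.

As the saying goes, a picture is worth ten thousand words. The same holds for LLMs!

In theory, DeepSeek-OCR offers preliminary validation of "contexts optical compression": from a small number of vision tokens, the model can effectively decode more than 10 times as many text tokens. In other words, a single image containing document text can represent rich information with far fewer tokens than the equivalent text. This indicates that optical compression through vision tokens can achieve higher compression ratios.

As an intermediate modality connecting vision and language, OCR is an ideal testbed for the vision-text compression paradigm. It establishes a natural compression-decompression mapping between visual and textual representations while providing quantifiable evaluation metrics.

On OCR tasks, DeepSeek-OCR has high practical value: on the OmniDocBench benchmark, using only 100 vision tokens, it surpasses GOT-OCR2 ...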
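Silicon Valley can't stop praising DeepSeek's new model! It compresses 1D text with 2D vision, runs on a single GPU, and "Google's core secret has been open-sourced"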
Hua Er Jie Jian Wen· 2025-10-21 00:27
Core Insights
- DeepSeek has released an open-source model, DeepSeek-OCR, which is drawing significant attention in Silicon Valley for its approach of processing long texts through visual compression [1][4][21]
- The model targets the computational cost of having large models handle lengthy text, achieving high accuracy even with reduced token usage [1][4][5]

Model Performance
- DeepSeek-OCR has 3 billion parameters and decodes text with high accuracy: 97% at compression ratios below 10x, and about 60% even at 20x [1][4][5]
- Benchmarked against existing models, it performs better with significantly fewer visual tokens, for example using only 100 visual tokens to outperform models that require 256 [7][8]

Data Generation Efficiency
- The model can generate over 200,000 pages of high-quality training data per day on a single A100-40G GPU [2][4]

Innovative Approach
- DeepSeek introduces a concept called "Contextual Optical Compression," which compresses textual information into visual form so that the model interprets content through images rather than text [4][10]
- The architecture has two main components: the DeepEncoder, which converts images into compressed visual tokens, and the DeepSeek3B-MoE-A570M decoder, which reconstructs text from those tokens [10][11]

Flexibility and Adaptability
- The DeepEncoder handles various input resolutions and token counts, so it can adapt to different compression needs and application scenarios [11][12]
- The model supports complex image analysis, including financial reports and scientific diagrams, broadening its applicability across fields [12][14]

Future Implications
- The research suggests that this unified approach to visual and textual processing could be a step toward Artificial General Intelligence (AGI) [4][21]
- The DeepSeek-OCR team is exploring whether optical compression can simulate human memory mechanisms, which could lead to more efficient handling of long-term context in AI [20][21]
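To see how a page image can shrink to around 100 visual tokens, a back-of-the-envelope token count helps. The Python sketch below is based on our reading of the DeepSeek-OCR paper, not on this article, and rests on these assumptions:

- The DeepEncoder splits a square image into 16×16-pixel patches.
- It then applies a 16× token compressor.

The mode names and resolutions are also assumptions; check all of these values against the paper.

```python
# Assumed DeepEncoder parameters (our reading of the paper; verify before relying on them).
PATCH_SIZE = 16         # edge length of each image patch, in pixels
COMPRESSOR_FACTOR = 16  # reduction in token count applied after patch embedding

# Hypothetical mapping of mode names to square input resolutions.
MODES = {"Tiny": 512, "Small": 640, "Base": 1024, "Large": 1280}


def vision_tokens(side_px: int, patch: int = PATCH_SIZE, factor: int = COMPRESSOR_FACTOR) -> int:
    """Visual tokens produced for a square side_px x side_px image."""
    if side_px % patch:
        raise ValueError("image side must be a multiple of the patch size")
    patches = (side_px // patch) ** 2  # the patch grid is (side / patch) squared
    if patches % factor:
        raise ValueError("patch count must divide evenly by the compressor factor")
    return patches // factor


for name, side in MODES.items():
    print(f"{name}: {side}x{side} -> {vision_tokens(side)} tokens")
```

Under these assumptions:

- A 640×640 input yields exactly 100 tokens, the budget quoted above for beating a 256-token model.
- A 1024×1024 input yields 256 tokens.

The real encoder may pad or tile inputs differently.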
October 21 Breakfast | Apple hits a record high; DeepSeek publishes a new paper
Xuan Gu Bao· 2025-10-21 00:04
Group 1: Market Overview
- Trade tensions eased, lifting US stocks: the S&P 500 rose 1.07%, the Dow Jones 1.12%, and the Nasdaq 1.37% [1]
- Apple shares rose nearly 4%, hitting a record high for the first time this year, while Nvidia fell 0.3% [1]
- Critical-mineral stocks surged after the US and Australia signed a cooperation agreement, with NioCorp Developments up nearly 20% and USA Rare Earth up nearly 14% [1]

Group 2: Chinese Market Performance
- An index of US-listed Chinese stocks rose over 2%, with Alibaba up nearly 4% and Kuke Music surging 49% [2]

Group 3: Commodity and Bond Market
- Gold rebounded more than 4%, with futures nearing $4,400, and silver rose more than 3% [3]
- Crude oil fell and remained near five-month lows; US crude dropped more than 2% before recovering most of the loss [4]
- The ten-year US Treasury yield fell toward a six-month low, while the US dollar index rose for a second consecutive day [4]

Group 4: Technology and Innovation
- A large-scale Amazon AWS outage affected thousands of users, including Coinbase and Robinhood [5]
- DeepSeek introduced a new OCR model that significantly reduces computational and storage costs while maintaining high accuracy [10]

Group 5: Industry Developments
- The Ministry of Industry and Information Technology held a meeting on the cement industry, emphasizing the need to balance supply and demand and eliminate outdated capacity [11]
- Hubei Province launched a carbon trading platform whose cumulative transaction volume has exceeded 10 billion yuan, improving the efficiency of resource allocation [11]

Group 6: Corporate Earnings
- CATL reported a Q3 net profit of 18.55 billion yuan, up 41.21% year on year, and a net profit of 49.03 billion yuan for the first three quarters, up 36.20% [16]
- Yonghe Co. reported a Q3 net profit of 198 million yuan, up 485.77%, and 470 million yuan for the first three quarters, up 220.39% [16]
- China Shipbuilding expects a net profit of 5.55 billion to 6.15 billion yuan for the first three quarters, up 104.30% to 126.39% year on year [16]