Zhipu open-sources OCR! After testing it, I uninstalled every scanning app on my phone...
量子位· 2026-02-11 12:49
Core Insights
- The article discusses the capabilities and performance of the GLM-OCR model, highlighting its competitive edge in the OCR landscape, particularly in complex scenarios such as handwriting and table recognition [1][39]

Performance Comparison
- GLM-OCR outperforms several competitors across OCR tasks, achieving 94.6% document parsing accuracy on OmniDocBench V1.5 and surpassing PaddleOCR and others [2]
- In text recognition, GLM-OCR reaches 94.0% accuracy, far ahead of some competitors such as DeepSeek-OCR 2, which reaches only 34.7% [2]
- In formula recognition, GLM-OCR scores 96.5%, indicating strong performance on mathematical expressions [2]
- In table recognition, it reaches 85.2% accuracy on PubTabNet, outperforming many alternatives [2]

Practical Applications
- GLM-OCR is particularly effective for structured documents such as Word files, PPTs, and academic papers, as well as for clear handwriting, receipts, and scanned contracts [3][4]
- The model recognizes handwritten forms with 86.1% accuracy [4]
- It can accurately extract information from documents such as meeting minutes and whiteboard notes, making it suitable for everyday work scenarios [3][4]

User Experience
- Users report a generally positive experience in standard document parsing tasks, though unclear handwriting and complex layouts remain challenging [4][12]
- The model handles low-quality inputs well, with roughly 96% recognition accuracy on mixed content, although errors were noted in specific cases [13][29]

Structural Extraction
- GLM-OCR supports structured information extraction, producing standard JSON output from various documents, which is useful for applications such as invoicing and identification [36][38]
- Structured-extraction quality improves significantly when clear prompts are provided, indicating the model's adaptability to user requirements [38]

Industry Trends
- The OCR market is evolving rapidly, with new models such as GLM-OCR emerging to meet rising demands for efficiency and accuracy [39][40]
- The trend toward smaller parameter counts (0.07B to 0.9B) is making deployment easier and more cost-effective [51]
- Higher output quality and shorter processing times are becoming standard expectations across the OCR industry [51]
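The structured-extraction workflow described above — the model returning standard JSON that downstream code validates — can be sketched as follows. This is a minimal sketch: the field names and the `raw_output` string are illustrative placeholders, not GLM-OCR's actual schema or API.

```python
import json

# Hypothetical raw reply from an OCR model prompted to return
# invoice fields as JSON (field names are illustrative only).
raw_output = """
{"invoice_no": "INV-2026-0031", "date": "2026-02-01",
 "vendor": "Acme Ltd.", "total": "1,280.00"}
"""

def parse_invoice(text: str) -> dict:
    """Check that the model's reply is well-formed JSON containing the
    requested fields, and normalize the amount to a float."""
    record = json.loads(text)
    required = {"invoice_no", "date", "vendor", "total"}
    missing = required - record.keys()
    if missing:
        raise ValueError(f"model omitted fields: {sorted(missing)}")
    record["total"] = float(record["total"].replace(",", ""))
    return record

invoice = parse_invoice(raw_output)
print(invoice["total"])  # 1280.0
```

Validating the reply this way matters because, as the summary notes, extraction quality depends on the prompt: a vague prompt can yield missing or malformed fields, which the check above surfaces immediately.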
Onlly Education: an integrated data flow system covering business, finance, funds, and taxation is now in place
Zheng Quan Ri Bao Wang· 2026-01-28 12:14
Core Viewpoint
- The company completed the construction of a financial shared service center in early 2020 and is continuously advancing its financial digital transformation, establishing an integrated data flow system for finance, business, and taxation [1]

Group 1: Financial Transformation
- The company has implemented technologies such as OCR, RPA, and AI to enhance operational efficiency and productivity [1]
- The financial digital transformation is aimed at improving human efficiency and overall effectiveness [1]
DeepSeek open-sources a brand-new OCR model! CLIP dropped for a lightweight Qwen model, with performance rivaling Gemini-3 Pro
量子位· 2026-01-27 08:32
henry, from Aofeisi · QbitAI | WeChat official account QbitAI

Just now, DeepSeek open-sourced a brand-new OCR model, DeepSeek-OCR 2, focused on accurately converting PDF documents into Markdown. Compared with the first-generation model released on October 20 last year, the core breakthrough of DeepSeek-OCR 2 is that it abandons the rigid "raster scan" logic of traditional models and instead dynamically reorders visual tokens according to image semantics.

To do this, DeepSeek-OCR 2 drops the CLIP component used in its predecessor and builds DeepEncoder V2 from a lightweight language model (Qwen2-0.5B), introducing "causal reasoning" capability as early as the visual encoding stage. This change mimics the causal visual flow of human reading, letting the LLM intelligently reorder visual tokens before interpreting the content.

In terms of performance, DeepSeek-OCR 2 matches Gemini-3 Pro despite using only a lightweight model. On the OmniDocBench v1.5 benchmark, DeepSeek-OCR 2 improves by 3.73% and makes notable progress in visual reading logic.

(A benchmark table follows in the source, comparing models on overall score, formula, and table metrics; it is truncated here.)
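The contrast between raster-scan order and semantics-driven reordering of visual tokens can be illustrated with a toy example. The priority scores below are stand-ins for whatever DeepEncoder V2 actually predicts; nothing here reflects DeepSeek's real implementation.

```python
# Toy contrast: fixed raster order vs. a model-predicted reading order.
def raster_order(tokens):
    """Traditional fixed order: left-to-right, top-to-bottom."""
    return sorted(tokens, key=lambda t: (t["row"], t["col"]))

def semantic_order(tokens):
    """Reorder by a (hypothetical) predicted reading priority instead."""
    return sorted(tokens, key=lambda t: t["priority"])

# A page where the title sits to the right of a sidebar: raster order
# reads the sidebar first, while a reading-order model puts the title first.
tokens = [
    {"row": 0, "col": 0, "priority": 2, "txt": "sidebar"},
    {"row": 0, "col": 1, "priority": 0, "txt": "title"},
    {"row": 1, "col": 0, "priority": 1, "txt": "body"},
]
print([t["txt"] for t in raster_order(tokens)])    # ['sidebar', 'title', 'body']
print([t["txt"] for t in semantic_order(tokens)])  # ['title', 'body', 'sidebar']
```

The point of the reordering, per the article, is that the decoder then consumes tokens in a sequence that matches how a human would read the page, rather than how the pixels happen to be laid out.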
Rizhao Port: the company is studying and advancing the construction of a financial shared service center
Group 1
- The company is researching the establishment of a financial shared service center to enhance operational efficiency and data governance [1]
- The initiative includes introducing technologies such as OCR (Optical Character Recognition) and RPA (Robotic Process Automation) to enable standardized data collection and automated workflows [1]
- The goal is to improve business collaboration efficiency and the overall level of data management [1]
X @子布
子布· 2025-11-27 12:51
How to make 200x with bloom's scraper. Link: https://t.co/Vh14rqhScu

Last night $chog launched on the Monad chain. The official account @ChogNFT tweeted at 23:46:17, with the CA (contract address) embedded in the tweet's image. Two seconds later, at 23:46:19, a total of 8 addresses bought in successfully, three of them earning 300,000-400,000 U in profit. Analysis shows all 8 addresses traded through the contract 0x7d1e3d2858829fd28Bd1F6eBc6F2b0945390c12B, which is bloom's contract address on Monad. bloom's scraper now supports OCR, and these 8 addresses sniped first precisely because they used the scraper's OCR feature.

How to set it up:
1. Open bloom, link: https://t.co/Vh14rqhScu
2. Select [/twitter] in the menu bar
3. Click [New Config]
4. Click [Twitter OCR] so it turns green; this option recognizes CAs in images. Adjust the other settings as needed ...
New DeepSeek just did something crazy...
Matthew Berman· 2025-10-22 17:15
DeepSeek OCR Key Features
- DeepSeek OCR is a novel approach to image recognition that compresses text by 10x while maintaining 97% accuracy [2]
- The model uses a vision language model (VLM) to render text as an image, fitting 10 times more text into the same token budget [6][11]
- The method achieves 96%+ OCR decoding precision at 9-10x text compression, 90% at 10-12x, and 60% at 20x [13]

Technical Details
- The model splits the input image into 16x16 patches [9]
- It uses SAM, an 80-million-parameter model, to capture local details [10]
- It uses CLIP, a 300-million-parameter model, to store global information about how the patches fit together [10]
- The output is decoded by DeepSeek 3B, a 3-billion-parameter mixture-of-experts model with 570 million active parameters [10]

Training Data
- The model was trained on 30 million pages of diverse PDF data covering approximately 100 languages from the internet [21]
- Chinese and English account for approximately 25 million pages; other languages account for 5 million [21]

Potential Impact
- This technology could potentially 10x the context window of large language models [20]
- Andrej Karpathy suggests that pixels might be better inputs to LLMs than text tokens [17]
- An entire encyclopedia could be compressed into a single high-resolution image [20]
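The compression figures above can be sanity-checked with simple patch arithmetic. This sketch assumes one visual token per 16x16 patch, a deliberate simplification — the real DeepEncoder downsamples patches into far fewer vision tokens — so the numbers only illustrate how the compression ratio is defined.

```python
# Back-of-the-envelope check of the compression-ratio arithmetic.
# Assumption: one visual token per 16x16 patch (simplified).

def visual_tokens(width: int, height: int, patch: int = 16) -> int:
    """Number of patch-sized tiles covering a width x height image."""
    return (width // patch) * (height // patch)

def compression_ratio(text_tokens: int, vision_tokens: int) -> float:
    """How many text tokens each visual token stands in for."""
    return text_tokens / vision_tokens

# A 1024x1024 page image split into 16x16 patches:
n_vis = visual_tokens(1024, 1024)        # 64 * 64 = 4096 tiles
# If the page's content would cost ~40,960 text tokens, that is the
# 10x regime where the paper reports 96%+ decoding precision.
ratio = compression_ratio(40_960, n_vis)
print(n_vis, ratio)  # 4096 10.0
```

The same arithmetic explains the "10x the context window" claim: holding the token budget fixed, every text token replaced by a visual token standing in for ten of them multiplies the effective context tenfold.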
10x compression, 97% decoding accuracy! Why DeepSeek's newly open-sourced model has won attention at home and abroad
Xin Lang Cai Jing· 2025-10-21 23:26
Core Insights
- DeepSeek has open-sourced a new model, DeepSeek-OCR, which uses the visual modality for context compression, aiming to reduce the computational costs of large models [1][3][6]

Model Architecture
- DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight language decoder [3][4]
- DeepEncoder integrates two established visual architectures: SAM (Segment Anything Model) for local detail and CLIP (Contrastive Language-Image Pre-training) for global knowledge [4][6]

Performance and Capabilities
- The model shows strong "deep parsing" ability, recognizing complex visual elements such as charts and chemical formulas, extending its applications in finance, research, and education [6][7]
- Experiments show that when text tokens number less than ten times the visual tokens (compression ratio <10x), the model achieves 97% OCR accuracy, and it maintains around 60% accuracy even at a 20x compression ratio [6][7][8]

Industry Reception
- The model has drawn wide acclaim from tech media and industry experts, with figures such as Andrej Karpathy praising its approach of using pixels as input to large language models [3][4]
- Elon Musk commented on the long-term potential of AI models taking primarily photon-based inputs, suggesting a shift in how data may be processed [4]

Practical Applications
- DeepSeek-OCR is positioned as a highly practical model for generating large-scale pre-training data: a single A100-40G GPU can produce over 200,000 pages of training data per day [7][8]
- The model can compress a 1,000-word article into just 100 visual tokens, showing its efficiency in processing and recognizing text [8]
The world's strongest OCR model is only 0.9B! Baidu's ERNIE-derived model just swept four SOTAs
量子位· 2025-10-17 09:45
Core Insights
- The article highlights the launch of Baidu's new self-developed multimodal document parsing model, PaddleOCR-VL, which scored 92.6 on the OmniDocBench V1.5 leaderboard, ranking first globally in comprehensive performance [1][11]

Model Performance
- With only 0.9 billion parameters, PaddleOCR-VL excels in four core capabilities: text recognition, formula recognition, table understanding, and reading order, achieving state-of-the-art (SOTA) results in all four [3][12][13]
- The model supports 109 languages and maintains high recognition accuracy even in complex formats, breaking traditional OCR limitations [14][16]

Technical Specifications
- The model is designed for parsing complex document structures, understanding logical structure, table relationships, and mathematical expressions [5][6]
- PaddleOCR-VL uses a two-stage architecture, document layout analysis followed by fine-grained recognition, improving stability and efficiency on complex layouts [36][37]

Industry Impact
- PaddleOCR-VL is positioned as a critical tool across industries including finance, education, and public services, facilitating digital transformation and process automation [51][52]
- Its capabilities let it serve as a "document work assistant," integrating into workflows to improve efficiency and reduce costs [52][56]

Competitive Landscape
- Its performance challenges the notion that only large models can be highly effective, demonstrating that a well-structured, focused model can outperform larger counterparts in practice [48][49]
- PaddleOCR-VL marks a significant advance in Baidu's multimodal intelligence strategy and a milestone in the global document parsing landscape [57][58]
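The two-stage flow just described — layout analysis first, then per-region recognition stitched in reading order — can be sketched as below. The `Region` type and the function stubs are illustrative assumptions, not PaddleOCR-VL's actual API.

```python
# Minimal sketch of a two-stage document parsing pipeline:
# stage 1 detects layout regions, stage 2 recognizes each region,
# and results are emitted in the predicted reading order.
from dataclasses import dataclass

@dataclass
class Region:
    kind: str     # "text", "table", "formula", ...
    order: int    # reading-order index from layout analysis
    bbox: tuple   # (x0, y0, x1, y1) in page coordinates

def detect_layout(page) -> list[Region]:
    """Stage 1 stub: a real pipeline runs a layout-analysis model here."""
    return [Region("text", 1, (0, 0, 100, 40)),
            Region("table", 0, (0, 50, 100, 90))]

def recognize(page, region: Region) -> str:
    """Stage 2 stub: dispatch each region to a fine-grained recognizer."""
    return f"<{region.kind} content>"

def parse_document(page) -> list[str]:
    regions = sorted(detect_layout(page), key=lambda r: r.order)
    return [recognize(page, r) for r in regions]

print(parse_document(page=None))  # ['<table content>', '<text content>']
```

Splitting the problem this way is what the article credits for the model's stability on complex layouts: the recognizer only ever sees one cleanly cropped region at a time, while reading order is decided globally in stage one.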
IPO Preview | With CamScanner steadily contributing growth, will AI-boosted INTSIG Information (688615.SH) "give it another go" with a Hong Kong listing?
Zhi Tong Cai Jing Wang· 2025-07-10 10:51
Core Viewpoint
- INTSIG Information is expanding its market presence by applying for a listing on the Hong Kong Stock Exchange, after listing on the Shanghai Stock Exchange's Sci-Tech Innovation Board in September 2022. Its flagship product, CamScanner, has contributed substantially to revenue growth and to the company's position in the AI sector [1][2][11]

Financial Performance
- Total revenue for 2022 to 2024 was RMB 988.46 million, RMB 1.186 billion, and RMB 1.438 billion respectively. In Q1 2025, revenue reached RMB 395 million, a 20% increase from RMB 327 million in the same period last year [2][5]
- CamScanner's share of revenue rose from 72.3% in 2022 to 81.1% in Q1 2025, making it the dominant product in the company's lineup [1][3]

Product Breakdown
- Revenue from C-end products, including CamScanner, CamCard, and Qixinbao, accounted for 82.2%, 84.3%, and 83.8% of total revenue from 2022 to 2024; in Q1 2025 this segment's contribution rose to 86.6% [3][5]
- CamScanner revenue grew from RMB 714.57 million in 2022 to RMB 1.112 billion in 2024, while CamCard and Qixinbao showed mixed performance [3][5]

Market Position and Competitive Advantage
- INTSIG ranks first in China and fifth globally by revenue from C-end efficiency AI products, with over 100 million monthly active users [1][2]
- The company has leveraged advances in AI, particularly optical character recognition (OCR), achieving a character recognition rate of 81.9% in complex scenarios, significantly higher than competitors [7][8]

Future Growth Potential
- The company plans to enhance its products with new features such as RPA (Robotic Process Automation) and to expand globally, particularly in emerging markets like Brazil, Indonesia, and Mexico, where the conversion rate is currently below one quarter of China's [10][11]
- INTSIG aims to establish a global technical support center and expand its marketing network to strengthen brand influence and customer service [10]
State Grid Mengdong (Eastern Inner Mongolia) Electric Power optimizes fund applications to further improve lean fund management
Core Insights
- The launch of the Smart Shared Financial Platform by State Grid Inner Mongolia Electric Power marks significant improvements in fund management efficiency, payment security, and the integration of finance and operations [1]

Group 1: Financial Platform Applications
- The platform has optimized full-lifecycle management of bank accounts, automating the collection of bank transaction data and reconciling nearly 130,000 transactions [1]
- It handles fund settlement workflows end to end, having processed over 140,000 payments and nearly 40,000 electricity revenue receipts [1]
- Bill management has been enhanced, with nearly 1,000 bank acceptance bills initiated through online connectivity with related systems [1]
- A standardized investment and financing management model has been established, processing nearly 100 financing transactions from initiation to repayment [1]

Group 2: Future Developments
- The company plans to apply OCR and AI to intelligent fund auditing, shifting from manual, experience-driven processes to data-driven ones [2]