Onlly Education: An Integrated Business-Finance-Treasury-Tax Data Flow System Is Now in Place
Zheng Quan Ri Bao Wang· 2026-01-28 12:14
Securities Daily Net, January 28 — Responding to an investor question on an interactive platform, Onlly Education (600661) said that it completed the initial build-out of its financial shared-service center in 2020 and has used it as the foundation for the continued digital and intelligent transformation of its finance function; an integrated business-finance-treasury-tax data flow system is now in place. In recent years, the company has steadily deepened its use of OCR, RPA, AI, and related technologies to free up staff capacity and improve efficiency. ...
DeepSeek Open-Sources an All-New OCR Model! CLIP Dropped for a Lightweight Qwen Model, with Performance Rivaling Gemini-3 Pro
QbitAI · 2026-01-27 08:32
Henry, reporting from Aofei Temple — QbitAI (WeChat official account: QbitAI). DeepSeek has just open-sourced an all-new OCR model, DeepSeek-OCR 2, focused on accurately converting PDF documents to Markdown. Compared with the first-generation model released on October 20 last year, the core breakthrough of DeepSeek-OCR 2 is that it abandons the rigid "raster scan" logic of traditional models and dynamically reorders visual tokens according to image semantics. To do this, DeepSeek-OCR 2 drops the CLIP component used in its predecessor and instead builds DeepEncoder V2 around a lightweight language model (Qwen2-0.5B), introducing "causal reasoning" into the visual encoding stage. This change mimics the causal visual flow of human document reading, letting the model intelligently reorder visual tokens before the LLM interprets the content. On performance, DeepSeek-OCR 2 matches Gemini-3 Pro while using only a lightweight model: on the OmniDocBench v1.5 benchmark it improves by 3.73% and makes notable gains in visual reading order. [Truncated benchmark table: Model | V-token max | Overall ↑ | Formula ↑ | Table TEDS ↑ ...]
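The reordering idea described above can be reduced to a toy sketch: a scorer (standing in for the causal Qwen2-0.5B encoder, which is not reproduced here) assigns each visual token a reading-order score, and the tokens are permuted by that score before decoding. Everything below is illustrative and is not DeepSeek's actual implementation.

```python
def reorder_visual_tokens(tokens, order_scores):
    """Sort visual tokens by predicted reading order.

    `tokens` is a list of N token embeddings (or placeholders here);
    `order_scores` is a list of N scores, e.g. produced by a causal
    encoder that estimates each patch's place in the reading flow.
    Returns the reordered tokens and the permutation applied.
    """
    perm = sorted(range(len(tokens)), key=lambda i: order_scores[i])
    return [tokens[i] for i in perm], perm

# Toy usage: 4 patch tokens whose raster order differs from reading order.
tokens = ["patch0", "patch1", "patch2", "patch3"]     # raster order
scores = [2.0, 0.0, 3.0, 1.0]                          # hypothetical scores
reordered, perm = reorder_visual_tokens(tokens, scores)
```

The decoder then consumes `reordered` instead of the raw raster sequence, which is the "semantic reordering before interpretation" step the article describes.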
Rizhao Port: The Company Is Studying and Advancing the Construction of a Financial Shared-Service Center
(Editor: Chu Lijun) Securities Daily Net, January 21 — Responding to an investor question on an interactive platform, Rizhao Port said it is studying and advancing the construction of a financial shared-service center, and plans to introduce tools such as OCR and RPA to drive standardized data collection and automated data flow, improving business coordination efficiency and data governance. ...
X @子布
子布· 2025-11-27 12:51
How to make 200x with bloom's sniper. Link: https://t.co/Vh14rqhScu
Last night, $chog launched on the Monad chain. The official account @ChogNFT posted at 23:46:17 with the CA embedded in the tweet's image. Two seconds later, at 23:46:19, a total of 8 addresses bought in successfully, three of which made 300,000-400,000 U in profit. Analysis shows all 8 addresses traded through the contract 0x7d1e3d2858829fd28Bd1F6eBc6F2b0945390c12B, which is bloom's Monad-chain address. bloom's sniper now supports OCR, and these 8 addresses used that OCR feature to snipe in time. To configure it:
1. Open bloom: https://t.co/Vh14rqhScu
2. Select /twitter from the menu
3. Click "New Config"
4. Click "Twitter OCR" so it turns green; this is the option that recognizes the CA in images. Adjust the other settings as needed. ...
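The mechanism the thread describes boils down to: run OCR on the image attached to the tweet, then pattern-match a contract address in the recovered text. A minimal sketch of the second step (the OCR step itself, e.g. via Tesseract, is omitted; the regex assumes EVM-style 0x addresses):

```python
import re

# EVM-style contract address: "0x" followed by exactly 40 hex characters.
CA_PATTERN = re.compile(r"0x[0-9a-fA-F]{40}")

def extract_contract_addresses(ocr_text):
    """Return every contract address found in OCR'd image text."""
    return CA_PATTERN.findall(ocr_text)

# Usage: text as an OCR engine might return it from the tweet image.
text = "CHOG launch! CA: 0x7d1e3d2858829fd28Bd1F6eBc6F2b0945390c12B buy now"
addresses = extract_contract_addresses(text)
```

The speed advantage in the story comes from automating exactly this loop: image in, address out, buy order submitted within seconds.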
New DeepSeek just did something crazy...
Matthew Berman· 2025-10-22 17:15
DeepSeek OCR Key Features
- DeepSeek OCR is a novel approach to image recognition that compresses text by 10x while maintaining 97% accuracy [2]
- The model uses a vision language model (VLM) to compress text into an image, allowing 10 times more text in the same token budget [6][11]
- The method achieves 96%+ OCR decoding precision at 9-10x text compression, 90% at 10-12x compression, and 60% at 20x compression [13]

Technical Details
- The model splits the input image into 16x16 patches [9]
- It uses SAM, an 80-million-parameter model, to capture local details [10]
- It uses CLIP, a 300-million-parameter model, to capture global information about how the pieces fit together [10]
- The output is decoded by DeepSeek 3B, a 3-billion-parameter mixture-of-experts model with 570 million active parameters [10]

Training Data
- The model was trained on 30 million pages of diverse PDF data from the internet, covering approximately 100 languages [21]
- Chinese and English account for approximately 25 million pages; other languages account for 5 million pages [21]

Potential Impact
- This technology could potentially 10x the context window of large language models [20]
- Andrej Karpathy suggests that pixels might be better inputs to LLMs than text tokens [17]
- An entire encyclopedia could be compressed into a single high-resolution image [20]
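The compression/precision trade-off quoted above fits in a few lines of arithmetic; the band boundaries below simply restate the figures from the summary and are not part of any official API:

```python
def compression_ratio(text_tokens, vision_tokens):
    """How many text tokens each visual token stands in for."""
    return text_tokens / vision_tokens

def reported_precision(ratio):
    """OCR decoding precision bands quoted in the summary above."""
    if ratio <= 10:
        return 0.96   # "96%+ at 9-10x"
    if ratio <= 12:
        return 0.90   # "90% at 10-12x"
    if ratio <= 20:
        return 0.60   # "60% at 20x"
    return None       # beyond the reported range

# A 1000-token page rendered into 100 visual tokens sits in the top band.
ratio = compression_ratio(1000, 100)
```

This is also the arithmetic behind the "10x the context window" claim: if each visual token carries ~10 text tokens' worth of content, the same token budget covers ~10x the text.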
10x Compression, 97% Decoding Precision! Why DeepSeek's Newly Open-Sourced Model Is Drawing Attention at Home and Abroad
Xin Lang Cai Jing· 2025-10-21 23:26
Core Insights
- DeepSeek has open-sourced a new model called DeepSeek-OCR, which utilizes visual patterns for context compression, aiming to reduce the computational costs associated with large models [1][3][6]

Model Architecture
- DeepSeek-OCR consists of two main components: DeepEncoder, a visual encoder designed for high compression and high-resolution document processing, and DeepSeek3B-MoE, a lightweight language decoder [3][4]
- The DeepEncoder integrates two established visual model architectures: SAM (Segment Anything Model) for local detail processing and CLIP (Contrastive Language-Image Pre-training) for capturing global knowledge [4][6]

Performance and Capabilities
- The model demonstrates strong "deep parsing" abilities, recognizing complex visual elements such as charts and chemical formulas, which expands its applications in fields like finance, research, and education [6][7]
- Experimental results indicate that when the number of text tokens is within ten times the number of visual tokens (compression ratio < 10x), the model achieves 97% OCR accuracy, and it maintains around 60% accuracy even at a 20x compression ratio [6][7][8]

Industry Reception
- The model has received widespread acclaim from tech media and industry experts, with notable figures like Andrej Karpathy praising its innovative approach of using pixels as input for large language models [3][4]
- Elon Musk commented on the long-term potential of AI models primarily using photon-based inputs, suggesting a shift in how data may be processed in the future [4]

Practical Applications
- DeepSeek-OCR is positioned as a highly practical model capable of generating large-scale pre-training data: a single A100-40G GPU can produce over 200,000 pages of training data daily [7][8]
- The model's approach allows it to compress a 1000-word article into just 100 visual tokens, showcasing its efficiency in processing and recognizing text [8]
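The token bookkeeping implied by the architecture above can be sketched numerically. The 16x16 patch size and the 16x downsampling between the SAM-style local stage and the CLIP-style global stage are assumptions drawn from the descriptions in this and the previous summary, not from the model's code:

```python
def deepencoder_token_counts(height, width, patch=16, downsample=16):
    """Rough token counts through a SAM-then-CLIP serial encoder.

    One token per 16x16 patch in the local stage, then a convolutional
    downsampler (assumed 16x here) shrinks the sequence before the
    global attention stage, keeping activation memory manageable.
    """
    local_tokens = (height // patch) * (width // patch)
    global_tokens = local_tokens // downsample
    return local_tokens, global_tokens

# A 1024x1024 page: 4096 patch tokens locally, 256 into the global stage.
local, global_ = deepencoder_token_counts(1024, 1024)
```

Under these assumptions, a full page reaches the decoder as a few hundred visual tokens, which is consistent with the "1000 words into ~100 visual tokens" figure quoted above.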
The World's Strongest OCR Model at Just 0.9B! Baidu's ERNIE-Derived Model Just Swept 4 SOTAs
QbitAI · 2025-10-17 09:45
Core Insights
- The article highlights the launch of Baidu's new self-developed multi-modal document parsing model, PaddleOCR-VL, which achieved a score of 92.6 on the OmniDocBench V1.5 leaderboard, ranking first globally in comprehensive performance [1][11]

Model Performance
- PaddleOCR-VL, with a parameter count of 0.9 billion, excels in four core capabilities: text recognition, formula recognition, table understanding, and reading order, achieving state-of-the-art (SOTA) results in all four dimensions [3][12][13]
- The model supports 109 languages and maintains high recognition accuracy even in complex formats, breaking through traditional OCR limitations [14][16]

Technical Specifications
- The model is designed for complex document structure parsing, capable of understanding logical structures, table relationships, and mathematical expressions in documents [5][6]
- PaddleOCR-VL uses a two-stage architecture, separating document layout analysis from fine-grained recognition, which improves stability and efficiency on complex layouts [36][37]

Industry Impact
- PaddleOCR-VL is positioned as a critical tool across industries including finance, education, and public services, facilitating digital transformation and process automation [51][52]
- The model's capabilities allow it to serve as a "document work assistant," integrating seamlessly into workflows to improve efficiency and reduce costs [52][56]

Competitive Landscape
- The model's performance challenges the notion that only large models can be highly effective, demonstrating that a well-structured, focused model can outperform larger counterparts in practical applications [48][49]
- PaddleOCR-VL represents a significant advancement in Baidu's multi-modal intelligence strategy, marking a milestone in the global document parsing landscape [57][58]
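The two-stage design described above (layout analysis first, then per-region recognition) can be sketched with stand-in types. PaddleOCR-VL's real API is not assumed here; the region type and the assembly step are illustrative stubs:

```python
from dataclasses import dataclass

@dataclass
class Region:
    kind: str      # "text", "table", "formula", ...
    order: int     # reading-order index assigned by the layout stage
    content: str   # Markdown produced by the fine-grained recognition stage

def assemble_markdown(regions):
    """Stitch stage-2 outputs together in stage-1 reading order."""
    ordered = sorted(regions, key=lambda r: r.order)
    return "\n\n".join(r.content for r in ordered)

# Usage: recognition results may come back in any order; layout order wins.
doc = assemble_markdown([
    Region("table", 1, "| a | b |"),
    Region("text", 0, "# Title"),
])
```

Splitting the problem this way is why a 0.9B model can stay stable on complex layouts: the layout stage fixes structure and reading order before any content is decoded.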
IPO Preview | With CamScanner Steadily Contributing Growth, Is AI-Boosted INTSIG Information (688615.SH) Taking Another Shot with a Hong Kong Listing?
Zhitong Finance · 2025-07-10 10:51
Core Viewpoint
- INTSIG Information is expanding its market presence by applying for a listing on the Hong Kong Stock Exchange, after listing on the Shanghai Stock Exchange's Sci-Tech Innovation Board in September 2022. The company's flagship product, CamScanner, has contributed significantly to its revenue growth and to its position in the AI sector [1][2][11]

Financial Performance
- INTSIG Information's total revenue for 2022, 2023, and 2024 was RMB 988.46 million, RMB 1.186 billion, and RMB 1.438 billion respectively. In Q1 2025, revenue reached RMB 395 million, a roughly 20% increase from RMB 327 million in the same period a year earlier [2][5]
- CamScanner's revenue contribution rose from 72.3% in 2022 to 81.1% in Q1 2025, underscoring its dominance in the company's product lineup [1][3]

Product Breakdown
- Revenue from C-end products, including CamScanner, CamCard, and Qixinbao, accounted for 82.2%, 84.3%, and 83.8% of total revenue from 2022 to 2024; in Q1 2025, this segment's contribution increased to 86.6% [3][5]
- CamScanner's revenue grew from RMB 714.57 million in 2022 to RMB 1.112 billion in 2024, while CamCard and Qixinbao showed mixed performance [3][5]

Market Position and Competitive Advantage
- INTSIG Information ranks first in China and fifth globally by revenue among C-end efficiency AI products with over 100 million monthly active users [1][2]
- The company has leveraged advances in AI, particularly in optical character recognition (OCR), achieving a character recognition rate of 81.9% in complex scenarios, significantly higher than competitors [7][8]
Future Growth Potential
- The company plans to enhance its product offerings by developing new features such as RPA (Robotic Process Automation), and to expand its global presence, particularly in emerging markets like Brazil, Indonesia, and Mexico, where the conversion rate is currently below one quarter of that in China [10][11]
- INTSIG Information aims to establish a global technical support center and expand its marketing network to improve brand influence and customer service [10]
State Grid Mengdong Electric Power Optimizes Fund Application to Further Improve Lean Fund Management
Core Insights
- The launch of the Smart Shared Financial Platform by State Grid Mengdong (East Inner Mongolia) Electric Power marks significant improvements in fund management efficiency, payment security, and the integration of finance and operations [1]

Group 1: Financial Platform Applications
- The platform has optimized full-lifecycle management of bank accounts, automating the collection of bank transaction data and reconciling nearly 130,000 transactions [1]
- It has achieved end-to-end processing of fund settlement workflows, handling over 140,000 payments and nearly 40,000 electricity revenue receipts [1]
- Bill management has been enhanced, with nearly 1,000 bank acceptance bills initialized through online connectivity with relevant systems [1]
- A standardized investment and financing management model has been established, processing nearly 100 financing transactions from initiation to repayment [1]

Group 2: Future Developments
- The company plans to leverage OCR and AI technologies for intelligent fund auditing, aiming to transition from manual, experience-driven processes to data-driven ones [2]
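Automated reconciliation of the kind described (matching collected bank transaction data against internal records) is, at its core, a keyed join. A minimal sketch, with field names invented for illustration:

```python
def reconcile(bank_txns, ledger_entries):
    """Match bank transactions to ledger entries on (date, amount).

    Returns (matched pairs, unmatched bank transactions). Real systems
    also key on reference numbers and counterparties, and allow fuzzy
    date windows; this sketch shows only the basic join.
    """
    index = {}
    for entry in ledger_entries:
        index.setdefault((entry["date"], entry["amount"]), []).append(entry)
    matched, unmatched = [], []
    for txn in bank_txns:
        bucket = index.get((txn["date"], txn["amount"]))
        if bucket:
            matched.append((txn, bucket.pop()))
        else:
            unmatched.append(txn)
    return matched, unmatched

# Usage: one transaction reconciles; the other is flagged for review.
bank = [{"date": "2025-06-01", "amount": 1200.0},
        {"date": "2025-06-02", "amount": 75.5}]
ledger = [{"date": "2025-06-01", "amount": 1200.0}]
matched, unmatched = reconcile(bank, ledger)
```

The planned OCR/AI auditing step would sit upstream of this join, extracting the date and amount fields from scanned vouchers and bills so they can be matched automatically.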