MonkeyOCR v1.5
From "Model Is King" to "Data as the Foundation": How Can WPS 365 Help Enterprises Mine Their Data Gold?
Xin Lang Cai Jing· 2026-02-03 11:33
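Source: 连线Insight. Editor: 子夜.

"Even for a top-performing 'miracle model', user retention can fall to a low level after 12 months." So said Yu Zhonghai (于钟海), executive general manager of the research department and chief computer-industry analyst at CICC (中金公司), at the WPS 365 AI Collaborative Office Summit held in Shanghai on January 27, 2026.

Three years ago, hardly anyone would have believed this. But by early 2026, the answer to this question matters less and less. Models are becoming infrastructure, and the real focus of competition has quietly shifted.

When the context available to large models is limited and model capabilities converge, competition in B2B (toB) AI is in essence a competition over efficiency: "who can supply AI with richer, more accurate, and more intelligible context." Under these conditions, the importance of enterprise data rises sharply.

At the summit, 连线Insight observed that the deployment cases shared by leading East China enterprises such as Yanfeng International (延锋国际), China Eastern Airlines (东方航空), and Shanghai Information Investment (上海信投) all pointed to the same conclusion: the biggest obstacle in taking an AI project from demo to production is not compute or models, but getting scattered, inconsistently formatted enterprise documents to be truly understood by AI.

The competitive center of gravity for enterprise AI applications is shifting from "model capability" to "data governance".

In this data competition, an easily overlooked technical step is becoming pivotal: parsing unstructured data. The ability to parse complex documents and turn them into knowledge, in particular, directly sets the upper bound on the quality of an enterprise's data assets.

WPS 365 provides unified knowledge governance. Image source: ...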
Kingsoft and HUST Release Multimodal Model MonkeyOCR v1.5: Document Parsing Surpasses PaddleOCR-VL, Complex Table Parsing Breaks 90% for the First Time
QbitAI (量子位) · 2025-11-18 05:02
Core Insights
- The article discusses the advancements in the field of multi-modal document parsing, highlighting the release of MonkeyOCR v1.5, which significantly improves upon previous OCR systems in handling complex documents [2][29].

Group 1: Importance of Enhanced Document Parsing
- The need for stronger document parsing engines is emphasized, particularly for extracting information from complex layouts, nested tables, and multi-page documents [4][5].
- Traditional OCR systems struggle with intricate document structures, leading to errors in data extraction [5].

Group 2: MonkeyOCR v1.5 Breakthroughs
- MonkeyOCR v1.5 introduces a unified visual-language document parsing framework that outperforms previous models by 9.7% in challenging scenarios [2][18].
- The core design philosophy of v1.5 is to decouple global structural understanding from fine-grained content recognition, incorporating innovative algorithms for complex tasks [7][29].

Group 3: Two-Stage Parsing Pipeline
- The parsing process is streamlined into two stages: layout analysis and reading order prediction, followed by region-level content recognition, enhancing both accuracy and efficiency [8][9].
- The first stage utilizes a visual language model to predict document layout and reading order, reducing errors from the outset [8].
- The second stage processes each identified region in parallel, ensuring high precision in recognizing text, formulas, and tables [9].

Group 4: Techniques for Complex Table Parsing
- MonkeyOCR v1.5 employs three key strategies for understanding complex tables: visual consistency reinforcement learning, image decoupling for table parsing, and type-guided table merging [11][16].
- The visual consistency reinforcement learning approach allows the model to self-optimize without extensive manual labeling, improving parsing fidelity [11].
- The image decoupling method effectively handles embedded images in tables, ensuring accurate structure recognition [14].
- The system intelligently merges cross-page tables by defining common patterns and using a hybrid decision-making process [16].

Group 5: Performance Metrics
- In the OmniDocBench v1.5 benchmark, MonkeyOCR v1.5 achieved an overall score of 93.01%, surpassing previous best models like PPOCR-VL and MinerU2.5 [18][19].
- On the OCRFlux-complex dataset, it scored 90.9%, outperforming PPOCR-VL by 9.2%, demonstrating its superior capability in handling complex structures [18][20].

Group 6: Visual Comparisons and Real-World Applications
- The article provides visual comparisons showcasing v1.5's ability to accurately identify layout elements and restore embedded images, which other models often fail to do [21][25].
- The system effectively reconstructs cross-page tables, eliminating structural interruptions caused by headers and footers [29].

Group 7: Conclusion and Future Outlook
- MonkeyOCR v1.5 addresses core pain points in document parsing within real industrial scenarios, offering a robust and efficient solution for complex document understanding tasks [29].
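
The two-stage pipeline summarized under Group 3 can be illustrated with a minimal Python sketch. It is not MonkeyOCR's actual code or API: `Region`, `analyze_layout`, `recognize_region`, and `parse_page` are hypothetical names, and both model calls are replaced by trivial stubs. The sketch only shows the control flow the summary describes, namely layout and reading-order prediction first, then region-level recognition run in parallel and reassembled in reading order.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    kind: str     # "text", "formula", or "table"
    order: int    # predicted reading-order index
    payload: str  # stand-in for the cropped region image


def analyze_layout(page: list[tuple[str, int, str]]) -> list[Region]:
    """Stage 1 (hypothetical stub): a real system would run a vision-language
    model here; this stub only wraps pre-labelled (kind, order, payload) tuples."""
    return [Region(kind, order, payload) for kind, order, payload in page]


def recognize_region(region: Region) -> str:
    """Stage 2 (hypothetical stub): a real system would run a recognizer on the
    region crop; this stub only wraps the payload according to its type."""
    if region.kind == "formula":
        return f"$${region.payload}$$"
    if region.kind == "table":
        return f"<table>{region.payload}</table>"
    return region.payload


def parse_page(page: list[tuple[str, int, str]], max_workers: int = 4) -> str:
    # Stage 1: detect regions, then sort them by predicted reading order.
    regions = sorted(analyze_layout(page), key=lambda r: r.order)
    # Stage 2: recognize every region in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(recognize_region, regions))
    return "\n".join(outputs)


if __name__ == "__main__":
    demo_page = [
        ("table", 2, "a|b"),
        ("text", 0, "Title"),
        ("formula", 1, "E=mc^2"),
    ]
    print(parse_page(demo_page))
```

Because `Executor.map` returns results in the order of its inputs, the regions can finish in any order and the output still follows the reading order fixed in stage one. This is one reason separating structure from content makes the second stage easy to parallelize.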