OpenDataLab and DingTalk to Build a Free, All-in-One Document Parsing Tool
Ge Long Hui · 2025-09-04 11:28
Core Insights
- High-quality data is essential for AI model training and application, serving as the "fuel" for enterprises transitioning to AI [2][3]
- OpenDataLab and DingTalk have launched DLU, a document parsing tool aimed at helping enterprises overcome the challenge of preparing AI-ready data [2][3]

Group 1: Product and Technology
- DLU is based on MinerU, an intelligent document parsing engine developed by OpenDataLab, which has gained over 40,000 stars on GitHub for its precise parsing [2][3]
- MinerU 2.0 improves parsing speed and accuracy, achieving performance comparable to mainstream 72-billion-parameter models while using only 0.98 billion parameters [3]
- DLU supports a range of document formats, including Office documents, PDFs, Markdown, and DingTalk's proprietary formats, and extracts complex visual elements to produce high-quality data suitable for model training [3][4]

Group 2: Market Position and Strategy
- OpenDataLab is recognized as a leading AI data platform in China, providing over 2 million data retrieval services to more than 100,000 users [3]
- DingTalk, part of Alibaba Group, has a strong enterprise user base and has integrated MinerU capabilities into its document products, laying a solid foundation for DLU's development [3][4]
- The open-source DLU aims to address the data preparation challenges enterprises face in the AI era, supporting a full-cycle process from document creation to customized model training [4]