Core Insights
- The "2025 National High-Quality Data Set and Data Annotation Industry Supply and Demand Matching Conference", held in Nanjing, attracted more than 500 companies and produced over 90 cooperation agreements with a combined transaction value exceeding 900 million yuan [1]
- Jiangsu province aims to leverage its rich data resources to accelerate the construction of high-quality data sets, which it regards as essential for seizing opportunities in artificial intelligence development [1][2]
- The definition of a high-quality data set varies across industries, but all such data sets must meet the training needs of large AI models [2]

Industry Overview
- Jiangsu has built 321 high-quality data sets across key sectors such as healthcare, transportation, industry, energy, and cultural tourism, with a total data volume exceeding 93 PB [1]
- The province has implemented a "1+N" policy framework to optimize the environment for artificial intelligence development, with a focus on collaboration between enterprises on the supply and demand sides [2][7]

Challenges in Data Annotation
- Data annotation is crucial for AI development and requires specialized knowledge and skills, particularly in complex fields such as medical data [3][4]
- The industry faces an insufficient data supply and a shortage of skilled data annotators, both of which hinder the progress of large models in niche domains [4]

Cost Considerations
- The high cost of data storage and processing is a significant challenge for companies, and many high-quality data sets are discarded because of storage expenses [5][6]
- Companies are exploring solutions such as establishing cold storage centers in less developed regions to reduce data storage costs [5]

Financial Support and Standards
- The data industry is knowledge- and capital-intensive, with a significant portion of costs tied to acquiring raw data [6]
- Financial institutions are encouraged to support data collection and annotation, potentially through innovative financing models [6]
- Standards for high-quality data sets are being established, with guidelines and quality assessment protocols under development to address current challenges [6]
Building High-Quality Data Sets: An Imperative for Jiangsu, Which Must Take the Lead
Xin Hua Ri Bao·2025-11-06 08:16