High-Quality Datasets

Artificial Intelligence High-Quality Dataset Ecosystem Development Conference Held in Yongchuan, Chongqing
Xin Hua Wang· 2025-09-29 08:41
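Original title: Artificial Intelligence High-Quality Dataset Ecosystem Development Conference Held in Yongchuan, Chongqing
The Western Data Annotation Research Institute is a digital technology sharing platform, digital industry incubation platform, and digital ecosystem building platform jointly founded by the China Information Association and the Yongchuan District People's Government. The institute will focus on the mutual empowerment of artificial intelligence and the Digital Chongqing initiative. Working across artificial intelligence, high-quality datasets, data annotation, and related fields, it will carry out research and innovation in emerging technologies, top-level design, topic research, standards development, and quality evaluation, and it will assemble an expert think tank and train interdisciplinary data annotation talent. In the future, the institute will also explore building science and technology innovation platforms in the data field, such as scientific laboratories, technology innovation centers, and enterprise technology centers, and will step up support for basic research, frontier technologies, and original technological innovation in the data field. Its main research directions include applied research on building high-quality datasets; research on the data annotation industry chain, talent, and technology; and comprehensive trials of data annotation scenarios.
The Western Dataset Production Base is being jointly built by the China Information Association and the Yongchuan District Government. Drawing on its member companies' resources, the association will encourage more dataset production companies to set up in Yongchuan. The two sides will work together to concentrate data elements in Yongchuan and build the base to drive the data industry in western China and extend its reach nationwide.
On September 28, the Artificial Intelligence High-Quality Dataset Ecosystem Development Conference was held in Yongchuan District, Chongqing. Themed "Building High-Quality Datasets, Empowering New AI Development," the conference focused on practice and exploration in the data annotation industry and, through policy briefings, case sharing, unveiling and signing ceremonies, industry dialogues, and other formats, ...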
National Data Administration: Accelerate the Construction of Industry High-Quality Datasets
Zhong Guo Zheng Quan Bao· 2025-09-21 20:17
Group 1
- The National Data Administration says manufacturing companies should invest in data resources with the same priority they give to technology R&D and equipment upgrades [1][2]
- High-quality datasets are described as a new type of fuel for the manufacturing industry's digital transformation and as crucial to applying artificial intelligence effectively [1][2]
- It calls for increased investment across the data industry, including data resources, technology, services, applications, security, and infrastructure [2]
Group 2
- The market needs to build awareness of paying for high-quality data, and companies are encouraged to treat data acquisition as seriously as purchasing advanced equipment and high-end technology [2]
- Companies, particularly leading enterprises with strong digital capabilities, are urged to avoid "involution" competition and to rely on innovation and data-driven development instead [2]
National Data Administration: Accelerate the Construction of Industry High-Quality Datasets
Zhong Guo Zheng Quan Bao· 2025-09-21 20:15
Core Viewpoint
- The National Data Administration emphasizes the importance of data investment in the manufacturing sector and advocates giving data resource development the same attention as technology R&D and equipment upgrades [1][2]
Group 1: Data Investment and Utilization
- Manufacturing enterprises should increase investment across the entire data lifecycle, including collection, storage, computation, management, and utilization [2]
- High-quality datasets are essential to the manufacturing industry's digital transformation, serving as a new type of fuel for innovation and efficiency [1][2]
Group 2: Market Awareness and Innovation
- The market needs to build awareness of paying for quality data, much as it pays for advanced equipment and high-end technology [2]
- Companies, especially leading firms with strong digital capabilities, should rely on innovation and data-driven strategies for growth and avoid "involution" competition [2]
Accelerating High-Quality Dataset Construction to Help Build an Open, Win-Win Data Ecosystem
Zheng Quan Ri Bao Wang· 2025-09-16 12:18
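Among the organizations selected for this round of pilot work on high-quality dataset construction, two involve financial services: "Construction of a High-Quality Dataset of Capital-Market Financing Enterprises," submitted by Zhengtong Co., Ltd. (证通股份有限公司), and "Construction of a Multimodal High-Quality Dataset for Large Models in the Financial Industry," submitted by Hundsun Technologies Inc. (恒生电子, 600570). These two submissions not only represent the capital market's in-depth exploration of data resources but also reveal the important value of data elements in driving change in the financial sector.
By integrating multidimensional information such as corporate R&D spending, patent data, and supply-chain relationships, high-quality datasets build dynamic company profiles that reduce information asymmetry. Investors and financial institutions can then identify a company's technological lead and commercialization potential more precisely, which improves their ability to assess the risk of many asset-light, high-growth companies.
High-quality datasets are a "strategic resource" of the AI era. Particularly in key sectors such as finance, energy, and transportation, dataset construction and governance are an important cornerstone for ensuring the resilience of industrial and supply chains. According to the National Data Resources Survey Report (2024), released by the National Data Administration in April of this year, the number of high-quality datasets in China grew 27.4% year on year in 2024, providing strong support for AI training and applications.
Yuan Shuai, deputy secretary-general of the Zhongguancun Internet of Things Industry Alliance, told reporters that data quality is the key bottleneck in AI's leap from "usable" to "works well." The pilot work on high-quality dataset construction uses a "scenario-driven + demonstration-first" strategy to promote cross-department, cross-industry data coordination, and through policy guidance and ...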
Guangdong Power Grid: "Digital Craftsman's Core" (数字匠芯) Builds the Cornerstone of High-Quality AI Data
Zhong Guo Dian Li Bao· 2025-09-16 07:36
Core Insights
- Data is recognized as the "fifth production factor" driving high-quality economic and social development, and the integration of artificial intelligence with data is expected to bring transformative change [1]
- Guangdong Power Grid's high-quality dataset has been named a typical case by the National Data Administration, indicating successful implementation in AI data engineering [1]
Group 1: Data Quality and AI Integration
- Guangdong Power Grid has established a high-standard data processing and labeling system that supplies reliable data for AI projects, supporting smarter equipment and wider industry empowerment [1][2]
- AI models rely on precisely labeled data to understand operating states, predict trends, and identify safety hazards, which makes high-quality data essential to smart grid development [2][3]
Group 2: Data Utilization and Impact
- Building the high-quality dataset has turned 1.5 million items of multimodal power safety data into reusable digital assets, enabling a shift from single-scene recognition to multidimensional risk prediction [4]
- Using high-frequency violation data identified by algorithms to optimize operating procedures has reduced similar violations by 32% [4]
Group 3: Industry Implications and Best Practices
- Recognition of the dataset as a national-level typical case showcases Guangdong Power Grid's technical strength and offers a replicable model for integrating AI with the real economy [5]
- The case illustrates the importance of a data governance framework aligned with industry needs, addressing a common obstacle to AI adoption: the lack of high-quality data [5]
2025 China Data Element Industry Development Research Report
iResearch (艾瑞咨询)· 2025-09-14 00:07
Core Insights
- Data, as the fifth production factor, has unique characteristics such as non-rivalry, replicability, and unlimited growth potential, which make extracting its value more complex than for traditional production factors [1]
- Developing a market for data elements depends heavily on a clear policy framework and implementation pathways, with local data trading institutions and data merchants becoming key drivers [1][2]
- The domestic data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26%, surpassing 300 billion yuan by 2028 [6]
Current Situation Analysis
- The data element market system is gradually improving, driven by policy guidance and industrial construction, with a focus on data, technology, and infrastructure [2]
- The core industries of the digital economy are becoming significant drivers of the overall economy; the digital economy grew from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, nearly doubling in six years [6]
Policy Analysis
- Improving the policy framework for the data industry value chain and establishing local data systems are crucial to the circulation of data element value [4]
Market Scale Assessment
- The data element industry is projected to reach approximately 200 billion yuan by 2025 and exceed 300 billion yuan by 2028, with data processing and analysis as the largest segment [6]
Data Value Chain Construction
- The data value circulation system is supported by advanced technology and regulatory compliance [8]
- Building a data ownership system based on the "Data Twenty Articles" is essential for efficient circulation of data value [11]
Data Registration
- Data registration is critical for delineating asset ownership and releasing data value, with a "1+3" policy framework guiding public data resource management [13]
Data Value Assessment
- The data valuation policy framework is becoming more refined, with public data resource quantification standards emerging as important benchmarks [16]
Data Asset Capitalization
- Capitalizing data assets is a core practice for realizing data value; regulations effective January 1, 2024 mark a new era in which data assets can be included in financial statements [19]
Data Asset Trading
- The data market is "cold on-exchange, hot off-exchange": off-market transactions dominate because of their flexibility and customization [21]
Industry Practices
- The financial sector is expected to see a CAGR of approximately 19.06%, reaching over 100 billion yuan by 2028, driven by data element integration [31]
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, with a focus on high-quality datasets and trusted data spaces [34]
- The healthcare industry is anticipated to grow at a CAGR of around 23.69%, with an emphasis on compliant use of personal health data [36]
Trends
- High-quality dataset construction is becoming a key factor in advancing the artificial intelligence industry, which is moving from "point breakthroughs" to "holistic development" [39]
- Establishing trusted data spaces will be crucial for ensuring the circulation and high-value application of data elements [42]
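The growth figures in this summary can be sanity-checked with the standard compound-growth formula. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the roughly 200 billion yuan figure is the 2025 base and that the 20.26% rate applies to each of the three years to 2028, which the summary does not state explicitly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, an end value, and a span in years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1 + rate) ** years


# Digital economy scale quoted in the summary: 27.2 trillion yuan (2017) to 53.9 trillion yuan (2023).
digital_economy_cagr = cagr(27.2, 53.9, 2023 - 2017)

# ASSUMPTION: about 200 billion yuan in 2025, compounding at 20.26% per year through 2028.
market_2028 = project(200.0, 0.2026, 2028 - 2025)

print(f"Implied digital economy CAGR, 2017-2023: {digital_economy_cagr:.1%}")
print(f"Projected data element market in 2028: {market_2028:.0f} billion yuan")
```

Under those assumptions the projection lands above the 300 billion yuan mark quoted for 2028, and the digital economy figures imply annual growth of roughly 12%.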
Helping Release the Value of Data Elements! Yantai Launches Call for High-Quality Datasets
Qi Lu Wan Bao Wang· 2025-09-10 11:36
Group 1
- The Yantai Big Data Bureau has launched a call for high-quality datasets to implement the "Data Element ×" Three-Year Action Plan (2024-2026) [1]
- High-quality datasets are considered the "new oil" for developing artificial intelligence and upgrading industry toward intelligent operation, giving them strategic importance in the digital economy [1]
- The call covers 12 key industries, including industrial manufacturing, modern agriculture, and financial services, and 5 emerging fields, such as the low-altitude economy and intelligent driving [1]
Group 2
- Yantai has made progress in high-quality dataset construction, with representative projects such as the "National 76 Industry Pollutant Discharge Permit Digital Chain Dataset" and the "National 138 City All-Media Advertising Dataset" [2]
- Ten projects from Yantai were included in the 2025 Shandong Province "Data Element ×" Innovation Application Project Subsidy List, ranking second in the province [2]
- The initiative aims to identify exemplary high-quality datasets in order to promote data circulation and value release, adding momentum to digital transformation and high-quality development [2]
Jiangsu Draws Up a Data "Blueprint"
Guo Ji Jin Rong Bao· 2025-08-30 16:36
Core Insights
- Jiangsu Province aims to build at least 1,000 high-quality datasets by the end of 2027, as outlined in the "Implementation Plan for the Development of the Data Annotation Industry and High-Quality Dataset Construction (2025-2027)" [1][2]
Group 1: Development Goals
- The plan sets ambitious targets for the data annotation industry, expecting significant improvement in precision, specialization, intelligence, and systematization by 2027 [2]
- The industry is projected to account for over 10% of the national market, with a compound annual growth rate exceeding 20% [2]
Group 2: Industry Structure
- Jiangsu will establish three data annotation bases and cultivate around ten key enterprises with strong innovation capability and industry influence [2]
- The initiative aims to create an industrial cluster effect, optimizing resource allocation and reducing companies' operating costs [2]
Group 3: High-Quality Datasets
- The 1,000 high-quality datasets will cover 17 key areas, including transportation, healthcare, financial services, cultural tourism, and education [3]
- The focus on autonomous driving highlights the importance of high-quality datasets for improving safety and reliability in smart transportation [3]
Group 4: Application Cases
- The plan includes selecting 100 replicable, promotable application cases to serve as models for the data annotation industry [6]
- Successful cases in fields such as healthcare and cultural tourism will provide insights and methods for data collection, annotation, and application [6]
National Data Administration Issues Pilot "Work Orders" for High-Quality Dataset Construction
Xin Hua Ri Bao· 2025-08-28 21:30
Group 1
- The National Data Administration has launched a new batch of pilot tasks for high-quality dataset construction; four projects from Jiangsu Province were selected, including the Xinhua News Media Cultural Industry High-Quality Dataset Construction Project [1]
- The focus is on integrating artificial intelligence with various sectors to reshape how people produce and live, which requires high-quality datasets [1][2]
- The selected projects aim to address data fragmentation and inefficiency in the media and healthcare sectors and to promote safe, compliant data circulation [2][3]
Group 2
- The Xinhua News Media project will implement a "1+3+10+N" framework to meet high-quality dataset construction needs and tackle the media industry's data problems [2]
- The healthcare project will run pilots in six key areas, including cardiovascular and cancer management, providing a benchmark for national healthcare data governance [2]
- Total investment in the high-quality multimodal ultrasound medical dataset project is 80 million yuan; the project aims to improve AI model training efficiency and the reliability of clinical applications [2]
Covering Transportation, Healthcare, Education... Jiangsu to Build No Fewer Than 1,000 High-Quality Datasets by the End of 2027
Yang Zi Wan Bao Wang· 2025-08-28 12:29
Core Viewpoint
- Jiangsu Province has launched a comprehensive plan to develop a high-quality data labeling industry, aiming to establish at least 1,000 complete, practical high-quality datasets by the end of 2027; the datasets are expected to affect many areas of daily life and support economic development [1][2]
Group 1: Development Goals
- The plan sets out a vision for Jiangsu's data labeling industry, targeting a national market share above 10% and a compound annual growth rate of over 20% by 2027 [2]
- The initiative includes establishing three data labeling bases and cultivating around 10 key enterprises with strong innovation capability and industry influence [2]
Group 2: Key Areas of Focus
- The first batch of high-quality datasets will cover 17 areas closely tied to daily life, including transportation, healthcare, and education, to support the development of artificial intelligence [1][4]
- Specific applications include advancing autonomous driving through datasets on road perception and intelligent traffic management, and improving ride-hailing services with data on smart dispatch and trip recording [3][4]
Group 3: Impact on Healthcare
- High-quality healthcare datasets will include data on various diseases, supporting advances in drug development and clinical research as well as intelligent oversight of medical insurance funds and pharmaceutical services [4]
- The plan emphasizes collaboration with large-model enterprises, data service providers, and research institutions to use these datasets for AI model training and innovation [4]