High-Quality Data Set Construction
China Has Built Over 500 PB of High-Quality Data Sets, Poised to Become a New Engine for Economic Growth
Zheng Quan Ri Bao Wang· 2025-12-05 12:03
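According to data recently released by the National Data Bureau, China has built more than 500 PB (petabytes, a unit of computer storage capacity) of high-quality data sets. These data sets help AI models keep improving in performance and accelerate innovation. Speaking at the 2025 Science and Technology Innovation Conference, Liu Liehong, head of the National Data Bureau, said that high-quality data sets are a key resource for digital and intelligent innovation. He added that the Bureau has worked with 26 ministries on policy documents that, guided by application scenarios, promote the construction of high-quality data sets across industries and sectors.
As digitalization sweeps the globe, data has become one of the core factors of production driving social and economic development. Like oil in the industrial age, data supplies a steady source of power for frontier technologies such as artificial intelligence, big data analytics, and machine learning. High-quality data sets are the distilled essence of data, and building them has become one of the National Data Bureau's priorities this year.
In February, the National Data Bureau held a kickoff meeting on high-quality data set construction in Beijing, calling for faster delivery of a batch of landmark results that empower high-quality industry development. In June, the Bureau's General Affairs Department issued the "Notice on Soliciting Typical Cases of High-Quality Data Sets," which aimed to take stock of what regions and sectors have achieved in building, operating, and applying high-quality data sets and to select and promote best practices. In August, at the 2025 China International Big Data Industry Expo, the National Data Bureau launched pilot work on high-quality data set construction, and the "Guidelines for High-Quality Data Set Construction" were released under its guidance. China Mobile Communications Association Education and Science Tech ...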
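For scale, the article does not say whether "PB" is meant in decimal (SI) or binary units. The following minimal sketch assumes the decimal SI prefix (1 PB = 10^15 bytes) and compares it with the binary pebibyte (1 PiB = 2^50 bytes). The constant names are illustrative:

```python
# Minimal sketch, assuming "PB" uses the decimal SI prefix (1 PB = 10**15 bytes).
PB = 10**15   # petabyte (SI, decimal)
PIB = 2**50   # pebibyte (IEC, binary), shown for comparison

total_bytes = 500 * PB
print(f"500 PB = {total_bytes:.1e} bytes")
print(f"500 PB = {total_bytes / PIB:.1f} PiB")
```

Under that assumption, 500 PB is 5 × 10^17 bytes, or about 444 PiB. The gap between the two prefixes grows at each step up the scale.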
Initiative on High-Quality Data Set Construction for the Petrochemical and Chemical Industry Released
Zhong Guo Hua Gong Bao· 2025-11-07 02:10
Core Viewpoint
- The initiative aims to build high-quality data sets for the petrochemical industry that support the application of artificial intelligence and drive the industry's transformation toward high-end, green, and intelligent development [1]
Group 1: Initiative Details
- The initiative was launched at the 2025 Petrochemical Industry Digital Transformation Conference in Dalian, Liaoning, by 36 organizations, including the Dalian Institute of Chemical Physics and the Petrochemical Industry Digital Transformation Promotion Center [1]
- It stresses that data quality determines the level of intelligent development, and that key data from industry scenarios is the most vital resource [1]
- It calls on enterprises, universities, and research institutions in the petrochemical sector to jointly build a national-level data asset system that covers the entire industry chain and integrates all factors of production [1]
Group 2: Goals and Vision
- The initiative aims to break down "data silos" and build a complete industrial ecosystem through collaboration across the whole chain [1]
- It advocates shared development to pool efforts, supporting the tasks of new industrialization and contributing to building China into a manufacturing powerhouse and to Chinese-style modernization [1]
National Supply-Demand Matchmaking Conference for High-Quality Data Sets and the Data Annotation Industry Held in Nanjing
Nan Jing Ri Bao· 2025-11-06 03:33
Core Insights
- The conference, held on November 5, focused on high-quality data sets and data annotation and showed strong market potential. More than 500 companies took part and signed more than 90 cooperation agreements, with a total transaction value exceeding 900 million yuan [1][2]
Group 1: Conference Overview
- The event was held under the guidance of the National Data Bureau and aimed to promote high-quality data set construction as fuel for artificial intelligence [1]
- The conference featured five industry-specific sessions covering healthcare, smart energy, industrial manufacturing, transportation, and cultural tourism. These sessions addressed data needs in large-model training and industry digital transformation [1]
Group 2: Industry Participation and Contributions
- Nine representative companies, including JD Technology, Alibaba Cloud, and Huawei Cloud, shared practical applications of high-quality data sets in sectors such as e-commerce, cloud computing, and finance [2]
- Experts from national research institutions discussed construction paths, standards, and evaluation criteria for high-quality data sets [2]
Group 3: Local Development and Innovation
- Nanjing has made high-quality data set construction a core strategy for activating data value and supporting AI development. It now covers all 16 industry areas identified by the National Data Bureau [3]
- The city has developed a distinctive path for high-quality data set construction, characterized by policy guidance, innovation-driven approaches, industry clustering, rich applications, and open cooperation [3]
2025 China Data Element Industry Development Research Report
iResearch· 2025-10-26 00:05
Core Insights
- Data is recognized as the fifth factor of production. Because it is non-rivalrous, replicable, and capable of unlimited growth, extracting its value is more complex than for traditional factors of production [1]
- Development of the data element market increasingly relies on a clear policy framework and implementation pathways, with local data trading institutions and data merchants becoming central to this evolution [1][2]
- The domestic data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26%, surpassing 300 billion yuan by 2028 [6]
Current Status of the Data Element Industry
- The data element market system is gradually improving, driven by policy guidance and industrial development, with a focus on data, technology, and infrastructure [2]
- The core industries of the digital economy are becoming significant drivers of the overall economy. The digital economy grew from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, roughly doubling in six years [6]
Policy Analysis
- Improving the policy framework across the data industry value chain and establishing local data systems are crucial for the circulation of data element value [4]
Market Size Estimation
- The domestic data element industry is projected to reach approximately 200 billion yuan by 2025 and exceed 300 billion yuan by 2028, with data processing and analysis as the largest segment [6]
Data Value Chain Circulation
- A data value circulation system rests on advanced technology capabilities and regulatory compliance [8]
- Data asset registration is critical for protecting the rights and interests of data participants, with public data as a core resource [13]
Data Asset Capitalization
- Recording data assets in financial statements marks a significant step in the capitalization of data elements. The relevant regulations have been in effect since 2024 [19]
- Data asset transactions follow an "on-exchange cold, off-exchange hot" pattern, in which off-exchange transactions dominate because of their flexibility and customization [21]
Industry Practices
- The financial sector's data element market is expected to grow at a CAGR of approximately 19.06%, exceeding 100 billion yuan by 2028, driven by the integration of data elements [31]
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, with a focus on high-quality data sets and trusted data spaces [34]
Trends
- Building high-quality data sets is becoming a key factor in advancing the artificial intelligence industry, with a focus on systematic data collection and processing [39]
- Trusted data spaces are essential for the safe circulation and high-value application of data elements, with plans for more than 100 such spaces by 2028 [42]
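The growth figures above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The following is a minimal sketch. The helper names `cagr` and `project` are illustrative. The sketch also assumes that the 20.26% rate applies from the ~200 billion yuan 2025 estimate through 2028, which the summary does not state explicitly:

```python
# Minimal sketch: checking the report's growth figures with the standard
# compound annual growth rate (CAGR) formula. Helper names are illustrative.


def cagr(start: float, end: float, years: int) -> float:
    """CAGR between two values `years` apart: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value reached by compounding `start` at `rate` per year for `years` years."""
    return start * (1 + rate) ** years


# Digital economy: 27.2 trillion yuan (2017) -> 53.9 trillion yuan (2023).
digital_economy_cagr = cagr(27.2, 53.9, 2023 - 2017)
print(f"Digital economy CAGR, 2017-2023: {digital_economy_cagr:.2%}")

# Assumption: the ~20.26% CAGR applies from the ~200 billion yuan 2025 estimate.
market_2028 = project(200, 0.2026, 2028 - 2025)
print(f"Implied 2028 data element market: {market_2028:.0f} billion yuan")
```

Running it gives a digital-economy CAGR of about 12.07%, matching the figure cited in other summaries of this report. It also gives a 2028 value of roughly 348 billion yuan, which is consistent with the "exceed 300 billion yuan" projection.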
2025 China Data Element Industry Development Research Report
iResearch· 2025-09-27 00:05
Core Insights
- Data is recognized as the fifth factor of production. Because it is non-rivalrous, replicable, and capable of unlimited growth, extracting its value is more complex than for traditional factors of production [1]
- A market-oriented system, represented by local data trading institutions and data merchants, is becoming the core driver of growth in the data element market [1][2]
- A clear policy framework and implementation path are crucial for enhancing the value of data elements. The aim is a well-functioning ecosystem of data supply and use [1][4]
Current Situation Analysis
- The data element market system is gradually improving, driven by policy guidance and industrial development, with a focus on data, technology, and infrastructure [2]
- The core industries of the digital economy are becoming significant drivers of China's overall economic development. The data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26% to exceed 300 billion yuan by 2028 [6]
Policy Analysis
- Improving the policy framework across the data industry value chain and establishing local data systems are essential for the circulation of data element value [4]
Market Size Calculation
- China's digital economy grew from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, a CAGR of about 12.07% [6]
- Data processing and analysis is expected to become the largest segment of the data element market, reaching approximately 144 billion yuan by 2028 [6]
Data Value Chain Circulation
- A data ownership system based on the "Data Twenty Articles" is crucial for the efficient circulation of data value [11]
- Data registration is essential for delineating asset ownership and releasing data value, with a "1+3" policy framework guiding public data resource management [13]
- The data valuation policy framework is becoming more refined, and quantification standards for public data resources are emerging as important benchmarks [16]
Capitalization of Data Assets
- Recording data assets in financial statements marks a significant step in the capitalization of data elements. The relevant regulations took effect in 2024 [19]
- Data asset transactions follow an "on-exchange cold, off-exchange hot" pattern, in which off-exchange transactions dominate because of their flexibility and customization [21]
Industry Practices
- The financial sector's data element market is expected to grow at a CAGR of approximately 19.06%, exceeding 100 billion yuan by 2028, driven by the integration of diverse data [30][31]
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, with a focus on high-quality data sets and trusted data spaces [34]
- The healthcare sector's data element market is expected to grow steadily at a CAGR of about 23.69%, with an emphasis on data compliance and security [36]
Trends
- High-quality data sets are becoming key to driving the artificial intelligence industry, and their construction is shifting from "single-point breakthroughs" to "holistic development" [39][40]
- Trusted data spaces will be crucial for the circulation and high-value application of data elements [42]
Accelerating High-Quality Data Set Construction to Help Build an Open, Win-Win Data Ecosystem
Zheng Quan Ri Bao Wang· 2025-09-16 12:18
Core Insights
- The National Data Bureau has launched a pilot program for high-quality data set construction to support the "Artificial Intelligence +" action plan, selecting 140 pilot units across 25 provinces in China [1][2]
- High-quality data sets are regarded as a strategic resource in the AI era and are crucial for strengthening supply chain resilience in key sectors such as finance, energy, and transportation [2][3]
- The construction of high-quality data sets is expected to grow by 27.4% year-on-year in 2024, significantly supporting AI training and applications [2]
Group 1: Pilot Program and Implementation
- The pilot program focuses on four key tasks: piloting technology, support, standards, and mechanisms. Project progress is monitored and evaluated on an ongoing basis [1]
- The program covers 18 key sectors, including scientific research, industrial manufacturing, agriculture, smart energy, transportation, financial services, healthcare, education, business, cultural tourism, emergency management, and meteorological services [1]
Group 2: Market Impact and Opportunities
- The pilot program aims to create a market environment that values high-quality data, helping turn data from a resource into an asset [2]
- Companies selected for the pilot program, such as Zhengtong Co. and Hundsun Technologies, are exploring deep integration of data resources and strengthening their digital leadership in the financial sector [2]
Group 3: Regulatory and Financial Implications
- High-quality data sets help reduce information asymmetry, enabling investors and financial institutions to better assess companies' technological leadership and commercialization potential [3]
- These data sets give regulators tools for real-time monitoring and compliance checks, improving regulatory efficiency while balancing data sharing against privacy protection [3]
- The development of high-quality data sets is seen as a cornerstone of fair and transparent capital markets and a driver of the financial industry's digital transformation [3]
2025 China Data Element Industry Development Research Report
iResearch· 2025-09-14 00:07
Core Insights
- Data, as the fifth factor of production, is non-rivalrous, replicable, and capable of unlimited growth. These traits make extracting its value more complex than for traditional factors of production [1]
- Developing the data element market relies heavily on a clear policy framework and implementation pathways, with local data trading institutions and data merchants becoming key drivers [1][2]
- The domestic data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26%, surpassing 300 billion yuan by 2028 [6]
Current Situation Analysis
- The data element market system is gradually improving, driven by policy guidance and industrial development, with a focus on data, technology, and infrastructure [2]
- The core industries of the digital economy are becoming significant drivers of the overall economy. The digital economy grew from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, roughly doubling in six years [6]
Policy Analysis
- Improving the policy framework across the data industry value chain and establishing local data systems are crucial for the circulation of data element value [4]
Market Scale Assessment
- The data element industry is projected to reach approximately 200 billion yuan by 2025 and exceed 300 billion yuan by 2028, with data processing and analysis as the largest segment [6]
Data Value Chain Construction
- A data value circulation system rests on advanced technology and regulatory compliance [8]
- A data ownership system based on the "Data Twenty Articles" is essential for efficient data value circulation [11]
Data Registration
- Data registration is critical for delineating asset ownership and releasing data value, with a "1+3" policy framework guiding public data resource management [13]
Data Value Assessment
- The data valuation policy framework is becoming more refined, and quantification standards for public data resources are emerging as important benchmarks [16]
Data Asset Capitalization
- Capitalizing data assets is a core practice for realizing data value. Regulations that took effect on January 1, 2024 opened a new era of recording data assets in financial statements [19]
Data Asset Trading
- The data market follows an "on-exchange cold, off-exchange hot" pattern, in which off-exchange transactions dominate because of their flexibility and customization [21]
Industry Practices
- The financial sector's data element market is expected to grow at a CAGR of approximately 19.06%, exceeding 100 billion yuan by 2028, driven by data element integration [31]
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, with a focus on high-quality data sets and trusted data spaces [34]
- The healthcare sector is expected to grow at a CAGR of around 23.69%, with an emphasis on compliant use of personal health data [36]
Trends
- High-quality data set construction is becoming a key factor in advancing the artificial intelligence industry, and is shifting from "point breakthroughs" to "holistic development" [39]
- Trusted data spaces will be crucial for the circulation and high-value application of data elements [42]
2025 China Data Element Industry Development Research Report
iResearch· 2025-08-30 00:06
Core Insights
- Data, as the fifth factor of production, is non-rivalrous, replicable, and capable of unlimited growth. These traits make extracting its value more complex than for traditional factors of production [1]
- Developing the data element market relies heavily on a clear policy framework and implementation pathways, with local data trading institutions and data merchants becoming key drivers [1][2]
- Coordination between government and industry is essential for a robust ecosystem of data supply and use. The phased goal is data that can be supplied, circulated, used well, and kept secure [1]
Current Status of the Data Element Industry
- The data element market system is gradually improving, driven by policy guidance and industrial development, with a focus on data, technology, and infrastructure [2]
Policy Analysis
- Improving the policy framework across the data industry value chain and establishing local data systems are crucial for the circulation of data element value [4]
Market Scale Estimation
- The domestic data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26%, surpassing 300 billion yuan by 2028 [6]
- The core industries of the digital economy are projected to contribute significantly to overall economic development. The digital economy grew from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, a CAGR of about 12.07% [6]
Data Value Chain Construction
- A data value circulation system rests on advanced technology and regulatory compliance [8]
Data Compliance and Rights Confirmation
- A data ownership system based on the "Data Twenty Articles" is crucial for the efficient circulation of data value [11]
- The legal framework for confirming data rights is expected to evolve, addressing challenges such as data classification and compliance standards [11]
Data Registration
- Data registration is essential for delineating asset ownership and releasing data value, with a "1+3" policy framework guiding public data resource management [13]
Data Value Assessment
- The data valuation policy framework is becoming more refined, and quantification standards for public data resources are emerging as important benchmarks [16]
Data Asset Inclusion in Financial Statements
- Recording data assets in financial statements marks a significant step toward capitalizing data elements. The relevant regulations took effect in 2024 [19]
Data Asset Trading
- The data market follows an "on-exchange cold, off-exchange hot" pattern, in which off-exchange trading dominates because of its flexibility and customization [21]
Capitalization of Data Assets
- Capitalizing data assets is becoming a core way to release their value and helps optimize the asset-liability structure of data-intensive enterprises [23]
Data Asset Tokenization
- Data asset tokenization represents the highest level of data value application, combining the digitization of physical assets with the monetization of digital assets [25]
Industry Practice: Market Size Breakdown
- Data-resource-intensive industries are central to the data element market, and the finance and internet sectors together hold about half of the market [28]
Practical Scenarios: Financial Industry
- The financial sector's data element market is expected to grow at a CAGR of approximately 19.06%, exceeding 100 billion yuan by 2028, driven by data element integration [31]
Practical Scenarios: Industrial Manufacturing
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, driven by demand for high-quality data and cross-industry sharing of data resources [34]
Practical Scenarios: Healthcare Industry
- The healthcare sector's data element market is expected to grow at a CAGR of approximately 23.69%, surpassing 25 billion yuan by 2028 [36]
Trends: High-Quality Data Set Construction
- High-quality data sets are becoming key to driving AI industry development, with a focus on systematic data collection and processing [39]
Trends: Trusted Data Space Construction
- Trusted data spaces are essential for the secure circulation and high-value application of data elements [42]
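Running the compounding formula in reverse shows what the sector targets imply about current market sizes. The following is a minimal sketch under a hypothetical assumption, because the summary gives no base year: each sector CAGR is taken to run over the three years from 2025 to 2028. The targets are stated as "exceeding" or "surpassing," so the results are rough lower bounds. The helper name `implied_base` is illustrative:

```python
# Minimal sketch: reversing the CAGR formula to estimate the starting size
# implied by a 2028 target. The 2025 base year is a hypothetical assumption.


def implied_base(target: float, rate: float, years: int) -> float:
    """Starting value that grows to `target` after `years` years at CAGR `rate`."""
    return target / (1 + rate) ** years


YEARS = 2028 - 2025  # hypothetical: CAGR measured from a 2025 base

# (2028 target in billion yuan, CAGR) as stated in the summary above
sectors = {
    "finance": (100, 0.1906),
    "healthcare": (25, 0.2369),
}

for name, (target_2028, rate) in sectors.items():
    base = implied_base(target_2028, rate, YEARS)
    print(f"{name}: implied 2025 size of about {base:.1f} billion yuan")
```

Under that assumption, the figures imply a financial-sector data element market of roughly 59 billion yuan and a healthcare market of roughly 13 billion yuan in 2025.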
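National Data Bureau Issues Pilot "Work Orders" for High-Quality Data Set Construction; Four Jiangsu Units Selected Through the "Open Competition" Mechanism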
Xin Hua Ri Bao· 2025-08-28 23:10
Group 1
- The National Data Bureau has launched a new batch of pilot tasks for high-quality data set construction, focusing on integrating artificial intelligence with various industries to reshape how people produce and live [1][2]
- Four projects from Jiangsu Province have been selected for this initiative, including the Xinhua Daily Media Cultural Industry High-Quality Data Set Construction Project [1][2]
- The projects aim to address data fragmentation and inefficiency in the media industry while ensuring data security, compliance, and effective circulation [2][3]
Group 2
- The healthcare high-quality data set construction project will focus on six key areas, including cardiovascular disease and cancer management. It will serve as a benchmark for national healthcare data governance and intelligent applications [2][3]
- The high-quality multimodal ultrasound medical data set project has a total investment of 80 million yuan. It aims to create the first standardized, operational, industry-level medical imaging data set in the ultrasound field [2]
- The energy-saving photovoltaic integrated comprehensive energy high-quality data set project will shift from a traditional experience-driven model to a data-driven, globally optimized intelligent system [2][3]
National Data Bureau Issues Pilot "Work Orders" for High-Quality Data Set Construction
Xin Hua Ri Bao· 2025-08-28 21:30
Group 1
- The National Data Bureau has launched a new batch of pilot tasks for high-quality data set construction. Four projects from Jiangsu Province were selected, including the Xinhua Daily Media Cultural Industry High-Quality Data Set Construction Project [1]
- The focus is on integrating artificial intelligence with various sectors to reshape how people produce and live, which requires high-quality data sets [1][2]
- The selected projects aim to address data fragmentation and inefficiency in the media and healthcare sectors and to promote safe, compliant data circulation [2][3]
Group 2
- The Xinhua Daily Media project will implement a "1+3+10+N" framework to meet the needs of high-quality data set construction and tackle the media industry's data problems [2]
- The healthcare project will run pilots in six key areas, including cardiovascular disease and cancer management, providing a benchmark for national healthcare data governance [2]
- The high-quality multimodal ultrasound medical data set project has a total investment of 80 million yuan. It aims to improve AI model training efficiency and the reliability of clinical applications [2]