High-Quality Datasets
Future Industries: Data Elements and Trading
2025-11-07 01:28
Summary of Key Points from Conference Call
Industry Overview
- The conference call discusses the rapid development of China's digital economy and the importance of data elements, with the establishment of the National Data Bureau in 2022 to promote data resource development and utilization, aiming to create a unified national market for data circulation, trading, and rights confirmation [1][2][4]
Core Insights and Arguments
- Data is recognized as a strategic resource and one of the three key elements for the development of the artificial intelligence (AI) industry [2][9]
- The data labeling industry is experiencing rapid growth, with the establishment of seven major data labeling bases and a total dataset size of 29TB, generating over 8.3 billion yuan in related output [1][6]
- The average daily token consumption in AI applications increased over 300 times from early 2025 to June 2025, indicating a significant growth in AI application scale [1][8]
- High-quality Chinese datasets are crucial, with over 60% of data used in large model training being in Chinese, highlighting the potential for developing high-quality Chinese datasets [1][9]
Important but Overlooked Content
- The government has implemented several measures to promote the digital economy, including policy documents and the establishment of dedicated institutions, such as the National Data Bureau and the three-year action plan for data education [5][6]
- Public data is essential for improving social governance efficiency and promoting economic development, with a focus on creating a unified national market through data sharing across different sectors [12]
- Future investment hotspots include the data exchange sector, AI applications, and specific application scenarios in healthcare, transportation, energy, and public utilities [8][11]
- Regions like Guangdong, Shanghai, Fujian, and Zhejiang are expected to lead in digital economy development, with local state-owned enterprises playing a significant role [11]
Building High-Quality Datasets: Jiangsu Must Act, and Must Act First
Xin Hua Ri Bao· 2025-11-06 08:16
Core Insights
- The "2025 National High-Quality Data Set and Data Annotation Industry Supply and Demand Matching Conference" held in Nanjing successfully attracted over 500 companies and resulted in more than 90 collaborations with a transaction value exceeding 900 million yuan [1]
- Jiangsu province aims to leverage its rich data resources to enhance the construction of high-quality data sets, which is essential for seizing opportunities in artificial intelligence development [1][2]
- The definition of high-quality data sets varies across industries, but they must meet the training needs of AI large models [2]
Industry Overview
- Jiangsu has established 321 high-quality data sets across key sectors such as healthcare, transportation, industry, energy, and cultural tourism, with a total data scale exceeding 93PB [1]
- The province has implemented a "1+N" policy framework to optimize the environment for artificial intelligence development, focusing on collaboration between supply and demand enterprises [2][7]
Challenges in Data Annotation
- Data annotation is crucial for AI development, requiring specialized knowledge and skills, particularly in complex fields like medical data [3][4]
- The industry faces challenges such as insufficient data supply and a lack of skilled data annotators, which hinder the progress of large models in niche areas [4]
Cost Considerations
- The high costs associated with data storage and processing are significant challenges for companies, with many high-quality data sets being discarded due to storage expenses [5][6]
- Companies are exploring solutions like establishing cold storage centers in less developed regions to reduce costs associated with data storage [5]
Financial Support and Standards
- The data industry is knowledge and capital-intensive, with a significant portion of costs tied to acquiring raw data [6]
- Financial institutions are encouraged to provide support for data collection and annotation, potentially through innovative financing models [6]
- The establishment of standards for high-quality data sets is underway, with guidelines and quality assessment protocols being developed to address current challenges [6]
National High-Quality Dataset and Data Annotation Industry Supply and Demand Matching Conference Held in Nanjing
Nan Jing Ri Bao· 2025-11-06 03:33
Core Insights
- The conference held on November 5 focused on high-quality data sets and data annotation, showcasing the vibrant market potential with over 500 participating companies and more than 90 cooperation agreements, resulting in a total transaction amount exceeding 900 million yuan [1][2]
Group 1: Conference Overview
- The event was guided by the National Data Bureau and aimed to promote high-quality data set construction as a fuel for artificial intelligence [1]
- The conference featured five industry-specific sessions, covering healthcare, smart energy, industrial manufacturing, transportation, and cultural tourism, addressing data needs in large model training and industry digital transformation [1]
Group 2: Industry Participation and Contributions
- Nine representative companies, including JD Technology, Alibaba Cloud, and Huawei Cloud, shared practical applications of high-quality data sets across various sectors such as e-commerce, cloud computing, and finance [2]
- Key figures from national research institutions discussed the construction paths, standards, and evaluation criteria for high-quality data sets [2]
Group 3: Local Development and Innovation
- Nanjing has positioned high-quality data set construction as a core strategy to activate data value and support AI development, achieving full coverage in the 16 industry areas identified by the National Data Bureau [3]
- The city has developed a unique path for high-quality data set construction, characterized by policy guidance, innovation-driven approaches, industry clustering, rich applications, and open cooperation [3]
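On-Site Matched Transactions Exceed 900 Million Yuan! First National High-Quality Dataset and Data Annotation Industry Supply and Demand Matching Conference Held in Nanjing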
Yang Zi Wan Bao Wang· 2025-11-05 13:41
Group 1
- The conference focused on the importance of high-quality datasets and data labeling for enhancing artificial intelligence capabilities, attracting over 500 companies and achieving more than 90 cooperation agreements with a total transaction amount exceeding 900 million yuan [1]
- The event featured discussions on data needs in various sectors, including healthcare, smart energy, industrial manufacturing, transportation, and cultural tourism, aiming to create a closed-loop system from data demand to supply and value realization [1][2]
- A new public empowerment platform for high-quality dataset construction was launched, providing comprehensive lifecycle services for data collection, governance, annotation, and optimization, ensuring data quality through systematic evaluations [2]
Group 2
- Nanjing has established a robust data industry ecosystem with over 3,000 data companies, focusing on secure cross-border data support for energy enterprises through advanced data circulation mechanisms [3]
- The conference highlighted the shift from "model competition" to "data competition" in the AI industry, emphasizing the need for high-quality, scenario-specific data to overcome application bottlenecks in complex fields like agriculture and manufacturing [4]
- The successful hosting of the conference demonstrated the vibrancy and innovation of China's high-quality data market, laying a solid foundation for a dual-driven industrial ecosystem of "artificial intelligence + data" [4]
AI High-Quality Dataset Ecosystem Development Conference Held in Yongchuan, Chongqing
Xin Hua Wang· 2025-09-29 08:41
Core Insights
- The conference focused on building high-quality datasets to empower AI development, emphasizing data labeling industry practices and innovations [1][6]
- A partnership was established between the Chongqing Big Data Application Development Management Bureau and the Yongchuan District government to create a "Chongqing Data Set Construction Application Base" [3][4]
- The West Data Labeling Research Institute and West Data Set Production Base were inaugurated to enhance digital technology sharing and data industry incubation [4][6]
Group 1: Conference Highlights
- The conference featured policy introductions, case sharing, and industry dialogues to promote AI data infrastructure and regional data innovation [1][6]
- The Yongchuan District aims to enhance data labeling efficiency and usability to support the city's AI capabilities and business scenarios [3][6]
Group 2: Strategic Initiatives
- Yongchuan District signed cooperation projects with 12 companies, including major telecom operators and technology firms, to advance high-quality dataset construction and application [6][7]
- The district plans to establish a data labeling industry park and implement a "data labeling + application" model to integrate digital and physical economies [6][7]
Group 3: Future Goals
- Yongchuan aims to become a hub for data element circulation and a data labeling service base by 2027, focusing on four key actions: building a data labeling industry park, creating a "data labeling +" ecosystem, implementing talent development initiatives, and promoting data value release [7]
National Data Bureau: Accelerate the Construction of Industry High-Quality Datasets
Group 1
- The National Data Bureau emphasizes the importance of investing in data resources for manufacturing companies, similar to the focus on technology research and equipment updates [1][2]
- High-quality data sets are identified as a new type of fuel for digital transformation in the manufacturing industry, crucial for the effective application of artificial intelligence technologies [1][2]
- There is a call for increased investment in the data industry, including the cultivation of data resources, technology, services, applications, security, and infrastructure [2]
Group 2
- The need to build market awareness of paying for high-quality data is highlighted, encouraging companies to treat data acquisition with the same importance as purchasing advanced equipment and high-end technology [2]
- Companies are urged to avoid "involution" competition and instead focus on innovation and data-driven development, particularly leading enterprises with strong digital capabilities [2]
National Data Bureau: Accelerate the Construction of Industry High-Quality Datasets
Core Viewpoint
- The National Data Bureau emphasizes the importance of data investment in the manufacturing sector, advocating for a focus on data resource development akin to technology R&D and equipment upgrades [1][2]
Group 1: Data Investment and Utilization
- Manufacturing enterprises should increase investment in the entire data lifecycle, including collection, storage, computation, management, and utilization [2]
- High-quality data sets are essential for the digital transformation of the manufacturing industry, serving as a new type of fuel for innovation and efficiency [1][2]
Group 2: Market Awareness and Innovation
- There is a need to cultivate market awareness of paying for quality data, similar to purchasing advanced equipment and high-end technology [2]
- Companies, especially leading firms with strong digital capabilities, should rely on innovation and data-driven strategies for growth, avoiding "involution" competition [2]
Accelerating High-Quality Dataset Construction to Help Build an Open, Win-Win Data Ecosystem
Zheng Quan Ri Bao Wang· 2025-09-16 12:18
Core Insights
- The National Data Bureau has initiated a pilot program for high-quality data set construction to support the "Artificial Intelligence +" action plan, identifying 140 pilot units across 25 provinces in China [1][2]
- High-quality data sets are deemed a strategic resource in the AI era, crucial for enhancing the resilience of supply chains in key sectors such as finance, energy, and transportation [2][3]
- The construction of high-quality data sets is expected to grow by 27.4% year-on-year in 2024, significantly supporting AI training and applications [2]
Group 1: Pilot Program and Implementation
- The pilot program focuses on four key tasks: testing technology, support, standards, and mechanisms, with ongoing monitoring and evaluation of project progress [1]
- The program covers 18 key sectors, including scientific research, industrial manufacturing, agriculture, smart energy, transportation, financial services, healthcare, education, business, cultural tourism, emergency management, and meteorological services [1]
Group 2: Market Impact and Opportunities
- The pilot program aims to create a market environment that values high-quality data, facilitating the transformation of data from a resource to an asset [2]
- Companies selected for the pilot program, such as Zhengtong Co. and Hengsheng Electronics, are exploring deep data resource integration, enhancing their digital leadership in the financial sector [2]
Group 3: Regulatory and Financial Implications
- High-quality data sets help reduce information asymmetry, enabling investors and financial institutions to better assess the technological leadership and commercialization potential of companies [3]
- These data sets provide regulatory bodies with tools for real-time monitoring and compliance, enhancing regulatory efficiency while balancing data sharing and privacy protection [3]
- The development of high-quality data sets is seen as a cornerstone for fair and transparent capital markets, driving the digital transformation of the financial industry [3]
Guangdong Power Grid: "Digital Craftsman Core" Lays a High-Quality Data Foundation for AI
Zhong Guo Dian Li Bao· 2025-09-16 07:36
Core Insights
- Data is recognized as the "fifth new production factor" driving high-quality economic and social development, with the integration of artificial intelligence and data expected to bring transformative changes [1]
- Guangdong Power Grid's high-quality dataset has been awarded as a typical case by the National Data Bureau, indicating successful implementation in the AI data engineering field [1]
Group 1: Data Quality and AI Integration
- Guangdong Power Grid has established a high-standard data processing and labeling system, providing reliable data for AI projects, which enhances equipment intelligence and industry empowerment [1][2]
- The AI models rely on precisely labeled data to understand operational states, predict trends, and identify safety hazards, making high-quality data essential for smart grid development [2][3]
Group 2: Data Utilization and Impact
- The construction of a high-quality dataset has enabled the transformation of 1.5 million multimodal power safety data into reusable digital assets, facilitating a shift from single-scene recognition to multidimensional risk prediction [4]
- The implementation of this dataset has led to a 32% reduction in similar violations by optimizing operational processes based on high-frequency violation data identified by algorithms [4]
Group 3: Industry Implications and Best Practices
- The recognition of Guangdong Power Grid's dataset as a national-level typical case showcases its technical strength and provides a replicable model for integrating AI with the real economy [5]
- The case illustrates the importance of a data governance framework that aligns with industry needs, addressing the common challenge of AI implementation due to a lack of high-quality data [5]
2025 China Data Element Industry Development Research Report
iResearch (艾瑞咨询)· 2025-09-14 00:07
Core Insights
- Data, as the fifth production factor, has unique characteristics such as non-competitiveness, replicability, and infinite growth potential, making its value extraction process more complex than traditional production factors [1]
- The development of a market for data elements relies heavily on a clear policy framework and implementation pathways, with local data trading institutions and data merchants becoming key drivers [1][2]
- The domestic data element market is expected to grow at a compound annual growth rate (CAGR) of approximately 20.26%, surpassing 300 billion yuan by 2028 [6]
Current Situation Analysis
- The data element market system is gradually improving, driven by policy guidance and industrial construction, focusing on data, technology, and infrastructure [2]
- The digital economy's core industries are becoming significant drivers of the overall economic system, with the digital economy scale increasing from 27.2 trillion yuan in 2017 to 53.9 trillion yuan in 2023, doubling in six years [6]
Policy Analysis
- The improvement of the policy framework for the data industry value chain and the establishment of local data systems are crucial for the circulation of data element value [4]
Market Scale Assessment
- The data element industry is projected to reach approximately 200 billion yuan by 2025 and exceed 300 billion yuan by 2028, with data processing and analysis being the largest segment [6]
Data Value Chain Construction
- The establishment of a data value circulation system is supported by advanced technology and regulatory compliance [8]
- The construction of a data ownership system based on the "Data Twenty Articles" is essential for efficient data value circulation [11]
Data Registration
- Data registration is critical for asset ownership delineation and promoting data value release, with a "1+3" policy framework guiding public data resource management [13]
Data Value Assessment
- The data valuation policy framework is becoming more refined, with public data resource quantification standards emerging as important benchmarks [16]
Data Asset Capitalization
- The capitalization of data assets is a core practice for realizing data value, with the implementation of regulations marking a new era for data asset inclusion in financial statements starting January 1, 2024 [19]
Data Asset Trading
- The data market exhibits a distribution pattern of "internal cold, external hot," with off-market transactions dominating due to their flexibility and customization [21]
Industry Practices
- The financial sector is expected to see a CAGR of approximately 19.06%, reaching over 100 billion yuan by 2028, driven by data element integration [31]
- The industrial manufacturing sector is projected to grow at a CAGR of about 24.22%, with a focus on high-quality data sets and trusted data spaces [34]
- The healthcare industry is anticipated to grow at a CAGR of around 23.69%, emphasizing the compliance of personal health data applications [36]
Trends
- High-quality data set construction is becoming a key factor in advancing the artificial intelligence industry, transitioning from "point breakthroughs" to "holistic development" [39]
- The establishment of trusted data spaces will be crucial for ensuring the circulation and high-value application of data elements [42]
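The growth figures in this report summary can be sanity-checked with simple compound-growth arithmetic. The sketch below is illustrative only: it assumes the roughly 20.26% CAGR applies from the approximately 200 billion yuan projected for 2025 through 2028 (the summary does not state the base year of the CAGR), and the function names are hypothetical helpers, not anything from the report.

```python
def project(base: float, cagr: float, years: int) -> float:
    """Compound a base value forward by `years` at annual rate `cagr`."""
    return base * (1.0 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate implied by moving from `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years) - 1.0


# Assumption: the 20.26% CAGR is applied from the 2025 figure (about
# 200 billion yuan) over the three years to 2028.
market_2028 = project(200.0, 0.2026, 3)

# Digital economy scale: 27.2 trillion yuan (2017) to 53.9 trillion yuan (2023).
digital_ratio = 53.9 / 27.2
digital_cagr = implied_cagr(27.2, 53.9, 6)

print(f"Projected 2028 market size: {market_2028:.1f} billion yuan")
print(f"2023 / 2017 digital economy ratio: {digital_ratio:.2f}")
print(f"Implied digital economy CAGR: {digital_cagr:.1%}")
```

Under that base-year assumption the projection lands above the 300 billion yuan threshold cited for 2028, and the 2017 to 2023 ratio is just under 2, which matches the summary's "doubling in six years" as a rounded statement.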