High-Quality Data Sets
Special Session Announces Data Infrastructure Achievements; Nanjing Cases Selected as Typical High-Quality Data Set Cases
Nan Jing Ri Bao· 2025-09-01 02:18
Core Insights
- The 2025 China International Big Data Industry Expo concluded in Guiyang, focusing on themes such as data-driven industrial momentum and new development chapters, featuring 375 participating companies and 26 exchange activities [1]

Group 1: Data Infrastructure and Transactions
- The National Data Bureau announced the successful cross-domain data transaction between Nanjing and Dalian, marking the first instance of credible data flow between cities in China [2]
- This transaction utilized the national data circulation infrastructure, validating the feasibility and reliability of cross-domain data operations [2]

Group 2: High-Quality Data Sets
- The National Data Bureau released the first batch of 104 high-quality data set case studies, with two cases from Nanjing included, highlighting the importance of high-quality data sets for AI model training [4]
- The two selected cases from Nanjing are the "Public Credit Archive High-Quality Data Set" by Nanjing Lais Information Technology Co., Ltd. and the "Intelligent Inspection and Safety Control Data Set for China Huadian Power Generation" by Nanjing Nanzi Information Technology Co., Ltd. [4]

Group 3: Public Credit Archive Data Set
- The "Public Credit Archive High-Quality Data Set" has collected over 800 billion data entries, covering more than 180 million social entities and 800 million individuals, with an annual increase of over 2 billion entries [5]
- This data set is utilized in various sectors, including government services and social governance, enhancing administrative efficiency and reducing market operation costs [5]

Group 4: Intelligent Inspection Data Set
- The "Intelligent Inspection and Safety Control Data Set" addresses challenges in the power generation sector by creating a comprehensive dataset covering various energy types, including wind, solar, hydro, and thermal power [6]
- It establishes a standard system for data collection, annotation, and application, promoting industry development [6]

Group 5: High-Quality Data Set Industry Base
- The "China High-Quality Data Set Industry Base (Nanjing)" was inaugurated, focusing on key technology breakthroughs and standard system construction for high-quality data sets [7]
- The base aims to foster collaboration among industry players and enhance the development of AI data sets, contributing to the innovation and intelligence of various sectors [7]
Four Projects from Jiangsu Province Selected as National Typical High-Quality Data Set Cases
Xin Hua Ri Bao· 2025-08-30 23:21
Group 1
- The National Data Bureau officially released its first list of 104 typical high-quality data set cases at the China International Big Data Industry Expo, held from August 28 to 30 [1]
- Four cases from Jiangsu Province were selected: China Mobile's "China Mobile R&D Large Model High-Quality Data Set," Nanjing Lais Information Technology's "Public Credit Archive High-Quality Data Set," Nanjing Nanzi Information Technology's "China Huadian Power Generation Intelligent Inspection and Safety Control High-Quality Data Set," and China Energy Conservation Solar Technology's "Integrated Energy High-Quality Data Set Construction" [1]
- The "China Mobile R&D Large Model High-Quality Data Set" has a total data volume exceeding 10TB, covering 8 categories and 17 data sets, and can be reused in vertical industries such as industry, finance, and transportation to improve and evaluate the data quality of large models [1]

Group 2
- The "Public Credit Archive High-Quality Data Set" has connected with 47 ministries, 31 provincial units, and the Xinjiang Production and Construction Corps, accumulating over 80 billion records by June this year, and is widely applied in government services and social governance [1]
- The "China Huadian Power Generation Intelligent Inspection and Safety Control High-Quality Data Set" has constructed a visual data set covering all types of power generation, including wind, solar, hydro, and thermal [1]
- China Energy Conservation Solar Technology (Zhenjiang) Co., Ltd. provides integrated green and low-carbon operational scenarios and delivery service capabilities through the construction of the integrated energy high-quality data set [1]
On the Ground at the Big Data Expo: Data Empowers Every Industry
Group 1
- China's digital economy is expected to reach an added value of approximately 49 trillion yuan by the end of this year, accounting for about 35% of GDP, with the core industries of the digital economy achieving their "14th Five-Year Plan" targets ahead of schedule [1]
- The 2025 China International Big Data Industry Expo, themed "Data Gathers Industrial Momentum, Intelligence Opens a New Chapter of Development," features over 16,000 registered guests and 375 participating companies, highlighting the importance of data as a driving force for industrial development [1]
- The event showcases various applications of digital technology, including the low-altitude economy, with companies like Zhongke Xingtong presenting comprehensive product systems and case studies in the field of aerospace information [1][2]

Group 2
- The GEOVIS iFlight low-altitude intelligent flight application platform is a highlight product at the expo, providing standardized services for aerial photography, intelligent inspection, and logistics delivery, addressing various low-altitude business needs [2]
- High-quality data sets are crucial for supporting the "Artificial Intelligence +" initiative, with over 400PB of high-quality data sets constructed in China by the end of June [3]
- The total scale of intelligent computing in China has reached 780,000 PFlops, ranking second globally, with significant contributions from western regions where data center operational costs are 50% to 70% lower than in eastern regions [3]

Group 3
- Discussions at the expo emphasize the need for high-quality data set construction and the promotion of data element circulation and value release [3][4]
- Experts suggest that the market for high-quality data sets is still small, and that automation and intelligence in production need improvement, proposing subsidies to encourage the supply and demand of high-quality data sets [4]
- The National Data Bureau is deploying pilot projects for data industry clusters to foster a robust industrial ecosystem and scale advantages, building on previous guidelines for high-quality data industry development [4]
2025 Big Data Expo: Building High-Quality Data Sets Is Very Important
Zhong Guo Xin Wen Wang· 2025-08-28 14:04
Core Insights
- The construction of high-quality datasets is crucial for the advancement of artificial intelligence, as emphasized by the head of the National Data Bureau, Liu Liehong [1]

Group 1: High-Quality Datasets
- Over 35,000 high-quality datasets have been established across China, with a total volume exceeding 300PB, providing a solid data foundation for the rapid enhancement of AI model performance [2]
- The cumulative transaction value of high-quality datasets in China has reached 4 billion yuan, with 3,364 trading institutions listing these datasets, totaling 246PB in scale [2]
- There is a growing demand for data trading driven by AI model training, as various sectors are increasingly willing to invest in data resources to support the application of large models [2]

Group 2: Regional Development and Initiatives
- Guizhou Province has established a provincial-level data circulation trading platform and has cultivated over 200 data businesses, releasing more than 900 high-quality datasets in key sectors such as finance, manufacturing, healthcare, and commerce [3]
- The province is actively promoting the synergy of computing power, data, and application industries, focusing on four key sectors: intelligent computing, high-quality data construction, AI large models, and digital information [3]

Group 3: Strategic Initiatives
- The launch of the high-quality dataset navigation plan marks a significant step in building the data element ecosystem and enhancing the supply of high-quality datasets [5]
- The importance of high-quality datasets is underscored by experts, stating that without them, even the most advanced algorithms and powerful computing resources cannot achieve breakthroughs in AI [6]
Mainstream Cultural Corpus Launches: What Will It Mean for the Development of the Digital Cultural Industry?
Qi Lu Wan Bao Wang· 2025-08-25 08:39
Core Insights
- The rapid development of generative artificial intelligence has made high-quality datasets a core competitive advantage for AI technology breakthroughs [1][2]
- The establishment of a mainstream cultural corpus is essential for the development of the digital cultural industry, supported by both policy guidance and the need for competitive core capabilities [2][3]

Necessity
- The construction of a corpus is becoming an industry imperative as it serves as a core resource for training AI models [2][3]
- High-quality datasets are defined as collections of data resources that cover core professional knowledge and operational activities, essential for training and optimizing AI models [1][2]

Implementation
- The Shandong Digital Culture Group is collaborating with People’s Daily to build the first mainstream cultural corpus in the country, which will include authoritative media resources and high-quality cultural resources from local institutions [3][4]
- The corpus aims to provide a "value-compliant" data resource for AI applications, ensuring alignment with national values and social resonance [3][4]

Data Processing
- The Shandong cultural data annotation platform offers a one-stop service for data collection, cleaning, pre-annotation, annotation, enhancement, and review, supporting various data types [7][11]
- The platform employs a standardized process to ensure data quality and uniqueness, enhancing the efficiency of data processing [11][12]

Future Plans
- The first phase of the mainstream cultural corpus focuses on Shandong's excellent culture, with plans to create a wide-ranging and rich dataset to enhance the performance of cultural AI models [4][9]
- The Shandong Digital Culture Group plans to launch a cultural data trading platform to facilitate the circulation and monetization of data assets [15]
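The staged processing described for the Shandong annotation platform (cleaning, pre-annotation, review, and a uniqueness check) can be pictured as a chain of small functions. The sketch below is purely illustrative and is not the platform's implementation: the record shape, the stage functions, and the keyword rule in `pre_annotate` are all hypothetical, and the collection, manual annotation, and enhancement stages are omitted.

```python
# Hypothetical record shape: {"text": str, "label": str or None}

def clean(records):
    """Trim whitespace and drop records whose text is empty."""
    cleaned = []
    for record in records:
        text = record["text"].strip()
        if text:
            cleaned.append({**record, "text": text})
    return cleaned

def deduplicate(records):
    """Keep the first occurrence of each text (a simple uniqueness check)."""
    seen = set()
    unique = []
    for record in records:
        if record["text"] not in seen:
            seen.add(record["text"])
            unique.append(record)
    return unique

def pre_annotate(records):
    """Attach a provisional label using a toy keyword rule (an assumption)."""
    return [
        {**r, "label": "culture" if "museum" in r["text"].lower() else "other"}
        for r in records
    ]

def review(records):
    """Stand-in for human review: keep only records that carry a label."""
    return [r for r in records if r.get("label")]

PIPELINE = [clean, deduplicate, pre_annotate, review]

def run_pipeline(records):
    """Pass the records through every stage in order."""
    for stage in PIPELINE:
        records = stage(records)
    return records

raw = [
    {"text": "  Shandong Museum exhibit notes "},
    {"text": "Shandong Museum exhibit notes"},
    {"text": "   "},
    {"text": "Local opera libretto"},
]
processed = run_pipeline(raw)
for record in processed:
    print(record)
```

Keeping each stage as a plain function makes it easy to insert or reorder steps, which is one plausible reading of what a "standardized process" offers; the real platform may be organized quite differently.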
Fortune Global 500 CIOs Gather at the Eighth Southern Information Conference | Hanshu (汉数) Founder Chen Kairan (陈开冉) Invited to Speak
Jiang Nan Shi Bao· 2025-08-19 09:11
Core Insights
- The eighth Southern Information Conference, hosted by the Guangdong Chief Information Officer Association, gathered top scholars and CIOs from leading companies to discuss the challenges and opportunities for CIOs in the AI era [1][2].

Group 1: Importance of High-Quality Data
- High-quality datasets are likened to "high-octane gasoline," crucial for the performance and application of AI models [3].
- The shift from a "model-centric" to a "data-centric" approach emphasizes that high-quality labeled data is key to unlocking AI's value, directly impacting the effectiveness of large models [4].

Group 2: Addressing Challenges in AI Implementation
- High-quality datasets can mitigate the "hallucination" problem, which refers to the generation of incorrect or unsubstantiated information by AI models, especially in the absence of specialized data [5].
- By incorporating industry-specific knowledge and data, large models can transition from being generalists to specialists, enhancing their ability to provide valuable insights in specialized fields [6].

Group 3: The "Konghu" Data Cloud Platform
- The "Konghu" high-quality data platform is designed to meet the high data demands of large models and address challenges in data integration during digital transformation [7].
- The platform integrates over 3.8 billion enterprises, 250,000 buildings, and 3 billion products across 18 vertical sectors, providing a comprehensive data ecosystem for AI applications [8].
- A "three-stage" data integration model simplifies the data acquisition process, making it more accessible for agile AI development [10].

Group 4: Bridging the Gap for AI Models
- The "MCP service market" aims to connect internal and external data and tools, facilitating real-time access to high-quality data for large models [11].
- The "Konghu" data cloud has established partnerships with major model providers like ByteDance, Alibaba, and Baidu, enhancing the accessibility of data for enterprises [12].

Group 5: Future Directions
- The company aims to leverage high-quality datasets to help industry-specific large models overcome challenges and provide satisfactory answers in professional applications [14].
- The focus is on expanding the breadth and depth of data coverage and building an open and win-win data ecosystem to drive productivity in various industries [14].
Data Circulation Node Cities to Expand to About 50 by the End of This Year
Core Insights
- During the "14th Five-Year Plan" period, China's digital infrastructure is leading globally in terms of scale and technology, with significant advancements in data circulation and digital economy development [1][2][3]

Digital Infrastructure Development
- By the end of 2025, the number of 5G base stations is expected to increase fivefold compared to 2020, reaching 4.55 million, while gigabit broadband users will grow 34 times to 226 million [2]
- As of June 2023, over 25 cities, including major ones like Beijing, Shanghai, Guangzhou, Shenzhen, and Hangzhou, have established data infrastructure nodes, with plans to expand to around 50 cities by the end of this year [3]

Data Economy and Market Reforms
- The total transaction volume of high-quality data sets reached nearly 4 billion yuan as of June 2023, with a total scale of high-quality data sets at 246 PB [4]
- The National Data Bureau has introduced over 21 policies for public data resource development and plans to launch more than 10 additional systems, including data property rights [1][2]

Employment and Economic Impact
- The digital economy has created over 100 new types of jobs, significantly contributing to employment opportunities [2]
- Software revenue is projected to grow by 80% by the end of 2024 compared to 2020, while the added value of the electronic information manufacturing industry is expected to increase by over 70% [2]

Future Directions
- The National Data Bureau aims to focus on high-quality standard construction, large-scale facility deployment, and market-oriented ecological operations to support digital economic development and technological innovation [3][4]
- New models such as "data corpus equity investment" are being piloted to encourage enterprises to convert high-quality data sets into equity [4]
This National Data Bureau Press Conference Was Packed with Information!
Ren Min Wang· 2025-08-14 13:12
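On August 14, the State Council Information Office held a press conference in its "High-Quality Completion of the 14th Five-Year Plan" series to present the achievements of Digital China during the 14th Five-Year Plan period.

【As of the end of June, average daily token consumption had exceeded 30 trillion】

Liu Liehong, member of the Party Leadership Group of the National Development and Reform Commission and head of the National Data Bureau, said at the conference that data, as one of the three core elements of artificial intelligence development, plays a key role in advancing "Artificial Intelligence +", and that building high-quality data sets is especially important.

Liu explained that in the AI era the token, the smallest unit of data used in processing text, is comparable to what people called "traffic" in the internet era. At the beginning of 2024, China's average daily token consumption was 100 billion; by the end of June this year it had exceeded 30 trillion, an increase of more than 300 times in a year and a half, reflecting the rapid growth in the scale of AI applications in China.

Liu stressed that the rapid development of AI in China is inseparable from the great importance the country attaches to data work. China was the first country to treat data as a factor of production and has taken multiple measures to promote the development and use of data resources. Wherever the "Artificial Intelligence +" initiative goes, the construction and promotion of high-quality data sets must follow. China has vigorously promoted the supply of high-quality data, issued documents on building high-quality data sets, and had multiple departments jointly advance the work.

Liu said that as of the end of June this year, China had built more than 35,000 high-quality data sets, with a total volume exceeding 400PB ...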
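The "more than 300 times" figure follows directly from the two consumption numbers quoted at the briefing, as the short check below shows. The implied monthly rate is an extra illustration that assumes steady compounding over 18 months; that assumption is not from the briefing, and the variable names are illustrative.

```python
# Figures quoted at the briefing: average daily token consumption.
tokens_early_2024 = 100e9   # 100 billion at the beginning of 2024
tokens_mid_2025 = 30e12     # 30 trillion by the end of June 2025
months_elapsed = 18         # "a year and a half"

growth_factor = tokens_mid_2025 / tokens_early_2024

# Assumption (not from the briefing): growth compounded at a constant monthly rate.
monthly_rate = growth_factor ** (1 / months_elapsed) - 1

print(f"growth factor: {growth_factor:.0f}x")
print(f"implied monthly growth: {monthly_rate:.1%}")
```

Because the briefing says consumption had "exceeded" 30 trillion, 300x is a lower bound. Under the steady-compounding assumption, the implied rate works out to a little over 37% per month.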
China Electronics Cloud Establishes AI Product Line to Tackle Four Major Obstacles to AI Deployment
Core Insights
- The rapid iteration of technology, continuous enhancement of computing power, and decreasing costs are driving the commercial value of artificial intelligence (AI) across various industries [1]
- Despite advancements, challenges such as the unsuitability of general models in vertical fields, high training and inference costs, non-standardized application scenarios, and stringent data security requirements are testing the maturity of the AI industry [1][2]
- The year 2025 is anticipated to be a pivotal year for AI applications, but the complexity of implementing AI in specific industries is greater than expected [1][2]

Challenges in AI Implementation
- The first major challenge is the specific requirements of many industries, particularly those with high confidentiality and professionalism, making it difficult for general models to meet usage standards [2]
- Other challenges include the cost-effectiveness of computing power, the extreme pursuit of performance and stability in industry scenarios, and the difficulty in standardizing application deployment [2]
- High costs associated with GPU cards, which can require investments of hundreds of thousands to millions, limit the scalability of AI applications [2]

Customization and Service Needs
- AI implementation requires deep customization and supporting services, making pure product sales models ineffective in the B2B market [3]
- Different industries have vastly different customer needs and business processes, necessitating tailored development and services [3]

Full-Chain AI Solutions
- In response to these challenges, China Electronics Cloud has established an AI product line to address the pain points of AI implementation in critical industries [4]
- The "China Electronics Cloud·New Star" full-chain AI solution aims to create a complete AI implementation loop from data, models, applications, to services [4]

Data Quality and Model Development
- Building high-quality datasets for training industry-specific models is a key strategy to overcome the limitations of general models [5]
- The development of models involves over 80% of the workload in data preparation, with a pressing need for high-quality datasets in critical industries [5]

Security and Cost-Effectiveness
- The full-chain AI solution emphasizes security, addressing the high confidentiality and auditing requirements of critical industries [6]
- The strategy of "software and hardware collaboration" aims to enhance cost-effectiveness by optimizing software algorithms with hardware architecture [6]

Service-Driven AI Applications
- China Electronics Cloud provides customized solutions through a multi-modal data governance platform, model development platform, and application development platform [7]
- This integrated platform approach creates a feedback loop where applications drive data, data refines models, and models enhance applications, promoting a replicable and implementable AI deployment paradigm [7]
Shanghai Plans, All Parties Coordinate: This Forum Pushes Large Models to "Take Root and Blossom"
Guo Ji Jin Rong Bao· 2025-07-27 15:33
Group 1
- The "China Electronics Cloud Artificial Intelligence Innovation Development Forum" was held in Shanghai, focusing on the development of the digital intelligence industry and AI innovation applications [1]
- Shanghai is seizing strategic opportunities to deepen the layout of the entire AI industry chain, aiming to exceed 100,000 PetaFLOPS in intelligent computing capacity by the end of 2025 [1]
- The development route includes "4 basic models + N vertical models," with a differentiated development layout of "one east, one west, one soft, one hard" [1]

Group 2
- China Electronics is seizing AI development opportunities by establishing a complete integrated circuit industry chain and developing the CECSTACK cloud platform for AI applications [2]
- The focus is on creating low-cost personal inference machines and improving the usability of domestic intelligent computing systems [2]
- The approach includes building a domestic CUDA-like system and enhancing key software to fully utilize domestic hardware capabilities [2]

Group 3
- High-quality data sets are crucial for training and optimizing large models, characterized by high technical content, high knowledge density, and high-value applications [3]
- Challenges in building high-quality data sets include unclear objectives, fragmented implementation paths, and weak technical foundations [3]
- The launch of the "China Electronics Cloud·New Star" aims to provide a comprehensive AI solution with a "3+3+N" product service system [3]

Group 4
- Strategic cooperation agreements were signed between China Electronics Cloud and several organizations, including China Great Wall and Mu Xi Co., to enhance collaboration in AI development [4]
- A joint initiative was launched to accelerate the high-quality development of China's autonomous AI industry, focusing on technology assessment, optimizing the domestic computing ecosystem, and promoting high-quality data set construction [6]