High-Quality Datasets

2025 Big Data Expo: 26 Exchange Events and More Than 300 Leading Figures to Chart a "New Vision" for the Global Digital Economy
Zhong Guo Xin Wen Wang· 2025-08-26 06:37
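The exchange events, hosted by Tsinghua University and other universities and enterprises, will showcase the innovative vitality of the big data industry. Topics span data development and utilization, trusted data spaces, data trading, artificial intelligence, and digital talent cultivation. Experts, scholars, and business leaders are invited to speak from different perspectives and to discuss in depth the experience, achievements, difficulties, and challenges encountered in development, so that the exchange of ideas and experience gives all parties useful approaches and references.
The 2025 Big Data Expo exchange events are also expected to release 64 key industry standards of guiding significance and representative cases of reference value. These are intended to give enterprises useful pathways for strategic planning, technological innovation, and business expansion, to help them seize opportunities and gain an edge in fierce market competition, and to promote the healthy development of the entire big data industry ecosystem.
Officials said the 2025 Big Data Expo exchange events will be a gathering that pools industry wisdom, integrates government, industry, academia, research, and application, and focuses on the future of data, injecting strong new momentum into the digital economy and writing a new chapter of win-win cooperation.
Participants from China and abroad will also take part in exchange events such as "High-Quality Datasets", "Digital Government", "Computing Power Gathers Momentum for an Intelligent Future", "AI Trusted Data Space Conference", and "Public Data Development and Utilization", sharing representative experience and practical cases in digital government construction, trusted data space applications, and data security governance, and contributing ideas to the development of the big data industry.
Guiyang and Gui'an are also actively planning distinctive exchange events, making full use of their advantages as a computing power hub node city and inviting representatives of national computing power hub node cities and well-known computing power ...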
China's Daily Token Consumption Surpassed 30 Trillion as of End of June
Zhong Guo Xin Wen Wang· 2025-08-15 06:12
Group 1
- The core point of the news is that Token consumption in China has surged dramatically, reflecting the rapid growth of artificial intelligence applications in the country [1][3]
- As of the end of June 2025, China's daily Token consumption exceeded 30 trillion, a more than 300-fold increase from 100 billion in early 2024 [3][4]
- The Chinese government emphasizes the importance of data as a production factor and has implemented various measures to promote the development and utilization of data resources [3][4]

Group 2
- China has constructed over 35,000 high-quality datasets, with a total volume exceeding 400 petabytes (PB), which is approximately 140 times the digital resources of the National Library of China [4]
- Demand for data trading has increased because of the training of artificial intelligence models, with a cumulative transaction value of nearly 4 billion RMB for high-quality datasets as of June 2025 [4]
- The proportion of high-quality datasets in total transactions at data trading institutions has risen from 10% last year to nearly 80% currently [4]
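The growth figure above can be sanity-checked with simple arithmetic. The following is a minimal sketch, not part of the original report: it assumes "over 300 times" is treated as a multiple of exactly 300 (a lower bound), and the function and variable names are purely illustrative.

```python
# Sanity check on the reported token-consumption growth.
# Assumption: "over 300 times" is treated as a multiple of exactly 300.
DAILY_TOKENS_NOW = 30e12   # 30 trillion tokens per day (reported)
GROWTH_MULTIPLE = 300      # "over 300 times" (reported, used as a lower bound)


def implied_baseline(current: float, multiple: float) -> float:
    """Return the starting value implied by a current value and a growth multiple."""
    return current / multiple


baseline = implied_baseline(DAILY_TOKENS_NOW, GROWTH_MULTIPLE)
print(f"Implied starting level: about {baseline / 1e9:.0f} billion tokens per day")
```

Because the reported multiple is "over" 300, the implied starting level is an upper bound: a baseline of at most roughly 100 billion tokens per day.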
National Data Bureau: The Data Industry Has Become a New Growth Point for the Digital Economy
Zheng Quan Ri Bao· 2025-08-14 16:12
Group 1
- The data industry in China is experiencing rapid growth, with the number of data enterprises expected to exceed 400,000 and the industry scale reaching 5.86 trillion yuan by 2024, representing a 117% increase from the end of the 13th Five-Year Plan [1]
- The data industry encompasses various sectors including data collection, storage, circulation, development, and infrastructure, contributing to the marketization and valuation of data elements [1]
- Data technology is evolving from business intelligence (BI) to artificial intelligence (AI), leading to a new industrial ecosystem characterized by deep data mining, algorithmic computing power, and high data integration [1]

Group 2
- The National Data Bureau plans to deploy data industry aggregation zone pilot projects to optimize industrial layout and accelerate the formation of industrial ecosystems and scale advantages [2]
- High-quality data sets are crucial for the development of artificial intelligence, and the construction of data infrastructure is essential for promoting data circulation [2]
- In the first half of this year, 3,328 new data products were launched by major data trading institutions, marking a 70% year-on-year increase, with high-quality data sets in the AI sector growing by 2.8 times [2]

Group 3
- As of June 30, over 35,000 high-quality data sets have been established in China, totaling over 400 PB, which is approximately 140 times the digital resources of the National Library of China [3]
- The demand for data trading has surged due to the training of artificial intelligence models, with a cumulative transaction value of nearly 4 billion yuan for high-quality data sets by June 30 [3]
- The National Data Bureau aims to enhance the construction of high-quality data sets and promote the recognition of data element value across society [3]
National Data Bureau: As of End of June 2025, China's High-Quality Datasets Totaled More Than 400PB
Xin Hua Cai Jing· 2025-08-14 07:34
Group 1
- As of June 2025, China has built over 35,000 high-quality datasets, totaling more than 400PB, which is approximately 140 times the digital resources of the National Library of China [1]
- The cumulative transaction value of high-quality datasets across the country reached nearly 4 billion yuan, with the total scale of high-quality datasets listed by data trading institutions reaching 246PB [1]
- In Beijing, the proportion of high-quality datasets in total transactions surged from 10% last year to nearly 80% currently [1]

Group 2
- China's rapid development in artificial intelligence is closely linked to its emphasis on data work, being the first country to treat data as a production factor [2]
- The majority of model training in China now uses Chinese data, with over 60% of the data used for training domestic models being Chinese, and some models reaching 80% [2]
- The development and supply capacity of high-quality Chinese data continue to improve, driving rapid improvements in the performance of AI models in China [2]
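The volume figures above imply two derived quantities that are easy to compute. The following is a rough sketch under stated assumptions rather than anything published by the National Data Bureau: it treats "more than 400PB" and "approximately 140 times" as exact values, assumes the 246PB listed volume is comparable to the 400PB built volume, and uses illustrative variable names.

```python
# Back-of-the-envelope figures derived from the reported dataset totals.
# Assumptions: the reported bounds are treated as exact values, and the
# listed volume is assumed to be comparable to the built volume.
TOTAL_BUILT_PB = 400     # high-quality datasets built nationwide (reported)
LISTED_PB = 246          # volume listed by data trading institutions (reported)
LIBRARY_MULTIPLE = 140   # multiple of the National Library's digital resources

implied_library_pb = TOTAL_BUILT_PB / LIBRARY_MULTIPLE
listed_share = LISTED_PB / TOTAL_BUILT_PB

print(f"Implied National Library digital holdings: about {implied_library_pb:.1f} PB")
print(f"Listed volume relative to built volume: about {listed_share:.1%}")
```

Under these assumptions, the National Library's digital resources work out to a little under 3 PB, and the listed volume is somewhat more than three fifths of the built volume.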
Shanghai, Tianjin, Anhui and Other Regions Are Piloting New Models Such as "Data Corpus Valued as Equity Contribution"
Nan Fang Du Shi Bao· 2025-08-14 05:57
Group 1
- The core viewpoint emphasizes the importance of high-quality data sets in the development of artificial intelligence, with a focus on their construction and promotion across various sectors [2]
- As of June 2025, over 35,000 high-quality data sets have been established in China, totaling more than 400PB, which is approximately 140 times the digital resources of the National Library of China [2]
- The demand for data trading has surged, with a cumulative transaction value of nearly 4 billion yuan for high-quality data sets by June 2025, and the total scale of listed high-quality data sets reaching 246PB [2]

Group 2
- The development of high-quality data sets relies on the support of the data annotation industry, with the National Data Bureau establishing seven data annotation bases in cities like Chengdu, Shenyang, and Hefei [3]
- The proportion of Chinese data used in model training has exceeded 60% in most domestic models, with some models reaching 80% [3]
- Future plans include systematic efforts to advance high-quality data set construction, focusing on key areas such as embodied intelligence, low-altitude economy, and biomanufacturing [3]
Exclusive Interview with China Unicom's Zhao Yahui: How Is the "Data Fuel" of the AI Era Refined?
Feng Huang Wang· 2025-08-04 12:47
Core Insights
- High-quality data is becoming a differentiated advantage for China's AI industry [1][5]
- China Unicom has showcased its data industry foundation at the WAIC, emphasizing its unique path in data governance, security, and industry empowerment [1][2]

Group 1: Data Infrastructure and Capabilities
- China Unicom's data foundation integrates computing power, algorithms, and high-quality data sets, focusing on the construction of high-quality data sets [2][3]
- The company has accumulated 700PB of enterprise data and created over 400TB of high-quality communication and industry data sets through collaboration with industry partners [2][3]
- A framework called "Three Ones" has been established, which includes a governance methodology, a platform tool, and a high-quality data set [3][4]

Group 2: Data Governance and Tools
- A data governance methodology has been developed to manage data sets through a classification framework and a dynamic optimization mechanism [3]
- A comprehensive toolchain has been created for the entire data processing lifecycle, from collection to quality evaluation, which has won the DataOps Product Innovation Award [3]
- Specialized data sets have been built for eight fields, supporting training and fine-tuning of 27 large model scenarios, with some data sets recognized as exemplary by the State-owned Assets Supervision and Administration Commission [3]

Group 3: AI Transformation and Industry Applications
- China Unicom's strategy focuses on specific industry scenarios, enhancing smart operations and digital transformation across various sectors [6]
- The company has developed over a thousand intelligent agents and engaged more than ten thousand participants in AI transformation initiatives [6]
- Real-time, accurate, and reliable communication data is highlighted as a unique advantage for empowering various industries [6]

Group 4: Data Security and Privacy
- China Unicom recognizes the dual nature of data circulation, balancing value release with security and privacy challenges [7]
- The company has established multiple security measures, including a data classification platform and a multi-layered technical protection system [7]
- Active participation in national trustworthy data space construction aims to ensure efficient, secure, and traceable data circulation [7]
From Traditional Markets to Large-Model-Driven: The Data Trading Revolution in the AI Era
Di Yi Cai Jing· 2025-07-15 12:00
Core Insights
- The rise of large AI models is fundamentally changing the data trading market dynamics [1][2]
- The demand for high-quality datasets has surged exponentially, transforming them from "useful resources" to "strategic assets" [2][3]

Data Market Evolution
- In 2024, AI data accounted for only 10% of the trading volume at data exchanges, but this is projected to reach nearly 80% by 2025 [1]
- Prior to the emergence of large models, the data trading market was relatively immature, characterized by low market concentration, low standardization, and low transparency [1]

High-Quality Data as a Strategic Asset
- High-quality datasets are now considered the "new oil" in the era of large models, essential for training, validating, and optimizing these models [2][3]
- The value of these datasets is determined by their professionalism, diversity, and cleanliness [2]

Industry Knowledge as a Competitive Edge
- The shift from general models to vertical applications has increased the value of specialized datasets in sectors like finance, healthcare, and law [3]
- The demand for multi-modal data is rising, as seen in fields like autonomous driving, where diverse data types are required [3]

Structural Changes in Data Acquisition
- The proportion of publicly available internet data is declining, with private data production gaining importance [3]
- Leading companies are establishing comprehensive data production lines, from collection to governance, making professional data a primary acquisition channel [3]

Future Data Trading Characteristics
- A mixed architecture of decentralized and centralized data trading platforms is likely to emerge, addressing efficiency and quality concerns [4]
- New pricing and incentive mechanisms, such as token-based models, will activate data supply and encourage sharing of long-term value [4]
- Industry-specific data alliances are forming to overcome data barriers, reducing costs and preventing sensitive information leakage [4]

Long-term Market Outlook
- The data trading ecosystem is expected to undergo profound restructuring driven by AI over the next five to ten years [4]
- China is positioned to play a significant role in the global data factor market due to its large market size, diverse application scenarios, and proactive policy guidance [5]
Hubei Advances Trusted Data Space Development, Aiming for at Least 300 Application Scenarios by 2028
Chang Jiang Shang Bao· 2025-06-17 23:43
Group 1
- Hubei Province has launched an action plan to promote the development of trusted data spaces, aiming for integrated development in construction, usage, management, and services by 2028 [1][4]
- The plan targets the establishment of at least 30 effective trusted data spaces, the development of no less than 300 application scenarios, and the listing of over 2,000 data products by 2028 [1][4]
- Wuhan aims to become a data element hub city, with significant achievements in the data element sector, including being selected as a national pilot city for data circulation and utilization [2][3]

Group 2
- The data annotation industry in Wuhan is rapidly developing, with over 60 key enterprises and a goal to exceed 50 billion in business volume related to data annotation by 2027 [3]
- The action plan for the data annotation industry aims to introduce and cultivate two leading data annotation enterprises and generate over 100 billion in growth for related AI industries [3]
- The National Data Bureau has set a target to establish over 100 trusted data spaces by 2028, with Hubei contributing to this goal through its action plan [4][5]

Group 3
- The action plan outlines five key directions for data space construction: urban, enterprise, industry, cross-border, and personal trusted data spaces [5]
- Support is provided for cities like Wuhan, Xiangyang, and Yichang to conduct pilot projects that enhance data management and interconnectivity [5]
- The plan encourages state-owned enterprises and platform companies to develop enterprise trusted data spaces to accelerate digital transformation [5]
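Nancai Data Weekly (Issue 51): Construction Begins on 10 National Data Element Comprehensive Pilot Zones; Development of Technical Documents for High-Quality Datasets to Accelerate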
21 Shi Ji Jing Ji Bao Dao· 2025-06-06 10:27
Group 1
- The release of the "Regulations on Government Data Sharing" marks a new phase of legal governance for data sharing in China, providing a legal framework for efficient data circulation and enhancing government digital governance capabilities [2][3]
- The regulations address existing issues such as incomplete mechanisms and unclear responsibilities in government data sharing, aiming to eliminate "data silos" and improve the efficiency of data utilization [2]
- The establishment of 10 national data element comprehensive pilot zones in various provinces aims to support the integration of the real economy and digital economy, fostering a robust data market ecosystem [3]

Group 2
- A seminar on high-quality data set construction and standardization was held, focusing on guidelines, format requirements, and quality assessment for data sets, which will facilitate the application of artificial intelligence in central enterprises [4][5]
- Guangzhou's "Digital Guangzhou Construction 2025 Work Points" outlines 32 key tasks for digital transformation, emphasizing the development of data resources and the establishment of a governance system for data circulation [5]
Data Value Conversion Accelerates; High-Quality Datasets Empower AI
China Post Securities· 2025-05-07 04:55
Industry Investment Rating
- The industry investment rating is "Outperform the Market" and is maintained [1]

Core Viewpoints
- The report highlights the acceleration of data value transformation, with high-quality datasets empowering AI development. The Digital China Summit emphasized the importance of high-quality datasets in driving AI advancements [4][5]
- The establishment of various high-quality data collection projects across multiple cities indicates a growing focus on data-driven applications in sectors such as healthcare, transportation, education, and finance [5][6]
- The report notes the launch of the first data asset securitization project in China, which marks a significant step in the marketization of data elements and provides a new financing channel for the digital economy [7][8]

Summary by Relevant Sections

Industry Basic Situation
- The closing index is at 4675.66, with a 52-week high of 5440.49 and a low of 2805.53 [1]

Relative Index Performance
- The report provides a forecast of relative performance for the computer industry against the CSI 300 index, showing a potential increase of 24% to 46% from May 2024 to December 2025 [3]

Key Companies and Investment Ratings
- Notable companies in the industry include [10][12]:
  - Sanwei Tiandi (Buy) with a closing price of 28.88 and a market cap of 2.23 billion
  - Shanghai Steel Union (Buy) with a closing price of 20.76 and a market cap of 6.62 billion
  - YunSai ZhiLian (Buy) with a closing price of 23.48 and a market cap of 32.11 billion
  - Shanda Diwei (Buy) with a closing price of 9.55 and a market cap of 3.82 billion
  - New Point Software (Buy) with a closing price of 33.00 and a market cap of 10.89 billion