Chinese-language data now makes up more than 60% of the training data used by most domestic AI models
Xinhua News Agency · 2025-08-21 07:13

Group 1
- The article's core point is that Chinese-language data plays a significant role in improving the training performance of domestic AI large models. In most of these models, Chinese data accounts for more than 60% of the training data, and in some it reaches 80% [1]
- The National Bureau of Statistics emphasizes that China's rapid progress in artificial intelligence is closely linked to the country's focus on data work, and that high-quality datasets are crucial to advancing AI [1]
- As of June 2023, China had built more than 35,000 high-quality datasets totaling over 400 PB, roughly 140 times the digital resources held by the National Library of China [1]

Group 2
- AI model training has driven a surge in demand for data trading. As of June 2023, cumulative transactions of high-quality datasets were worth nearly 4 billion yuan, and the high-quality datasets listed by data trading institutions totaled 246 PB [2]
- The National Bureau of Statistics plans to systematically promote the construction of high-quality datasets in key areas such as embodied intelligence, the low-altitude economy, and biomanufacturing, with the aim of raising public recognition of the value of data [2]