High-Quality Datasets
What kind of data counts as "high-quality"? Nanjing Xuanwu: behind China's first embodied-intelligence dataset transaction
Yang Zi Wan Bao Wang· 2026-01-03 13:51
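Bend the elbow, raise the arm, reach forward and grasp... In the collection room of Jiangsu Zhujing Intelligent Technology Co., Ltd. (江苏箸境智能科技有限公司, hereafter "Zhujing Intelligent"), every basic movement a staff member makes in front of the screen is mirrored in real time as a precise robot motion, and then converted into structured data records containing video, joint angles, torque, and other information.

Not long ago, the "embodied intelligence dataset" assembled from these records was traded on the Jiangsu Data Exchange, the first transaction of its kind on any data exchange in China. That the dataset sold as soon as it was listed reflects the AI industry's deep shift from "model-driven" to "data-driven". With algorithms increasingly open-sourced and computing power continuing to expand, high-quality datasets have become the key scarce resource that determines how effectively AI technology is deployed, and they have quickly become a hot commodity in industrial competition. As AI pushes deeper into the real world, what kind of data counts as "high-quality"? And who will pay for it?

One motion repeated tens of thousands of times: how a machine becomes "human"

In the four-plus months since it opened, Zhujing Intelligent, located in Nanjing's Xuanwu District, has been busy. When the reporter walked into the company's collection room, a robot was using water bottles, rags, and other objects to practice grasping, picking up, and placing, skills common across the housekeeping industry.

The reporter took the wearable device that "controls" the robot from the operator and tried to make it pick up a towel and fold it neatly... Only after trying did the reporter find that "folding it is really hard": the towel kept slipping through the robot's fingers, and the "dexterous hand" was not so dexterous. Teleoperator Wang Xuan (王煊) explained that this is like teaching kindergarten children to write, which depends on the teacher guiding them hand in hand ...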
A new stage in AI's evolution: the rise of intelligent agents calls for a supply of high-quality data
Zhong Guo Xin Wen Wang· 2025-12-07 02:37
Group 1
- The 2025 Digital Intelligence Technology Ecological Conference in Guangzhou focuses on the deep integration of artificial intelligence and digital technology [1]
- The National Data Bureau emphasizes the need for open collaboration to build a unified national data element market, fostering a more open industrial ecosystem [1]
- Guangdong Province is leading in the market-oriented allocation of data elements, viewing AI and data as core new productive forces driving high-quality economic growth [1]

Group 2
- China Telecom introduced the Starry Sky Intelligent Service Platform 1.0, featuring the "Xing Xiaochen" intelligent agent for cross-terminal and cross-scenario intelligent services [2]
- The intelligent agent supports users in completing complex tasks through natural language, transforming user interaction [2]
- High-quality datasets are identified as the foundation for enhancing AI capabilities, with over 500 PB of high-quality datasets constructed nationwide as of September [2]
- The National Data Development Research Institute proposes a new approach to advance high-quality dataset construction, focusing on compliance review, quality assessment, and industry mapping [2]
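Qianzhan Global Industry Morning Briefing: China's first national major science and technology infrastructure in the information and communications field officially enters operation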
Qian Zhan Wang· 2025-12-05 14:52
Group 1
- The upcoming Central Economic Work Conference is expected to focus on sustaining consumer spending and supporting the private sector, with a continuation of loose fiscal and monetary policies likely to stabilize economic recovery momentum [2]
- China's first major national technology infrastructure in the information and communication sector, the Future Network Experimental Facility, has officially commenced operations, marking a significant advancement in network technology innovation and application capabilities [2]
- As of the end of Q3, China has built over 500 PB of high-quality datasets, which are crucial for enhancing AI model performance and accelerating innovation across various industries [3]

Group 2
- Seven satellite internet innovation entities in Shanghai have been awarded recognition, including institutions like the Chinese Academy of Sciences and Shanghai Aerospace Space Technology Co., Ltd. [4]
- The Hubei Provincial State-owned Assets Supervision and Administration Commission plans to optimize the layout of state-owned enterprises and enhance the efficiency of state capital operations by revitalizing idle assets and promoting specialized integration among different levels of enterprises [5]
- ByteDance's mobile phone, the Doubao, has sold out its initial batch of 30,000 units, with the second-generation product expected to be released by the end of 2026 [5]

Group 3
- Samsung has established an AI research institute, appointing Lee Kang-soo as its first head, indicating a strategic focus on AI development [11]
- OpenAI has reached an agreement to acquire Neptune, a startup that provides monitoring and debugging tools for AI model training, enhancing its capabilities in AI research [8]
- South Korea's exports are projected to exceed $700 billion for the first time, driven by strong performance in key sectors such as semiconductors, automobiles, and shipbuilding [7]
High-quality datasets built nationwide total more than 500 PB
Xin Hua She· 2025-12-04 14:24
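Liu Liehong, head of the National Data Bureau, said that high-quality datasets are a key resource for digital-intelligence innovation. The National Data Bureau has worked with multiple departments to draw up policy documents that, guided by scenario-based applications, promote the construction of high-quality datasets across industries. It has deployed 140 pilot tasks and has begun to foster an environment in which the construction and application of high-quality datasets reach wherever "AI+" goes.

According to National Data Bureau figures, as of the end of September China's 7 data annotation bases had brought in and cultivated 362 annotation companies, with 85,000 people employed in annotation, driving 16.3 billion yuan in data-annotation-related output value. China's average daily token consumption has surpassed 40 trillion, about 400 times the level at the start of 2024. (Reporter Gao Kang)

The reporter learned from the National Data Bureau on December 4 that, as of the end of September, high-quality datasets built nationwide totaled more than 500 PB, contributing to the integration of artificial intelligence into industries of all kinds. ...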
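The article gives the current daily token volume and the growth multiple but not the early-2024 baseline. As a rough check, assuming "about 400 times" means the current volume is roughly 400 times the early-2024 volume, the implied baseline can be sketched as follows. The function and variable names are illustrative and do not come from the article.

```python
# Figures as reported in the article; both are rounded by the source.
DAILY_TOKENS_NOW = 40e12   # "surpassed 40 trillion" tokens per day
GROWTH_MULTIPLE = 400      # "about 400 times" the early-2024 level (assumed reading)


def implied_baseline(current: float, multiple: float) -> float:
    """Return the earlier daily volume implied by a current volume and a growth multiple."""
    return current / multiple


baseline = implied_baseline(DAILY_TOKENS_NOW, GROWTH_MULTIPLE)
print(f"Implied early-2024 daily token consumption: about {baseline / 1e9:.0f} billion")
```

Because both inputs are rounded ("surpassed", "about"), the result of roughly 100 billion tokens per day is an order-of-magnitude estimate, not a reported statistic.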
Liu Liehong attends the "2025 Sci-Tech Innovation Conference" and delivers remarks
Core Points
- The integration of data elements with artificial intelligence is essential for promoting intelligent innovation [1]
- The National Data Bureau emphasizes the importance of data infrastructure, high-quality data sets, and talent development in driving data-driven innovation [1]

Group 1: Data Infrastructure
- Data infrastructure is a crucial carrier for intelligent innovation, addressing the challenges of "security, compliance, and efficiency" in data circulation [1]
- The positive externalities of data elements need to be maximized while mitigating negative externalities [1]

Group 2: High-Quality Data Sets
- High-quality data sets are identified as key resources for intelligent innovation [1]
- The National Data Bureau is focusing on market-oriented reforms for data elements, including policy formulation, supply promotion, standard establishment, technology enhancement, and ecosystem cultivation [1]

Group 3: Talent Development
- The construction of a talent team is a critical support for intelligent innovation [1]
- The National Data Bureau has issued opinions to strengthen the development of data-related disciplines and digital talent training models [1]
- A "dual-driven" approach will be implemented to accelerate the establishment of a new ecosystem for independent digital talent cultivation [1]

Group 4: Investment and Market Awareness
- A call for increased investment in the data sector is made to foster a market awareness of "paying for high-quality data" [1]
- The aim is to inject new momentum into the market-oriented reform of data element allocation [1]
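2025 Global Data Business Conference (全球数商大会) holds sub-forum on full-link data governance empowering high-quality dataset construction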
Di Yi Cai Jing· 2025-11-27 07:25
Core Insights
- The forum focused on "full-link data governance empowering high-quality data set construction" and was co-hosted by Puyuan Information Technology Co., Ltd. and the East China Branch of the China Academy of Information and Communications Technology [1]
- The event highlighted the launch of Puyuan's "Yishu" data asset governance platform and the establishment of the "AI Core Data Set Ecological Alliance" [1]

Government and Industry Recognition
- Leaders from Shanghai's data bureau and industrial internet association acknowledged the importance of high-quality data sets as a core element of new productive forces and praised Puyuan's efforts in promoting data value [3]

Academic and Research Insights
- The China Academy of Information and Communications Technology provided a systematic framework for constructing high-quality data sets through the "Artificial Intelligence High-Quality Data Set Construction Guide" [6]
- A professor from Shanghai Jiao Tong University discussed the use of intelligent governance technology to build high-quality data models [8]

Puyuan's Solutions
- Puyuan's leader emphasized that constructing high-quality data sets is a systematic project, requiring a comprehensive approach to data governance that transforms chaotic data into foundational knowledge assets [10][12]
- The latest version of Puyuan's "Yishu" AI-native data asset platform was launched, designed to enhance data construction efficiency and provide agile data insights [12]

Collaborative Initiatives
- Puyuan initiated the "AI Core Data Set Ecological Alliance" and launched the "Lighthouse Project" to promote industry collaboration [14]

Industry Practices
- Experts from various sectors, including energy and aerospace, shared successful case studies on high-quality data set construction, demonstrating the practical value of Puyuan's advocated concepts [17]
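Co-creating a new ecosystem for natural resources data applications: Forum on High-Quality Dataset Construction and Innovative Applications in the Natural Resources Industry successfully held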
Sou Hu Wang· 2025-11-12 07:39
Core Insights
- The forum on "High-Quality Data Set Construction and Innovative Applications in the Natural Resources Industry" was successfully held during the Second China Surveying and Geographic Information Conference, focusing on data construction standards and innovative application sharing [1][11]
- The event attracted numerous experts and industry professionals from government, enterprises, and research institutions, highlighting the importance of high-quality data in modern governance [1][11]

Group 1: Forum Highlights
- The forum was hosted by Wu Zhigang, Chief Expert in Data from the China Electronic Information Industry Development Research Institute, with opening remarks from Li Weisen, President of the China Geographic Information Industry Association, and Hu Yuanqiao, General Manager of Shandong Land Development Group [4]
- Li Weisen emphasized that natural resource data is a crucial foundation for modern governance, calling for enhanced standardization, intelligent integration, and collaborative ecosystem building to overcome current challenges [4]
- Hu Yuanqiao pointed out that in the era of data-centric models, data quality directly impacts model capabilities, advocating for a collaborative approach to optimize construction layouts and promote data circulation [4][10]

Group 2: Expert Insights
- Wu Hongtao, Deputy Director and Chief Engineer of the Natural Resources Information Center, discussed the strategic focus on high-quality data sets in global AI competition, outlining a closed-loop system for data construction and application [7]
- Li Zhenjun from Unicom Data Intelligence emphasized that large-scale, multi-modal, high-quality industry data sets are key to intelligent development, proposing a practical path for data set construction [8]
- Various experts presented innovative applications of high-quality data sets across different sectors, showcasing their potential in driving digital transformation [9][10]

Group 3: Future Directions
- The successful forum marked a significant step towards data factorization and intelligent application in the natural resources industry, fostering a consensus on co-creating a new ecosystem for data applications [11]
- As high-quality data set construction deepens and industry models are widely applied, a more intelligent, efficient, and green governance landscape for natural resources is emerging [11]
Building high-quality datasets: for Jiangsu it is imperative, and Jiangsu must lead the way
Xin Hua Ri Bao· 2025-11-06 08:16
Core Insights
- The "2025 National High-Quality Data Set and Data Annotation Industry Supply and Demand Matching Conference" held in Nanjing successfully attracted over 500 companies and resulted in more than 90 collaborations with a transaction value exceeding 900 million yuan [1]
- Jiangsu province aims to leverage its rich data resources to enhance the construction of high-quality data sets, which is essential for seizing opportunities in artificial intelligence development [1][2]
- The definition of high-quality data sets varies across industries, but they must meet the training needs of AI large models [2]

Industry Overview
- Jiangsu has established 321 high-quality data sets across key sectors such as healthcare, transportation, industry, energy, and cultural tourism, with a total data scale exceeding 93 PB [1]
- The province has implemented a "1+N" policy framework to optimize the environment for artificial intelligence development, focusing on collaboration between supply and demand enterprises [2][7]

Challenges in Data Annotation
- Data annotation is crucial for AI development, requiring specialized knowledge and skills, particularly in complex fields like medical data [3][4]
- The industry faces challenges such as insufficient data supply and a lack of skilled data annotators, which hinder the progress of large models in niche areas [4]

Cost Considerations
- The high costs associated with data storage and processing are significant challenges for companies, with many high-quality data sets being discarded due to storage expenses [5][6]
- Companies are exploring solutions like establishing cold storage centers in less developed regions to reduce costs associated with data storage [5]

Financial Support and Standards
- The data industry is knowledge and capital-intensive, with a significant portion of costs tied to acquiring raw data [6]
- Financial institutions are encouraged to provide support for data collection and annotation, potentially through innovative financing models [6]
- The establishment of standards for high-quality data sets is underway, with guidelines and quality assessment protocols being developed to address current challenges [6]
AI High-Quality Dataset Ecosystem Development Conference held in Yongchuan, Chongqing
Xin Hua Wang· 2025-09-29 08:41
Core Insights
- The conference focused on building high-quality datasets to empower AI development, emphasizing data labeling industry practices and innovations [1][6]
- A partnership was established between the Chongqing Big Data Application Development Management Bureau and the Yongchuan District government to create a "Chongqing Data Set Construction Application Base" [3][4]
- The West Data Labeling Research Institute and West Data Set Production Base were inaugurated to enhance digital technology sharing and data industry incubation [4][6]

Group 1: Conference Highlights
- The conference featured policy introductions, case sharing, and industry dialogues to promote AI data infrastructure and regional data innovation [1][6]
- The Yongchuan District aims to enhance data labeling efficiency and usability to support the city's AI capabilities and business scenarios [3][6]

Group 2: Strategic Initiatives
- Yongchuan District signed cooperation projects with 12 companies, including major telecom operators and technology firms, to advance high-quality dataset construction and application [6][7]
- The district plans to establish a data labeling industry park and implement a "data labeling + application" model to integrate digital and physical economies [6][7]

Group 3: Future Goals
- Yongchuan aims to become a hub for data element circulation and a data labeling service base by 2027, focusing on four key actions: building a data labeling industry park, creating a "data labeling +" ecosystem, implementing talent development initiatives, and promoting data value release [7]
How is a high-quality dataset of more than 10 trillion tokens forged? An interview with Ruan Yilong of China Telecom Tianyi AI
量子位 (QbitAI) · 2025-09-26 02:08
Core Viewpoint
- The article emphasizes the importance of high-quality datasets in developing and training AI models, highlighting that such datasets are crucial for enhancing model performance and accuracy [4][6][14].

Group 1: High-Quality Data Sets
- The company has amassed over 10 trillion tokens of general model corpus data and specialized datasets covering 14 key industries, with a total storage capacity of 350 TB [1][6].
- These datasets are not just raw data but are meticulously labeled and optimized, making them ready for immediate application in various industries [3][4].
- High-quality datasets are essential as they directly influence the accuracy, generalization, and usability of AI models, serving as the foundation for effective model training [4][5].

Group 2: Technological Infrastructure
- The company has developed the Xingchen MaaS platform, which operates as a data refinery, creating a complete closed loop of "data-model-service" [6][17].
- The platform includes a data toolchain that efficiently processes various data types and a model toolchain that enhances data into usable models, ensuring a robust data lifecycle management [18][19].
- The platform's capabilities allow for the generation of synthetic data for rare or extreme scenarios, enhancing model robustness and safety [18][19].

Group 3: Strategic Considerations
- The company's investment in high-quality datasets is driven by national strategy, market demand, and its own operational advantages, positioning itself as a key player in the AI landscape [15][16].
- The government has recognized AI as a national strategy, prompting the company to build data infrastructure that supports AI technology breakthroughs [15][16].
- The company aims to leverage its extensive data resources and customer base to enhance its capabilities in high-quality dataset development [16].

Group 4: Industry Applications
- The company has successfully implemented AI solutions in various sectors, such as textile quality inspection, achieving over 95% accuracy in defect detection, significantly improving production efficiency [9][26].
- High-quality datasets have been developed for multiple industries, including healthcare, agriculture, and smart cities, demonstrating the versatility and impact of AI applications [36][37].
- The company has collaborated with various sectors to create tailored datasets that address specific industry challenges, enhancing operational efficiency and service quality [36][37].

Group 5: Future Vision
- The company envisions becoming a leading provider of general AI services, focusing on technological advancement, inclusive applications, and an open ecosystem for collaboration [42][43].
- It aims to cultivate a skilled workforce in AI, ensuring that technological innovations translate into practical applications that benefit society [43][44].
- The ultimate goal is to enhance the digital economy while ensuring safety and fairness in AI applications, contributing to a more equitable society [44][45].