High-Quality Datasets
First batch of 80 high-quality dataset requirements for AI-enabled new industrialization released! Open call for construction units
机器人圈· 2026-03-25 09:12
Core Viewpoint
- The Beijing Municipal Bureau of Economy and Information Technology has issued a notice announcing the first batch of high-quality dataset requirements for AI-enabled new industrialization in 2026, aiming to support model development and deeper industry applications [1].

Group 1: Subject Requirements
- Construction units must be independent legal entities registered in Beijing, in good operating condition over the past three years, with the technical capabilities, talent teams, and financial support needed for dataset construction [1].

Group 2: Construction Requirements
- Datasets must align with the directions, content, and standards specified in the demand list, be directly usable for AI model development, and be capable of improving model performance [2].
- Dataset construction must comply with the Data Security Law and the Personal Information Protection Law of the People's Republic of China, with no intellectual property disputes and no sensitive or illegal data [2].

Group 3: Application and Construction Process
- Applicants must complete the "Application for High-Quality Data Set Construction" form based on their strengths and submit it electronically by April 20, 2026 [3][4].
- Each applicant may submit at most five datasets in the same batch, to ensure quality and focus [4].
- The Bureau will conduct a strict review based on the applicant's overall strength, the feasibility of the construction plan, and the expected application outcomes [4].

Group 4: Implementation and Evaluation
- Construction units must follow the approved application plan and maintain quality and progress, with a construction period of no more than 12 months [4].
- Upon completion, both parties will conduct acceptance evaluations, and the construction unit can apply for subsequent funding rewards based on the completed project [4].

Group 5: Other Matters
- Applicants are responsible for the authenticity and compliance of their submitted materials, with strict penalties for any fraudulent activity [5].
- Additional dataset application needs can be submitted on a rolling basis, and the Bureau will keep updating the demand list [5].
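The numeric limits in the notice summary (submission deadline, at most five datasets per batch, construction period of no more than 12 months, registration in Beijing) can be expressed as a simple pre-submission check. The sketch below is illustrative only: the `Application` record, its field names, and the minimum of one dataset are assumptions and do not come from the official application form.

```python
from dataclasses import dataclass
from datetime import date

# Limits taken from the notice summary above.
SUBMISSION_DEADLINE = date(2026, 4, 20)
MAX_DATASETS_PER_BATCH = 5
MAX_CONSTRUCTION_MONTHS = 12


@dataclass
class Application:
    """Hypothetical record; field names are illustrative, not from the official form."""
    applicant: str
    registered_in_beijing: bool
    dataset_count: int
    submitted_on: date
    planned_months: int


def check_application(app: Application) -> list[str]:
    """Return a list of rule violations; an empty list means no issue was found."""
    problems = []
    if not app.registered_in_beijing:
        problems.append("applicant must be a legal entity registered in Beijing")
    # The lower bound of 1 is an assumption; the summary only states the maximum.
    if not 1 <= app.dataset_count <= MAX_DATASETS_PER_BATCH:
        problems.append("each applicant may submit at most 5 datasets per batch")
    if app.submitted_on > SUBMISSION_DEADLINE:
        problems.append("submission is after the April 20, 2026 deadline")
    if app.planned_months > MAX_CONSTRUCTION_MONTHS:
        problems.append("construction period must not exceed 12 months")
    return problems


ok = Application("Example Co.", True, 3, date(2026, 4, 10), 10)
late = Application("Example Co.", True, 6, date(2026, 4, 21), 14)
print(check_application(ok))
print(check_application(late))
```

The qualitative criteria (operating status, technical capability, compliance with the Data Security Law and the Personal Information Protection Law) are not modelled here, since they require human review.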
Watch the divergence in upstream energy and non-ferrous metals
Hua Tai Qi Huo· 2026-03-25 05:27
Report Summary

1. Report Industry Investment Rating
No information provided in the content.

2. Core View of the Report
The report focuses on the divergence in the upstream energy and non-ferrous metals sectors, and provides an overview of mid-view events and the industry situation [1][2].

3. Summary by Related Catalogs

Mid-view Event Overview
- **Production Industry**: By the end of 2025, over 100,000 high-quality datasets had been built in China. By March 2026, the daily average Token call volume exceeded 140 trillion, a more than 1000-fold increase from the beginning of 2024 and a 40% increase from the end of 2025 [1].
- **Service Industry**: The Medium-term Lending Facility (MLF) has been increased and renewed for 13 consecutive months. On March 25, 2026, a 500-billion-yuan MLF operation with a 1-year term will be carried out. Chengdu and Wuhan have introduced housing-related policies, including increasing the maximum loan amount and soliciting opinions on the implementation rules for off-site personal housing loans [1].

Industry Overview
- **Upstream**: Copper, aluminum, and nickel prices in the non-ferrous sector, natural rubber prices in the agricultural sector, and crude oil prices in the energy sector have declined, while natural gas prices in the energy sector have risen [2].
- **Midstream**: The PX operating rate in the chemical sector has declined, the PTA operating rate has increased, power plant coal consumption in the energy sector has decreased, and the operating rate of pig products in the agricultural sector has increased [3].
- **Downstream**: Sales of commercial housing in first- and second-tier cities have seasonally declined, and the number of domestic and international flights is at a high level compared with the same period [3].
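The Token figures in the Production Industry item imply earlier baselines that the summary does not state. A minimal sketch of the arithmetic follows, assuming "1000-fold" means the current value is 1000 times the starting value. Because the source figures are floors ("exceeded", "more than"), the results are rough reference points, not reported values.

```python
# Figures quoted in the summary above.
DAILY_TOKENS_MAR_2026 = 140e12   # "exceeded 140 trillion" per day
GROWTH_SINCE_END_2025 = 0.40     # "40% increase from the end of 2025"
FOLD_SINCE_START_2024 = 1000     # "more than 1000-fold" since the beginning of 2024


def implied_baseline(current: float, growth_fraction: float) -> float:
    """Back out the earlier value from a current value and fractional growth."""
    return current / (1 + growth_fraction)


end_2025 = implied_baseline(DAILY_TOKENS_MAR_2026, GROWTH_SINCE_END_2025)
# An N-fold increase corresponds to a growth fraction of N - 1.
start_2024 = implied_baseline(DAILY_TOKENS_MAR_2026, FOLD_SINCE_START_2024 - 1)

print(f"end of 2025: about {end_2025 / 1e12:.0f} trillion per day")
print(f"start of 2024: about {start_2024 / 1e12:.2f} trillion per day")
```

Under these assumptions the implied daily volume is roughly 100 trillion at the end of 2025 and roughly 0.14 trillion at the beginning of 2024.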
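Computer industry reading of the 15th Five-Year Plan outline: the intelligent economy sets sail, and AI Agents lead the AI narrative for the next five years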
Zhong Guo Yin He Zheng Quan· 2026-03-15 03:24
Investment Rating
- The report maintains a "Buy" rating for the computer industry [4].

Core Insights
- The 15th Five-Year Plan outline emphasizes the core strategic position of artificial intelligence (AI) in national development: the term "artificial intelligence" appears 30 times, compared with only 6 times in the previous plan [6][8].
- Over the next five years, AI Agents are expected to drive economic transformation, with growth in high-value AI Agents leading to significant value creation [6][10].
- Demand for intelligent computing power is expected to rise significantly, with projections indicating that by 2028 intelligent computing power will account for over 95% of total computing power in China [6][12].
- The report highlights the emergence of "Token inflation" driven by rapid growth in AI model usage, with projected annual Token consumption rising from 0.0005 PetaTokens in 2025 to 152,667 PetaTokens by 2030, reflecting a CAGR of 3418% [6][24].
- Investment opportunities are identified in AI-native application companies, edge AI technologies, domestic substitution along the computing power chain, and collaborative infrastructure for computing and electricity [6][38].

Summary by Sections

Section 1: The 15th Five-Year Plan as a Key Period for the Intelligent Economy
- The plan introduces the concept of "intelligent native," suggesting AI may become a new production factor [11].
- The intelligent economy will drive the reconstruction of AI factor value [13].

Section 2: Outlook for the 15th Five-Year Plan
- The intelligent economy is set to trigger a rapid explosion in Token usage, with AI Agents transitioning from cost centers to profit centers [17][38].
- The report anticipates a significant increase in the number of active AI Agents, from approximately 28.6 million in 2025 to 2.216 billion by 2030, a compound annual growth rate (CAGR) of 139% [24].

Section 3: Comprehensive Upgrade of AI Factors During the 15th Five-Year Plan
- The report emphasizes the importance of high-quality datasets as a core barrier for building irreplaceable AI Agents [16].
- Demand for high-quality, proprietary datasets is expected to surge, with a focus on transforming data resources into valuable assets [16].

Section 4: Investment Recommendations
- The report suggests focusing on AI-native application companies capable of generating scalable revenue, as well as companies that integrate AI Agents with vertical industry know-how [6][38].
- Specific companies to watch include Horizon Robotics, JingTai Holdings, Meitu, and others [6].
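The compound annual growth rate quoted for active AI Agents can be reproduced from the two endpoint figures in the summary. The sketch below uses the standard CAGR formula, (end / start)^(1 / years) - 1. The function and variable names are illustrative, and treating 2025 to 2030 as five compounding periods is an assumption about how the report counted years.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate as a fraction (1.0 means 100% per year)."""
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Endpoint figures from the summary above.
agents_2025 = 28.6e6    # approximately 28.6 million active AI Agents
agents_2030 = 2.216e9   # 2.216 billion active AI Agents

# Assumes five compounding periods between 2025 and 2030.
agent_rate = cagr(agents_2025, agents_2030, 2030 - 2025)
print(f"Active AI Agent CAGR: {agent_rate:.0%}")
```

With these inputs the rate rounds to 139%, which matches the figure quoted for AI Agents.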
Why the government work report singles out "high-quality datasets"
第一财经· 2026-03-07 12:02
Core Viewpoint
- The article emphasizes the growing importance of high-quality data in the development of artificial intelligence (AI), as highlighted in the 2026 government work report, which aims to foster a new intelligent economy and improve the utilization of data resources [3][4][5].

Group 1: Government Initiatives and AI Development
- The 2026 government work report calls for expanding "AI+" initiatives and promoting the commercialization and large-scale application of AI in key industries, with the AI-related industry expected to grow to over 10 trillion yuan by the end of the 15th Five-Year Plan [4].
- The report reiterates the need to build high-quality datasets and improve the foundational data systems necessary for AI development [5][6].

Group 2: Data Quality and AI Training
- High-quality data is essential for training AI models, and the article notes that data quality directly affects model performance [6][7].
- As AI evolves from generative AI to physical AI, the demand for high-quality data becomes more critical, particularly for applications such as smart driving and humanoid robots, where the required data is more complex [7][8].

Group 3: Challenges in Data Acquisition
- The article discusses the difficulty of acquiring high-quality data for physical AI: internet data is abundant, but it is often unsuitable for training physical AI systems [9][10].
- Embodied intelligence needs data from strongly interactive environments, which traditional internet data does not provide [9][12].

Group 4: Potential of Private and Synthetic Data
- There is significant untapped potential in private data across industries such as pharmaceuticals and fashion, which could provide high-quality insights for AI models [10][11].
- Synthetic data is identified as a promising area, with significant advances expected in 2026, although the quality of synthetic data remains a concern [11][12].

Group 5: Data Standardization and Collaboration
- The article points to the lack of a comprehensive data standardization system, which hampers data sharing and reuse across manufacturers and sectors [13].
- It calls for industry collaboration and innovation centers to address foundational data acquisition challenges and improve data quality [12][13].
A guide to building an AI-friendly content ecosystem in the GEO era
Sou Hu Cai Jing· 2026-01-29 07:04
Core Insights
- Companies must elevate the construction of an AI-friendly content ecosystem to a core digital strategy directly overseen by the CEO, rather than treating it as a tactical move by the marketing department [2][3].

Understanding AI's "Cognitive" Logic
- Understanding GEO (Generative Engine Optimization) requires understanding how large models process information. Unlike traditional search engines, they rely on semantic parsing and intent recognition rather than keyword density [4].
- AI response generation has three key stages: semantic parsing and intent recognition, knowledge retrieval and validation, and answer generation with confidence assessment [4][5].

Strategic Transformation of Content Ecosystem
- GEO should be treated as a top-level initiative, with a dedicated "AI Content Strategy Committee" led by the CMO and involving other key executives to oversee the transformation of the company's knowledge assets [6].
- Companies should allocate 0.5% to 1% of annual revenue to GEO-specific funding and restructure KPI assessment to include new metrics such as "AI citation coverage" and "knowledge graph completeness" [6].

Four Key Elements of AI-Friendly Content Ecosystem
- The first element is structured content: complex information should be broken down into independent, labeled knowledge modules, avoiding lengthy articles [8][9].
- The second element is the DSS principle (Depth-Support-Source) for building trust in content, which requires semantic depth, data support, and authoritative sources [9][10].
- The third element is multi-modal optimization, ensuring content is accessible across media formats including images, videos, and audio [11].
- The fourth element is the construction of a corporate knowledge graph and high-quality datasets that connect dispersed content nodes into a semantic network [12].

Implementation Path for Marketing GEO
- Companies must protect brand tone and ensure that all GEO content undergoes a "brand persona review" by the PR department before publication [13][14].
- An agile iteration mechanism is recommended, with bi-weekly updates to monitor and analyze content performance on AI platforms [14].
- Risk management should include clear terms for AI content usage and thorough fact-checking of all GEO content [15].

Continuous Optimization of GEO
- A content update mechanism that responds to industry changes and regularly refreshes data is crucial [16].
- The technical architecture needs to be upgraded to support open APIs and real-time synchronization [17].
- Organizational capabilities should be built through GEO certification training and external collaboration [18].

Conclusion
- GEO is not merely a technical buzzword but a core capability for companies to thrive in the era of large models, creating a positive feedback loop from being discovered to being trusted and recommended [19].
2025 China Enterprise AI Application Industry Research Report
艾瑞咨询· 2026-01-28 00:07
Core Insights
- The enterprise-level AI application industry is transitioning from a technology exploration phase to a large-scale application phase, driven by advances in large language models [1][14].
- Scaling AI applications depends on systematic, end-to-end implementation capabilities rather than on technological breakthroughs alone [1][23].
- AI Agents are becoming the core vehicle for enterprise-level AI applications, enabling deep integration with business processes [1][29].

Application Layer
- AI Agents are central to implementing enterprise-level AI applications, breaking tasks into smaller units and integrating with business processes through various methods [1][29].
- The focus is on improving process efficiency, amplifying knowledge, and innovating value through AI applications [17][27].

Supporting Layer
- Model selection should follow a data-centric approach, emphasizing a robust data foundation and a data security system tailored for AI [1][41].
- High-quality datasets are critical for AI development, enabling effective model training and application [41][42].

Infrastructure Layer
- AI computing infrastructure is evolving toward a heterogeneous model, highlighting the importance of deep software-hardware collaboration in the context of domestic alternatives [1][50][53].
- AI infrastructure is crucial for optimizing the performance and cost-effectiveness of AI applications [53].

Organizational Layer
- Leadership commitment and top-level design are vital for driving AI transformation within organizations, alongside role upgrades for employees [1][56][60].
- Employees must move from traditional roles to AI collaborators, which requires new skills for integrating AI into business processes [60].

Vendor Landscape
- The enterprise-level AI application market consists of four main categories: application software, technical services and solutions, cloud services, and AI model providers, creating a dynamic competitive landscape [2][65].
- Established companies leverage their industry expertise to extend AI applications, while startups focus on specific scenarios to complement existing systems [65][66].

Development Trends
- Future trends include the evolution of large models from single architectures to multi-architecture iterations, deep integration of AI into business processes, and the emergence of AI-native applications [2][8].
- AI is expected to reshape research processes and enhance competitive advantages for enterprises [2][8].

Financing and Investment
- Over 50% of AI financing events are concentrated in the application layer, with AI in healthcare emerging as a popular investment area [12][14].

Challenges in Scaling
- Key bottlenecks in scaling AI applications include weak data foundations, a lack of quantifiable business value, and a shortage of talent with both technical and business insight [23][27].
Multi-domain datasets fill gaps; Beijing Yizhuang pays out rewards of up to 2 million yuan
Zhong Guo Xin Wen Wang· 2026-01-22 15:38
Core Insights
- Beijing Economic-Technological Development Area (also known as Beijing E-Town) is using financial incentives to raise the value of data elements, aiming to lay a solid foundation for an AI city and promote high-quality development of the data industry [1][6].

Group 1: High-Quality Datasets
- High-quality datasets are crucial for upgrading AI applications: they serve as precise samples for training large models and bridge the gap between general and industry-specific models [2][6].
- The recent "Data 20 Articles" policy supports the construction of high-quality datasets, with 20 companies awarded for 38 datasets across key industries such as embodied intelligence, biomedicine, industrial manufacturing, and intelligent networking [2][6].
- Notable achievements include the RoboMIND2.0 dataset from Beijing Humanoid Robotics Innovation Center, which fills a domestic gap in open-source data for biped humanoid robots [2][3].

Group 2: Industry-Specific Innovations
- In biomedicine, the dataset created by Micronaut Medical combines expert diagnostic opinions with AI quality control and clinical information, and has received a digital asset registration certificate [3].
- Beijing Ant Workshop has developed a compliance dataset for flexible manufacturing, addressing gaps in data-driven intelligent manufacturing and sustainable training for large models [3].
- Four-Dimensional Map Technology's dataset for autonomous driving introduces a "4D spatiotemporal + automated closed-loop" model, addressing data shortages in complex traffic scenarios in China [3][5].

Group 3: Funding and Development
- The awarded funds are seen as catalysts for further research and development, with companies planning to invest in upgrading their datasets and building ecosystems [4][5].
- Companies such as Beijing Ant Workshop and Micronaut Medical intend to strengthen their data capabilities and support the application of AI in their respective fields [4][5].

Group 4: Future Prospects and Policy Support
- Beijing E-Town aims to create a benchmark for data industry clusters, with plans to roll out policy support totaling more than 200 million yuan by 2026, focusing on key areas such as data circulation infrastructure and core technology breakthroughs [6].
- The initiative encourages more companies to take part in high-quality dataset construction, fostering a collaborative environment for innovation and application [6].
Beijing Yizhuang builds the "Yicheng Data Port" as an industry cluster benchmark
Zhong Guo Jing Ji Wang· 2026-01-22 05:48
Core Insights
- Beijing Economic-Technological Development Area (also known as Beijing Yizhuang) is using financial incentives to activate the value of data elements, laying a solid foundation for an all-encompassing artificial intelligence city [1].
- High-quality datasets are recognized as essential for upgrading AI applications, serving as precise samples for training large models and easing the transition from general-purpose to industry-specific models [1][2].
- The "Data 20 Measures" policy, released in 2025, supports the construction of high-quality datasets, and recent awards recognized 20 companies for 38 datasets across key industries [1][3].

Group 1
- The awarded datasets cover critical sectors such as embodied intelligence, biomedicine, industrial manufacturing, and intelligent networking, leading to groundbreaking advances in data supply [1][2].
- High-quality datasets are characterized by high-value applications, high knowledge density, and high technical content, and are becoming core production factors in the digital economy [2].
- Beijing Yizhuang is promoting industrial process optimization, technological innovation, and model iteration by filling data gaps and building specialized application datasets [2].

Group 2
- Beijing Yizhuang has been approved as a national pilot area for data industry aggregation and aims to establish the "Yicheng Data Port" as a benchmark [3].
- By 2026, the focus will be on key parts of the data industry chain, with a total scale exceeding 200 million yuan, covering data circulation infrastructure, core technology breakthroughs, and recognition of typical high-quality dataset cases [3].
- The initiative includes a tiered, targeted funding support system to unlock the potential of data elements and help the data industry in Beijing Economic-Technological Development Area leap forward in capability and scale [3].
What kind of data counts as "high quality"? Nanjing Xuanwu: behind the country's first embodied-intelligence dataset transaction
Yang Zi Wan Bao Wang· 2026-01-03 13:51
Core Insights
- The article highlights the shift of the artificial intelligence industry from a "model-driven" approach to a "data-driven" one, emphasizing the growing importance of high-quality datasets as a scarce resource for putting AI technology into practice [1][4].

Group 1: Company Overview
- Zhujing Intelligent Technology Co., Ltd. has developed an "embodied intelligence dataset" that was successfully traded at the Jiangsu Data Exchange, a significant milestone in the field [1][3].
- The dataset includes approximately 25,000 structured data entries covering scenarios such as office work, supermarkets, catering, and housekeeping. Each entry lasts about 10 seconds and ranges in size from tens to hundreds of megabytes [3][4].

Group 2: Market Demand and Value
- High-quality data products are becoming the focal point of market competition, characterized by high application value, high knowledge density, and high technical content [4][6].
- Companies purchasing these datasets gain access to thoroughly cleaned and annotated data, significantly reducing the time and cost of building data collection environments and ensuring data quality [4][8].

Group 3: Ecosystem Development
- Jiangsu Province aims to build high-quality datasets across key sectors, with a target of 321 datasets and a total data scale exceeding 93PB by October 2025, equivalent to 93 million two-hour movies [6][11].
- The region is fostering a data element industry ecosystem by establishing key infrastructure such as the Jiangsu International Data Port and the Jiangsu Data Exchange, and by promoting understanding of data asset value among enterprises [6][8].

Group 4: Standardization and Future Pathways
- Standardization is viewed as a critical pathway for building high-quality datasets, addressing practical pain points in data application and ensuring effective value realization [10][11].
- The national strategy for high-quality dataset management includes a comprehensive framework focusing on infrastructure, construction entities, and application scenarios to support AI model development [11][13].
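The scale figures in this summary can be sanity-checked with simple arithmetic. The sketch below estimates the total recorded duration of the traded dataset, a rough storage range, and the per-movie size implied by the 93PB comparison. The per-entry sizes of 10 MB and 100 MB are assumed illustrative points within "tens to hundreds of megabytes", and decimal units (1 PB = 1,000,000 GB) are also an assumption.

```python
# Figures from the summary above.
ENTRIES = 25_000          # approximately 25,000 structured data entries
SECONDS_PER_ENTRY = 10    # each entry lasts about 10 seconds

total_hours = ENTRIES * SECONDS_PER_ENTRY / 3600
print(f"total recorded time: about {total_hours:.1f} hours")


def total_size_gb(entries: int, mb_per_entry: float) -> float:
    """Total size in GB for a given per-entry size in MB (decimal units)."""
    return entries * mb_per_entry / 1000


# Assumed illustrative per-entry sizes; the source gives only a range.
low_gb = total_size_gb(ENTRIES, 10)
high_gb = total_size_gb(ENTRIES, 100)
print(f"size at 10 MB per entry: {low_gb:.0f} GB")
print(f"size at 100 MB per entry: {high_gb:.0f} GB")

# The "93PB = 93 million two-hour movies" comparison implies a per-movie size.
TOTAL_PB = 93
MOVIES = 93_000_000
gb_per_movie = TOTAL_PB * 1_000_000 / MOVIES
print(f"implied size per movie: {gb_per_movie:.1f} GB")
```

Under these assumptions the traded dataset holds roughly 69 hours of recordings, and the province-level comparison works out to about 1 GB per two-hour movie.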
A new stage in AI evolution: the rise of agents calls for high-quality data supply
Zhong Guo Xin Wen Wang· 2025-12-07 02:37
Group 1
- The 2025 Digital Intelligence Technology Ecological Conference in Guangzhou focuses on the deep integration of artificial intelligence and digital technology [1].
- The National Data Bureau emphasizes the need for open collaboration to build a unified national data element market, fostering a more open industrial ecosystem [1].
- Guangdong Province is leading in the market-oriented allocation of data elements, viewing AI and data as core new productive forces driving high-quality economic growth [1].

Group 2
- China Telecom introduced the Starry Sky Intelligent Service Platform 1.0, featuring the "Xing Xiaochen" intelligent agent for cross-terminal and cross-scenario intelligent services [2].
- The intelligent agent helps users complete complex tasks through natural language, transforming user interaction [2].
- High-quality datasets are identified as the foundation for enhancing AI capabilities, with over 500 PB of high-quality datasets constructed nationwide as of September [2].
- The National Data Development Research Institute proposes a new approach to advancing high-quality dataset construction, focusing on compliance review, quality assessment, and industry mapping [2].