High-Quality Data
X @Yuyue
Yuyue· 2025-07-13 09:13
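In my view, the gap between a "smart" AI model and a less smart one often comes down to differences in the data behind it. I once compared how usable the answers from Tencent Yuanbao and DeepSeek were on local-life questions. Although Tencent Yuanbao still runs on DeepSeek at its core, its answers were much "smarter" than those of DeepSeek itself and could be acted on directly. The underlying reason is that Tencent Yuanbao can draw directly on the large body of WeChat Official Account content, a database that is not fully open and that holds a great deal of experience and opinion shared by independent creators. It stands to reason that if Xiaohongshu built an AI, it might do even better than Tencent Yuanbao on everyday-life experience.
This shows how much high-quality data matters. AI can certainly help people find which restaurants are good and what their contact details are, but only humans can originate a restaurant in the first place; creativity is still something AI cannot do.
A Tiger Research report from the past couple of days points to exactly this crisis in the data field: as AI-generated content proliferates, high-quality data resources may run dry, which would pose a major challenge to data-driven AI models. More troubling still, much user-created content is used for AI training without permission, and the original authors often receive neither credit nor financial return.
Many people have been saying that @campnetworkxyz is about to launch its token, and over the past couple of days I have also seen a lot of activity around the Camp ecosystem. It feels like a new version of $IP ...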
Haitian Ruisheng (海天瑞声) 20250625
2025-06-26 14:09
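Haitian Ruisheng 20250625 Summary
- Haitian Ruisheng's 2022 results benefited from a surge in demand for autonomous-driving vision data. In 2023, revenue fell under the impact of cross-border data transfer regulations, but net profit attributable to the parent turned from loss to profit, and gross margin rose as investment in multimodal data and demand for unique datasets increased.
- The data annotation industry is expected to grow at a compound annual rate of more than 20% through 2027. Policy support is strengthening: seven pilot cities are focusing on developing data annotation companies, which is expected to drive more than 8.3 billion yuan of related output value, and policymakers will continue to focus on the data annotation field.
- As large models develop, data demand is extending from general knowledge to specialized knowledge. The National Data Administration plans to build three classes of high-quality datasets: general-knowledge, industry general-knowledge, and specialized-knowledge. Demand for third-party branded service providers is rising in vertical, specialized, and in-depth data mining applications.
- The data annotation market is expected to exceed 10 billion yuan in 2025, growing more than 30%. In the current competitive landscape, in-house teams at demand-side companies account for 60% and branded data service providers for 35%, and market concentration is rising.
- Scale AI provides data annotation and management services for AI and machine learning. Its customers span autonomous driving, finance, and government, and it works with OpenAI, Meta, Microsoft, and others. Its 2023 revenue was nearly US$900 million, its 2024 revenue is expected to exceed US$2 billion, and its valuation reached US$29 billion after Meta's investment.
Q&A
In 2022, Haitian Ruisheng's results grew rapidly ...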
Four Questions on Humanoid Robots ("融"观中国 column)
Core Insights
- The year 2023 marks a significant breakthrough for humanoid robots in China, with various events and competitions highlighting their development and increasing public interest [4]
- KPMG reports that China has filed nearly 6,000 patents in humanoid robot technology over the past five years, making it the leading country in patent applications [4]
- The financing scale of the humanoid robot industry in China is projected to grow from 1.58 billion yuan in 2020 to 7.23 billion yuan by 2024, with a compound annual growth rate of 35.6% [4]
Group 1: China's Advantages
- China possesses a robust and comprehensive supply chain, being the only country with all 41 major industrial categories recognized by the UN [5]
- The extensive industrial ecosystem allows for rapid assembly and integration of components, providing a unique advantage in producing humanoid robots [5]
- China has the largest application scenarios globally, particularly in smart manufacturing, which generates valuable data for training humanoid robots [5][6]
Group 2: Reasons for Industry Explosion
- The decline in hardware costs and advancements in intelligent capabilities are driving the explosive growth of the humanoid robot industry [7]
- Breakthroughs in hardware production have led to cost-effective and high-quality solutions, enabling mass production of humanoid robots [7][8]
- The development of artificial intelligence and machine learning has significantly reduced the costs associated with training robots, enhancing their capabilities [9]
Group 3: China's Position in the Global Market
- China is positioned in the first tier of the humanoid robot industry, excelling in hardware production while relying on North America for advanced AI development [10]
- Over 80% of components for leading international robot companies are sourced from China, indicating a strong foothold in the hardware sector [10]
- However, challenges remain in high-precision components and intelligent computing, where China still lags behind [10][11]
Group 4: Future Development Trends
- The initial focus for humanoid robots will be on industrial applications, followed by commercial and eventually household uses [12]
- Achieving practical applications in household settings will require significant advancements in task understanding and cost-effectiveness [12][13]
- The industry must prioritize creating robots that deliver tangible productivity value in various environments to gain public acceptance [13]
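The compound annual growth rate cited in the Core Insights can be checked with a few lines of arithmetic. The sketch below is only an illustration: the summary does not say how the 35.6% figure was derived, so the helper name `cagr` and the choice of period count are assumptions rather than the report's method. Under these assumptions, compounding over four periods (2020 to 2024 as endpoints) gives roughly 46%, while counting five periods gives a value close to the cited 35.6%.

```python
def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate over `periods` compounding periods.

    Hypothetical helper for illustration; not taken from the source report.
    """
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Figures quoted in the summary, in billions of yuan.
FINANCING_2020 = 1.58
FINANCING_2024 = 7.23

# Assumption: the number of compounding periods is not stated in the source,
# so both plausible counts are computed for comparison.
rate_four_periods = cagr(FINANCING_2020, FINANCING_2024, 4)
rate_five_periods = cagr(FINANCING_2020, FINANCING_2024, 5)

print(f"4 periods: {rate_four_periods:.1%}")
print(f"5 periods: {rate_five_periods:.1%}")
```

The same helper applies to any start value, end value, and period count, which makes it easy to see how sensitive a quoted growth rate is to the way the years are counted.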
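Nancai Data Weekly (Issue 51): Construction Begins on 10 National Data Element Comprehensive Pilot Zones; Development of High-Quality Dataset Technical Documents to Accelerate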
Group 1
- The release of the "Regulations on Government Data Sharing" marks a new phase of legal governance for data sharing in China, providing a legal framework for efficient data circulation and enhancing government digital governance capabilities [2][3]
- The regulations address existing issues such as incomplete mechanisms and unclear responsibilities in government data sharing, aiming to eliminate "data silos" and improve the efficiency of data utilization [2]
- The establishment of 10 national data element comprehensive pilot zones in various provinces aims to support the integration of the real economy and digital economy, fostering a robust data market ecosystem [3]
Group 2
- A seminar on high-quality data set construction and standardization was held, focusing on guidelines, format requirements, and quality assessment for data sets, which will facilitate the application of artificial intelligence in central enterprises [4][5]
- Guangzhou's "Digital Guangzhou Construction 2025 Work Points" outlines 32 key tasks for digital transformation, emphasizing the development of data resources and the establishment of a governance system for data circulation [5]
Seminar on High-Quality Dataset Construction and Standardization for Central Enterprises to Be Held
news flash· 2025-05-26 11:52
Group 1
- The core viewpoint of the article emphasizes the importance of standardization in the construction of high-quality data sets for central enterprises, aligning with the directives of the central government [1]
- A seminar on high-quality data set construction and standardization is scheduled for May 29, 2025, organized by the National Data Standardization Technical Committee [1]
- The seminar will focus on various aspects such as construction paths, format requirements, classification models, quality assessment, and case demonstrations, inviting experts for technical exchanges and practical sharing [1]
Building High-Quality Datasets to Make Artificial Intelligence Smarter (New Perspectives)
Ren Min Ri Bao· 2025-05-20 21:51
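The relevant parties are moving actively to build high-quality datasets. The "Data Elements ×" Three-Year Action Plan (2024-2026), jointly issued by 17 departments including the National Data Administration, calls for "encouraging research institutions, leading enterprises, and others to build shared industry data repositories and create high-quality training datasets for large AI models." At the 8th Digital China Summit, the State-owned Assets Supervision and Administration Commission of the State Council (SASAC) released a first batch of 30 high-quality AI industry datasets from central enterprises across more than 10 industries, including an AI load-forecasting dataset for power grid dispatch, a dataset for health diagnosis, operating-anomaly detection, and fault prediction of nuclear power SPV equipment, and a financial large-model dataset.
"As foundation models trend toward open source, the gaps among players in computing power and model algorithms keep narrowing, the value of data as a factor is becoming more prominent, and data has become the core arena of AI competition," said Hu Wujie, deputy director-general of SASAC's Planning and Development Bureau. Hu called for accelerating the pooling and sharing of high-quality industry datasets to give the AI industry ample "nourishment," so that models can be continually trained and optimized for different scenarios and foundation models can be put to use across all industries.
Building high-quality datasets still faces many challenges. Wei Liang said that, on the one hand, industry large models have diverse data needs, and different industry sectors need different scenario data for their models, which adds to the complexity of data processing and management. On the other hand, in the actual construction of industry large models, there is no unified standard for measuring data that is built or purchased, and the completeness and accuracy of data from different industries and different data sources may be uneven ...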
Activating Massive "Dormant Data": China's Data Industry to Reach 7.5 Trillion Yuan by 2030
Yang Shi Xin Wen· 2025-05-18 01:17
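On the 17th, reporters learned at the 2025 Data Security Development Conference that China will cultivate and expand a group of upstream and downstream enterprises in the data element industry chain, and that China's data industry is expected to reach 7.5 trillion yuan by 2030.
Opening and sharing public data to activate massive "dormant data"
As the first country in the world to classify data as a factor of production, China has initially built a data industry chain covering a full range of categories. Figures show that China's total annual data production reached 41.06 zettabytes in 2024, up 25% year on year.
To date, China has more than 190,000 data-related enterprises, and its data industry exceeds 2 trillion yuan. Based on an average annual growth rate of more than 20%, the data industry is projected to reach 7.5 trillion yuan in 2030.
Liu Liehong, head of the National Data Administration: "We are currently planning a data infrastructure system that is connected horizontally, integrated vertically, and strongly coordinated. By 2029, the main structure of the national data infrastructure should be basically complete."
On integrating data elements with industry, the state is moving faster to break down barriers to sharing and opening public data and to promote deep integration of public and enterprise data, activating massive "dormant data."
Building high-quality datasets to accelerate AI development
Data has now moved beyond the traditional factors of production to become the core driver of AI breakthroughs and industrial change. High-quality datasets are not only the foundation for leaps in AI model performance; they are also reshaping the entire industry chain from technology R&D to commercial deployment. So how are high-quality datasets built?
In Wenzhou, Zhejiang, a "test field" for the country's market-oriented reform of data elements ...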