Liaowang (瞭望) | Cracking the Data Dilemma
Xinhua News Agency · 2025-11-18 03:06

Core Insights
- The development of embodied intelligent models requires the collection of multimodal data, with a current shortfall of at least two orders of magnitude compared to the required data volume [1][4]
- The industry is innovating various data collection methods to bridge the gap in real-world data volume, focusing on enhancing data quality through technological innovation and standardization [2][6]

Data Collection and Quality Improvement
- Real-world data is essential for training large models to achieve high generalization capabilities, but the cost and efficiency of collecting such data are significant challenges [4][5]
- Companies like Beijing Humanoid Robot Innovation Center are building high-density, high-quality datasets for various robotic configurations, which have improved task success rates by approximately 20% [5]
- The use of data gloves for real-time collection of high-precision operational data is being explored, with one device capable of collecting 5,000 data points daily, contributing to over one million hand operation data points [5][6]
- The deployment of autonomous driving technologies is providing a pathway for acquiring vast amounts of diverse real-world data, aiding in model training and evaluation [6]

Standardization and Data Utilization
- The establishment of data collection standards is crucial for ensuring the usability of collected data across different robotic models, as current data formats and collection processes are inconsistent [7]
- Companies are taking steps to address data standardization, with initiatives like the certification of humanoid robot datasets to ensure compliance and reduce adaptation costs [7]
- To maximize data value, there is a need for improved data circulation and sharing, with suggestions for creating a data-sharing platform that incentivizes companies to contribute data [8]
- Legislative measures are necessary to ensure data privacy and security, particularly as real-world data collection becomes more prevalent in sensitive environments [8]