Open-Sourcing 10,000 Hours of Embodied Intelligence Data: What Is This Company After?
具身智能之心·2026-01-08 04:23

Core Viewpoint
- The article argues that high-quality, large-scale, diverse datasets are essential to advancing embodied intelligence, and presents JianZhi Robotics' release of the "10Kh RealOmni-Open DataSet" as a significant industry milestone [1][4][16].

Dataset Overview
- The "10Kh RealOmni-Open DataSet" contains over 10,000 hours of data in nearly one million clips, which the article describes as the largest and most generalized open dataset in the field [1][4].
- The dataset focuses on 10 common household tasks, with more than 10,000 clips per skill, so it offers depth within each skill as well as overall scale [4][5].

Data Quality and Specifications
- Recordings have a resolution of 1600x1296 pixels and a frame rate of 30 fps, which keeps the captured actions clear and detailed [4][5].
- Trajectories are accurate to the centimeter level, and IMU hardware combined with cloud reconstruction brings the accuracy to the sub-centimeter level [4][12].

Skill and Task Coverage
- The dataset prioritizes tasks that need both hands in real scenarios: 99.2% of the clips involve "two-handed, long-range tasks," giving a realistic picture of household activities [5][7].
- The average clip length is 1 minute and 37 seconds, so each clip captures the complete process of a task rather than a static snapshot, which helps in understanding action logic and causality [5][7].

Data Collection Methodology
- The data was collected in 3,000 real households, which provides a wide variety of scenarios and natural human interactions and addresses the limitations of traditional data collection methods [7][9].
- JianZhi Robotics runs a comprehensive data production chain that allows data to accumulate rapidly; the article reports nearly one million hours collected in just two months [9][11].

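For a rough sense of scale, the recording figures quoted above (1600x1296 pixels, 30 fps, over 10,000 hours) can be turned into a frame count and an uncompressed size. This is only a back-of-envelope sketch: the 3 bytes per pixel (8-bit RGB) and the single camera stream are assumptions for illustration, not figures from the article, and the released files are presumably compressed and much smaller.

```python
# Back-of-envelope scale estimate from the figures quoted in the summary.
WIDTH, HEIGHT = 1600, 1296   # resolution stated in the article
FPS = 30                     # frame rate stated in the article
HOURS = 10_000               # lower bound: "over 10,000 hours"
BYTES_PER_PIXEL = 3          # ASSUMPTION: uncompressed 8-bit RGB
NUM_STREAMS = 1              # ASSUMPTION: one camera stream per recording


def total_frames(hours: int, fps: int) -> int:
    """Number of frames in `hours` of video at `fps` frames per second."""
    return hours * 3600 * fps


def raw_bytes(width: int, height: int, bytes_per_pixel: int,
              frames: int, streams: int = 1) -> int:
    """Uncompressed size in bytes of `frames` frames across `streams` streams."""
    return width * height * bytes_per_pixel * frames * streams


frames = total_frames(HOURS, FPS)
size = raw_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL, frames, NUM_STREAMS)

print(f"frames:   {frames:,}")
print(f"raw size: {size / 1e15:.2f} PB (uncompressed, under the assumptions above)")
```

Under these assumptions the dataset corresponds to about 1.08 billion frames and roughly 6.7 PB of raw pixels, which illustrates why a processing and cleaning pipeline matters at this scale.
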
Technological Infrastructure
- The Gen DAS Gripper is a key component in the data collection process, enabling quick deployment without the need for extensive site preparation [11].
- The Gen Matrix data platform processes and cleans the collected data, achieving high precision in trajectory reconstruction and synchronization of heterogeneous data sources [13].

Future Directions
- The open-sourcing of this dataset is seen as a way to accelerate innovation in embodied intelligence by filling data gaps, standardizing formats, and lowering research barriers [16].
- JianZhi Robotics plans to continue enhancing its data infrastructure and releasing more beneficial datasets and services, fostering a positive cycle of data sharing, model optimization, and practical application [16].
