World's first real-world embodied multimodal dataset: Itai Zhihang (它石智航) delivers, six months ahead of Tesla
QbitAI (量子位) · 2025-10-10 11:24
Core Viewpoint - The article discusses the launch of the world's first large-scale real-world embodied multimodal dataset, WIYH (World In Your Hands), by the company Itai Zhihang, which integrates vision, language, tactile, and action data for human-centric applications [1][3][5].

Group 1: Dataset Features
- WIYH is the first human-centric dataset that includes over 100,000 real human operation videos, covering more than 40 task types and over 100 human skills, utilizing over 13 types of sensors and encompassing more than 520 objects [3][9].
- Each data entry in the dataset contains six types of annotations corresponding to the synchronized multimodal data [4].
- The dataset emphasizes real-world scenarios, capturing human standard operating procedures in various industries, such as hotel laundry and supermarket assembly [9][10][11].

Group 2: Technical Innovations
- WIYH represents two major breakthroughs: focusing on real-world scenarios and supporting large-scale multimodal data integration, which provides a solid data foundation for robots to learn complex actions and generalize across different contexts [9][16].
- The dataset features multi-layer annotations, including semantic labeling, depth information, affordance of interactive objects, language reasoning, and tactile/action trajectories, enabling rich and generalized data for embodied intelligence research [12][13].

Group 3: Industry Context
- The human-centric data paradigm is gaining consensus in the industry, with companies like Tesla also focusing on real-world data collection for their robotics development [5][6][8].
- Itai Zhihang, established only six months prior, has already secured $242 million in funding, positioning itself ahead of competitors like Tesla in this technological approach [8].

Group 4: Challenges and Opportunities
- The article highlights the challenges in obtaining large-scale, real, and generalizable training data for embodied intelligence, noting that traditional data sources like internet videos and simulation data have significant limitations [20][21].
- WIYH fills a gap in cross-industry, real-world data, making it possible to pre-train embodied AI models that are grounded in human experience [26].
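
To make the "synchronized multimodal data plus multi-layer annotations" idea from Groups 1 and 2 more concrete, here is a minimal Python sketch of what one data entry could look like. It is purely illustrative: the article does not publish WIYH's actual schema, so the class names (`Stream`, `Episode`), the layer names in `ANNOTATION_LAYERS`, the task name, and the sample values are all hypothetical. In particular, splitting "tactile/action trajectories" into two layers to arrive at six is an assumption, not something the source states.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass, field

# Hypothetical layer names paraphrasing the annotation kinds the article lists.
# Treating tactile and action trajectories as separate layers is an assumption.
ANNOTATION_LAYERS = (
    "semantic_labels",
    "depth",
    "affordance",
    "language_reasoning",
    "tactile_trajectory",
    "action_trajectory",
)


@dataclass
class Stream:
    """One sensor stream: sorted timestamps (seconds), one sample per timestamp."""
    timestamps: list[float]
    samples: list[object]

    def nearest(self, t: float) -> object:
        """Return the sample whose timestamp is closest to t."""
        if not self.timestamps:
            raise ValueError("empty stream")
        i = bisect_left(self.timestamps, t)
        # Only the neighbours on either side of the insertion point can be closest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.timestamps)]
        best = min(candidates, key=lambda j: abs(self.timestamps[j] - t))
        return self.samples[best]


@dataclass
class Episode:
    """One recorded human operation with its sensor streams and annotations."""
    task: str
    streams: dict[str, Stream] = field(default_factory=dict)
    annotations: dict[str, object] = field(default_factory=dict)

    def missing_layers(self) -> list[str]:
        """List the annotation layers this episode does not carry yet."""
        return [name for name in ANNOTATION_LAYERS if name not in self.annotations]

    def frame_at(self, t: float) -> dict[str, object]:
        """Align all streams to time t by nearest-timestamp lookup."""
        return {name: stream.nearest(t) for name, stream in self.streams.items()}


# Made-up example values; sensors here run at different rates.
episode = Episode(
    task="fold_towel",
    streams={
        "vision": Stream([0.0, 0.1, 0.2], ["img0", "img1", "img2"]),
        "tactile": Stream([0.0, 0.05, 0.1, 0.15, 0.2], [0.0, 0.2, 0.5, 0.4, 0.1]),
    },
    annotations={"semantic_labels": ["towel", "hand"], "depth": "depth_000.npy"},
)

print(episode.frame_at(0.12))
print(episode.missing_layers())
```

Nearest-timestamp lookup is only one simple way to line up sensors that sample at different rates; the article does not say how WIYH synchronizes its streams.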