Data "Fuel" Catalyzes Embodied Intelligence: Training Factories Emerge as the Industry Accelerates Its Breakthrough
Core Insights
- The development of embodied intelligence is currently held back by a shortage of data: the data available to the field is only a fraction of what large language models are trained on [1][2]
- Data collection factories are being set up across China, including the Super EID Factory in Tianjin and facilities in Shanghai and Beijing, marking the first concerted efforts to address this scarcity [1][2]

Group 1: Data Collection Facilities
- The Super EID Factory in Tianjin spans 12,000 square meters and produces data rather than physical products. It can collect up to 550,000 data points per day, for an annual output of about 200 million high-quality data points [2][4]
- The Zhiyuan data collection center in Shanghai covers 3,000 square meters and has already collected more than one million high-quality data points across a range of real-world scenarios [3][5]

Group 2: Data Collection Methods
- Companies differ in how they collect data. PaXini (帕西尼) relies on human data collection, in which operators wear devices that capture their movements, while Zhiyuan uses robotic teleoperation, gathering data as robots are driven through repetitive tasks [4][5]
- Collection covers both simulated and real data. Real data is the most valuable but also the hardest to obtain [2][3]

Group 3: Industry Challenges and Standards
- The lack of standardized data collection practices is a significant challenge for the industry, because it makes training data difficult to reuse and transfer across applications [6]
- PaXini and Zhiyuan are both working to build an open ecosystem that improves data availability and quality, and PaXini is actively participating in the drafting of data collection standards [6]