Abandoning Motion Capture for Vision-Only Data Collection: Latest Progress on Tesla Optimus Training Revealed
硬AI · 2025-11-03 09:20
Core Viewpoint
- Tesla has shifted the training method for its humanoid robot Optimus from motion capture to camera-only data collection, using video of employees performing everyday tasks as training material [2][3][6].

Group 1: Training Method Transition
- Since June, Tesla has abandoned its earlier motion capture suits and teleoperation methods in favor of a camera-only data collection approach [6][8].
- Data collection workers wear helmets fitted with five cameras and carry a 30-40 pound equipment pack while repeating basic actions such as wiping tables and lifting cups [6][8].
- The move to camera-based collection is expected to let Tesla scale up data collection more quickly [8].

Group 2: AI-Generated Task Instructions
- Tesla has begun using AI-generated prompts to assist in training the robot: workers receive a series of instructions and must complete each action within 3-5 seconds [11].
- The training exercises cover a wide variety of movements, some of which may seem uncomfortable or random but are believed to help identify areas for improvement [11].
- Data collection also takes place at the Fremont factory, where workers organize vehicle parts while wearing the camera equipment [11].

Group 3: Technical Challenges in Robot Performance
- Although company videos showcase Optimus's capabilities, its actual performance during training reveals significant gaps [13].
- Workers report that the robot often falls when performing tasks that require bending or tilting, and that it is usually tethered to a support frame to maintain balance [15].
- Experts emphasize that demonstrations often highlight only the best performances and do not show true cognitive understanding [15].

Group 4: Workforce and Data Collection
- More than 100 people have participated in data collection, but the company laid off several workers following a performance review in September [17].
- Workers are evaluated on task execution and are required to collect at least 4 hours of usable video material per shift [17].