Stardust Intelligence x Tsinghua x MIT Release the CLAP Framework! Robots Learn Manipulation Skills by Watching Videos
具身智能之心·2026-01-20 00:33

Core Viewpoint
- The article discusses the introduction of the Contrastive Latent Action Pretraining (CLAP) framework by Stardust Intelligence in collaboration with Tsinghua University, Hong Kong University, and MIT, which enables robots to learn skills directly from videos, addressing the long-standing data scarcity issue in robot training [2][4].

Summary by Sections

Introduction of CLAP Framework
- The CLAP framework aligns the motion space extracted from videos with the action space of robots, allowing robots to learn skills from abundant human behavior videos available online [3][4].

Challenges in Robot Learning
- Traditional robot learning faces a "data scarcity" problem, where there is an abundance of human behavior videos but a lack of specific training data for robots. This is due to the high costs and inefficiencies associated with collecting robot operation data [3].

Innovations of CLAP Framework
- CLAP addresses the "visual entanglement" issue prevalent in existing latent action models, effectively mapping state transitions from videos to a quantifiable, physically executable action codebook [4].
- The framework utilizes two modeling paradigms: CLAP-NTP, which excels in instruction following and object generalization, and CLAP-RF, which focuses on high-frequency, fine-grained control [4][8].

Efficiency and Cost-Effectiveness
- The CLAP framework significantly enhances data utilization efficiency, allowing robots to learn from vast amounts of video content on platforms like YouTube and Douyin, thus lowering the barriers to acquiring robotic skills [4].

Knowledge Transfer and Model Performance
- CLAP incorporates a Knowledge Matching (KM) regularization strategy to mitigate catastrophic forgetting during model fine-tuning, ensuring that robots retain previously learned skills while acquiring new ones [5].
- Experimental results indicate that CLAP outperforms strong baseline methods, effectively transferring skills learned from human videos to robot execution [12].

Industrial Application Prospects
- The long-term value of the CLAP framework lies in its potential to accelerate the industrialization of robotics, reducing costs and deployment times for businesses, which could lead to widespread applications in service and manufacturing sectors [5].
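
The summary describes CLAP as contrastively aligning motion extracted from videos with robot actions, then mapping state transitions onto an action codebook, but it does not give the objective or the quantization rule. As a rough illustration only, the sketch below assumes an InfoNCE-style contrastive loss and a nearest-neighbor codebook lookup, both common choices that may differ from what the authors use. The function names (`info_nce`, `nearest_code`) and the toy two-dimensional embeddings are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(video_embs, action_embs, temperature=0.1):
    """InfoNCE loss (assumed objective, not confirmed by the source).

    video_embs[i] and action_embs[i] form the positive pair; every other
    action embedding in the batch acts as a negative for video_embs[i].
    """
    v = [normalize(x) for x in video_embs]
    a = [normalize(x) for x in action_embs]
    loss = 0.0
    for i, vi in enumerate(v):
        logits = [dot(vi, aj) / temperature for aj in a]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / len(v)


def nearest_code(embedding, codebook):
    """Index of the codebook entry closest to `embedding` (squared Euclidean)."""
    dists = [sum((e - c) ** 2 for e, c in zip(embedding, code)) for code in codebook]
    return dists.index(min(dists))


# Toy data: two video-transition embeddings and two robot-action embeddings.
video = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each action matches its video transition
swapped = [aligned[1], aligned[0]]   # pairs deliberately mismatched

loss_aligned = info_nce(video, aligned)
loss_swapped = info_nce(video, swapped)

# Hypothetical three-entry action codebook.
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
code_index = nearest_code([0.8, 0.2], codebook)
```

In this toy setup the loss is lower when each video transition is paired with its matching action than when the pairs are swapped, which is the property a contrastive alignment objective is meant to reward. The codebook lookup simply returns the index of the closest discrete action entry.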
