The Dawn of Embodied Foundation Models: The World's Strongest Cross-Embodiment VLA Is Here!
具身智能之心 (Heart of Embodied Intelligence) · 2026-01-20 00:33
Core Viewpoint - The emergence of the Being-H0.5 model is upending established logic in the embodied intelligence industry: it shows remarkable cross-embodiment generalization on vision-language-action (VLA) tasks, regardless of hardware differences [3].

Group 1: Industry Trends
- Competition in the embodied intelligence sector is intensifying. Companies compete within a limited market of robot embodiments, where shipment volume directly determines how much data they accumulate and, in turn, how well their algorithms perform [1].
- The Being-H0.5 model integrates data from nearly all mainstream robot configurations worldwide and can adapt to and execute tasks effectively across different embodiments [3].

Group 2: Data Collection and Training
- The UniHand-2.0 dataset, created by BeingBeyond, is described as the largest training dataset of its kind in the world, comprising over 14,000 hours of robot operation data and 16,000 hours of human video data, for a total of over 400 billion training tokens [6].
- Unlike previous work that focused on specific robot configurations, UniHand-2.0 merges data from over 30 different hardware configurations, addressing the large differences in state and action spaces across robots [8][10].
- The human-centric training paradigm draws on a large volume of human video data, which carries rich physical and spatial priors, helping the model generalize across tasks [11][14].

Group 3: Model Architecture and Performance
- Being-H0.5 uses a specialized mixture-of-experts architecture that decouples multimodal understanding from action generation while keeping the two coupled through a shared attention mechanism [17].
- Extensive real-world experiments on various robot configurations demonstrated strong cross-embodiment generalization and complex task execution; on widely used benchmarks, the model reports success rates of 98.9% and 54% [18].
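Group 2 notes that UniHand-2.0 had to reconcile very different state and action spaces across more than 30 hardware configurations, but the article does not say how. One common way to handle this is to define a single fixed-width action vector with named slots, scatter each robot's native action into the slots it actually uses, and mask the unused dimensions out of the loss. The sketch below shows that idea in plain Python; the slot names, slot sizes, and the functions `to_unified` and `masked_mse` are hypothetical illustrations, not BeingBeyond's actual scheme.

```python
# Hypothetical unified action layout: each named slot owns a fixed
# (start, length) range inside one shared vector. These slot names and
# sizes are illustrative assumptions, not UniHand-2.0's real layout.
UNIFIED_SLOTS: dict[str, tuple[int, int]] = {
    "left_arm": (0, 7),
    "right_arm": (7, 7),
    "left_hand": (14, 6),
    "right_hand": (20, 6),
    "mobile_base": (26, 3),
}
UNIFIED_DIM = 29


def to_unified(native: dict[str, list[float]]) -> tuple[list[float], list[int]]:
    """Scatter a robot's native action into the shared vector.

    Returns the zero-padded vector plus a 0/1 mask marking which
    dimensions this robot actually controls.
    """
    vec = [0.0] * UNIFIED_DIM
    mask = [0] * UNIFIED_DIM
    for slot, values in native.items():
        start, length = UNIFIED_SLOTS[slot]
        if len(values) > length:
            raise ValueError(f"{slot}: {len(values)} values exceed slot size {length}")
        for i, value in enumerate(values):
            vec[start + i] = value
            mask[start + i] = 1
    return vec, mask


def masked_mse(pred: list[float], target: list[float], mask: list[int]) -> float:
    """Mean squared error over active dimensions only, so padded
    dimensions never contribute to the training signal."""
    active = sum(mask)
    if active == 0:
        return 0.0
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / active


# Example: a single 6-DoF arm with a 1-DoF gripper, mapped into the
# right-arm and right-hand slots (an arbitrary illustrative choice).
vec, mask = to_unified({"right_arm": [0.1] * 6, "right_hand": [1.0]})
pred = [0.0] * UNIFIED_DIM
pred[0] = 100.0  # a value in an unused left-arm dimension is ignored by the loss
print(sum(mask), masked_mse(pred, vec, mask))
```

Because padded dimensions are masked rather than merely zeroed, a robot without a second arm or a mobile base is never penalized for actuators it does not have, which is what lets heterogeneous robots share one output head in this kind of design.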
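Group 3 describes an architecture that separates multimodal understanding from action generation yet keeps them linked through shared attention. The article gives no implementation details, so the following is only a toy, single-layer sketch of that general pattern: each token is processed by weights belonging to its own modality expert ("und" for understanding, "act" for action), while one attention operation spans the whole sequence. The single attention head, the ReLU feed-forward block, the random weights, the names `Expert` and `shared_attention_layer`, and the rule that understanding tokens cannot attend to action tokens are all assumptions made for illustration, not a description of Being-H0.5.

```python
import math
import random


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: list[float]) -> list[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


class Expert:
    """Per-modality weights (hypothetical): Q/K/V projections plus a
    one-layer ReLU feed-forward block."""

    def __init__(self, dim: int, rng: random.Random) -> None:
        def mat() -> list[list[float]]:
            return [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]

        self.wq, self.wk, self.wv, self.wff = mat(), mat(), mat(), mat()


def shared_attention_layer(
    tokens: list[tuple[str, list[float]]], experts: dict[str, Expert]
) -> list[tuple[str, list[float]]]:
    """Each token is projected by its own expert, but attention runs over
    the whole sequence. Assumed masking rule: "und" tokens cannot see
    "act" tokens, while "act" tokens see everything."""
    dim = len(tokens[0][1])
    q = [matvec(experts[n].wq, x) for n, x in tokens]
    k = [matvec(experts[n].wk, x) for n, x in tokens]
    v = [matvec(experts[n].wv, x) for n, x in tokens]
    out = []
    for i, (name, x) in enumerate(tokens):
        scores = []
        for j, (other, _) in enumerate(tokens):
            if name == "und" and other == "act":
                scores.append(float("-inf"))  # blocked direction
            else:
                scores.append(sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim))
        weights = softmax(scores)
        attended = [sum(w * v[j][d] for j, w in enumerate(weights)) for d in range(dim)]
        h = [xi + ai for xi, ai in zip(x, attended)]  # residual connection
        ff = matvec(experts[name].wff, h)             # modality-specific feed-forward
        out.append((name, [hi + max(0.0, fi) for hi, fi in zip(h, ff)]))
    return out


rng = random.Random(0)
experts = {"und": Expert(4, rng), "act": Expert(4, rng)}
context = [("und", [1.0, 0.0, 0.5, -0.5]), ("und", [0.2, 0.3, -0.1, 0.4])]
out = shared_attention_layer(context + [("act", [0.0, 0.0, 0.0, 0.0])], experts)
```

With this masking choice, the action token reads from the understanding context through the shared attention step, while the understanding outputs stay independent of the action expert: changing the action token's input leaves the first two outputs unchanged.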
Group 4: Industry Impact
- Being-H0.5 offers a significant advantage to most embodied-AI companies: it reduces the need for heavy investment in data collection centers and lets them adapt to different hardware configurations through human-centric learning, which treats human video as a natural data source [19].