Achieving SOTA at the Lowest Image Resolution! Full-Stack Open-Source Embodied Model Released: A General-Purpose Brain Forged from 35,000 Hours of Data
量子位· 2026-01-23 12:09
Core Insights
- The article discusses the breakthrough of the Being-H0.5 model in the field of embodied intelligence, which addresses the challenges posed by data isolation and the "Matthew Effect" in the industry [1][3][39]
- Being-H0.5 is the largest VLA model to date, trained on 35,000 hours of data; it enables cross-robot zero-shot skill transfer and shows remarkable generalization [2][3][30]

Data and Model Development
- The Being-H0.5 model integrates 35,000 hours of data, including 14,000 hours of robot data and 16,000 hours of human data across 30 robot types, allowing rapid adaptation and stable execution regardless of hardware configuration [2][8]
- The UniHand-2.0 dataset, an iteration of UniHand-1.0, comprises over 35,000 hours of high-quality data, marking a significant advance in cross-domain data integration [8][9]

Training Paradigms
- The model employs a human-centric learning paradigm, aligning human intent with robotic actions through a unified token sequence that captures physical interaction signals [20][39]
- A unified action space framework bridges the dimensional gap between heterogeneous hardware, enabling joint training and knowledge sharing [16][17]

Architectural Innovations
- The Mixture-of-Flow (MoF) architecture decouples action experts: a shared component learns universal motion primitives while per-robot experts ensure precise execution on specific robot types [22][23]
- The model incorporates mechanisms such as manifold-preserving gating and universal async chunking to improve robustness and adaptability across different hardware [23][24]

Performance and Validation
- Extensive real-world testing on various robot types demonstrated that Being-H0.5 can perform complex tasks, achieving success rates competitive with specialized models [28][30][35]
- In quantitative evaluations the model surpasses existing VLA models, reaching an average success rate of 98.9% on specific tasks [35][36]

Open Source and Future Directions
- The BeingBeyond team commits to a full-stack open-source approach, providing not only pre-trained models but also complete training frameworks and evaluation tools to foster community innovation [37][38]
- The vision is to establish Being-H0.5 as foundational infrastructure for the embodied intelligence sector, enabling rapid development without extensive per-team data collection [39]
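To make the "unified action space" idea above concrete, here is a minimal sketch of one common way such a space can be built: robot-specific action vectors of different dimensionality are zero-padded into a fixed-size shared space with a validity mask, so heterogeneous robots can share one training tensor. All names, dimensions, and the padding scheme are illustrative assumptions, not details from the Being-H0.5 release.

```python
import numpy as np

UNIFIED_DIM = 32  # assumed shared action dimensionality (illustrative)

def to_unified(action: np.ndarray, dof: int) -> tuple[np.ndarray, np.ndarray]:
    """Zero-pad a robot's action vector to UNIFIED_DIM and return a
    validity mask so a loss can cover only real degrees of freedom."""
    padded = np.zeros(UNIFIED_DIM)
    padded[:dof] = action
    mask = np.zeros(UNIFIED_DIM, dtype=bool)
    mask[:dof] = True
    return padded, mask

# Example: a 7-DoF arm and a 12-DoF hand end up in the same batch tensor.
arm_action, arm_mask = to_unified(np.linspace(-1, 1, 7), dof=7)
hand_action, hand_mask = to_unified(np.full(12, 0.5), dof=12)

batch = np.stack([arm_action, hand_action])  # shape (2, 32)
masks = np.stack([arm_mask, hand_mask])      # marks which dims are real
```

A masked loss over `masks` would then let both embodiments be trained jointly without the padding dimensions contributing gradient.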
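The Mixture-of-Flow routing described above can be pictured as a gate over several action experts fed by one shared latent. The sketch below uses tiny linear maps as stand-ins for flow-matching heads; the expert count, dimensions, and gating function are all assumptions for illustration, not the published architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # latent dim from the shared backbone (assumed)
N_EXPERTS = 4   # number of action experts (assumed)

# Each "expert" is a small linear map standing in for a flow-based head.
experts = [rng.standard_normal((D, D)) * 0.1 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D, N_EXPERTS)) * 0.1

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def mof_step(latent: np.ndarray) -> np.ndarray:
    """Route a shared latent through gated experts and mix their outputs."""
    weights = softmax(latent @ gate_w)               # (N_EXPERTS,), sums to 1
    outs = np.stack([latent @ W for W in experts])   # (N_EXPERTS, D)
    return weights @ outs                            # weighted mixture, (D,)

mixed = mof_step(rng.standard_normal(D))
```

In a real system the gate could condition on the embodiment ID rather than the latent alone, so each robot type consistently favors its own expert while still sharing the backbone.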
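Action chunking, referenced above as "universal async chunking", generally means the policy predicts a short horizon of actions at once and consecutive chunks are stitched smoothly while the next prediction is computed. The cross-fade below is one simple stitching scheme, chosen for illustration; the actual Being-H0.5 mechanism is not specified in the article.

```python
import numpy as np

H = 8        # chunk horizon in control steps (assumed)
OVERLAP = 4  # steps where consecutive chunks overlap (assumed)

def blend_chunks(prev: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    """Linearly cross-fade the overlapping steps of two action chunks,
    returning one continuous trajectory of shape (2*H - OVERLAP, dof)."""
    w = np.linspace(0, 1, OVERLAP)[:, None]                # ramp weights
    blended = (1 - w) * prev[-OVERLAP:] + w * nxt[:OVERLAP]
    return np.concatenate([prev[:-OVERLAP], blended, nxt[OVERLAP:]])

chunk_a = np.zeros((H, 3))   # toy 3-DoF chunk holding at 0
chunk_b = np.ones((H, 3))    # next chunk targeting 1
traj = blend_chunks(chunk_a, chunk_b)
```

The "async" part would run `blend_chunks` on a control thread while the model predicts `chunk_b` in the background, so execution never stalls waiting on inference.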