Galaxea R1 Lite Mobile Dual-Arm Robot

Galaxea Team Releases: Large-Scale, High-Quality Open-World Dataset and G0 Dual-System VLA Model
具身智能之心· 2025-09-04 01:04
Core Insights
- The article presents the Galaxea Open-World Dataset, a large-scale and diverse collection of robot behaviors recorded in real human living and working environments, addressing the scarcity of high-quality open-world robot data and the limited generalization of current models [3][5][6].

Dataset Overview
- The dataset comprises 500 hours of data and 100,000 demonstration trajectories, covering 150 task categories, 1,600 object types, and 58 manipulation skills, with fine-grained sub-task instructions labeled at 2 Hz (an illustrative episode schema follows this summary) [8][12].
- Data was collected with the Galaxea R1 Lite mobile dual-arm robot, which has 23 degrees of freedom and is equipped with RGB cameras for both global scene perception and fine-grained manipulation sensing [5][6].

Data Diversity and Coverage
- The dataset spans 11 physical sites and 50 unique scenarios, covering residential, retail, dining, and office environments, avoiding the limitations of existing datasets that are confined to controlled laboratory settings [6][12].
- The task distribution balances basic actions and specialized skills, with residential scenes accounting for 50.8% of the data and office scenes for 33.2% [11][12].

G0 Dual-System Framework
- The G0 framework couples a "slow thinking" vision-language model (G0-VLM) with a "fast execution" vision-language-action model (G0-VLA), using a three-stage training strategy to achieve complex task planning and precise execution (a minimal control-loop sketch also follows below) [5][19].
- The training stages are cross-embodiment pre-training, single-embodiment pre-training, and task-specific fine-tuning, which together significantly improve model performance [21][30].

Model Performance Evaluation
- The G0-VLA model performed strongly on benchmark tasks such as desktop organization and microwave operation, with G0-Full achieving the highest average task-progress scores [39][47].
- Single-embodiment pre-training proved essential for effective adaptation, as cross-embodiment pre-training can cause negative transfer when the training and target robot embodiments differ substantially [39][46].

Key Findings
- G0-VLM outperformed mainstream vision-language models in instruction accuracy, reaching 83.3% on desktop organization and 78.2% on bed making, underscoring the importance of domain-specific fine-tuning [42][47].
- The dataset design and the dual-system framework together address the challenges of real-world robot task execution, providing a robust foundation for future work in embodied intelligence [17][19].
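To make the dataset description above concrete, the sketch below shows one plausible way to represent a single demonstration episode with its 2 Hz sub-task labels. The field names, the example values such as "site_03", and the `Episode`/`SubTaskAnnotation` classes are illustrative assumptions, not the released dataset schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubTaskAnnotation:
    """A fine-grained instruction covering [start_s, end_s) within an episode."""
    start_s: float
    end_s: float
    instruction: str       # e.g. "pick up the mug with the right gripper" (hypothetical)
    skill: str             # one of the ~58 manipulation skills
    objects: List[str]     # objects involved, drawn from the ~1,600 object types


@dataclass
class Episode:
    """One of the ~100,000 demonstration trajectories."""
    task_category: str     # one of the ~150 task categories
    scene_type: str        # residential / retail / dining / office
    site_id: str           # one of the 11 physical collection sites
    duration_s: float
    sub_tasks: List[SubTaskAnnotation] = field(default_factory=list)

    def label_rate_hz(self) -> float:
        """Sub-task labeling density; the dataset reports roughly 2 Hz."""
        return len(self.sub_tasks) / self.duration_s if self.duration_s else 0.0


# Example: a 10-second episode with 20 sub-task labels gives a 2 Hz label rate.
if __name__ == "__main__":
    ep = Episode("desktop organization", "office", "site_03", 10.0,
                 [SubTaskAnnotation(i * 0.5, (i + 1) * 0.5, f"sub-task {i}", "pick", ["mug"])
                  for i in range(20)])
    print(ep.label_rate_hz())  # 2.0
```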
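The dual-system coupling described above can be pictured as a slow planning loop feeding a fast control loop. The following is a minimal sketch under assumed interfaces: `SlowPlannerVLM`, `FastControllerVLA`, the loop rates, and the 23-DoF action format are placeholders standing in for G0-VLM and G0-VLA, not the released API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Placeholder for camera frames plus proprioception from the R1 Lite."""
    rgb_frames: List[object]
    joint_positions: List[float]


class SlowPlannerVLM:
    """System 2 stand-in: decomposes a task into sub-task instructions (like G0-VLM)."""

    def plan(self, task: str, obs: Observation) -> List[str]:
        # Hypothetical decomposition; the real planner conditions on images and context.
        return [f"{task} - step {i}" for i in range(1, 4)]


class FastControllerVLA:
    """System 1 stand-in: maps a sub-task and observation to a short action chunk (like G0-VLA)."""

    def act(self, sub_task: str, obs: Observation) -> List[List[float]]:
        # Hypothetical action chunk: five 23-DoF joint targets.
        return [[0.0] * 23 for _ in range(5)]


def run_episode(task: str, steps_per_subtask: int = 10) -> None:
    """Slow loop issues sub-tasks; fast loop produces actions until each sub-task ends."""
    vlm, vla = SlowPlannerVLM(), FastControllerVLA()
    obs = Observation(rgb_frames=[], joint_positions=[0.0] * 23)

    for sub_task in vlm.plan(task, obs):            # slow "thinking" loop
        for _ in range(steps_per_subtask):          # fast execution loop
            action_chunk = vla.act(sub_task, obs)   # would be streamed to the robot
        print(f"finished sub-task: {sub_task}")


if __name__ == "__main__":
    run_episode("tidy the desktop")
```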
Galaxea Team Releases: Large-Scale, High-Quality Open-World Robot Dataset and G0 Dual-System VLA Model
具身智能之心· 2025-09-03 03:23
Core Insights
- The article presents the Galaxea Open-World Dataset, a large-scale and diverse collection of robot behaviors recorded in real human living and working environments, addressing the scarcity of high-quality open-world robot data and the limited generalization of current models [2][5][6].

Dataset Overview
- The Galaxea Open-World Dataset is the first large-scale robot behavior dataset collected in real-life scenarios, addressing the limitations of existing datasets, which are confined to controlled environments and inconsistent robot embodiments [5][17].
- Data collection was conducted with the Galaxea R1 Lite mobile dual-arm robot, which features 23 degrees of freedom and RGB cameras for both global scene perception and fine-grained manipulation sensing [8][6].
- The dataset comprises 500 hours of data and 100,000 demonstration trajectories, covering 150 task categories, 1,600 object types, and 58 manipulation skills, with fine-grained sub-task instructions labeled at 2 Hz [8][12].

Model Framework
- The G0 dual-system framework couples a "slow thinking" vision-language model (G0-VLM) with a "fast execution" vision-language-action model (G0-VLA), using a three-stage training strategy to achieve complex task planning and precise execution [5][19].
- The training stages are cross-embodiment pre-training, single-embodiment pre-training, and task-specific fine-tuning, designed to balance general knowledge with adaptation to the specific robot (a sketch of the schedule follows this summary) [21][27].

Performance Evaluation
- G0-VLA performed strongly on benchmark tasks such as desktop organization, microwave operation, bed making, and block building, while G0-VLM achieved instruction accuracies of 78.2% on bed making and 83.3% on desktop organization [42][47].
- Single-embodiment pre-training proved essential for strong performance, as cross-embodiment pre-training can cause negative transfer when the training and target robot embodiments differ substantially [39][46].

Key Findings
- The dataset is designed for real-world adaptability and model-training friendliness, ensuring that the collected data reflects the complexity of human environments [6][17].
- The G0 architecture is inspired by Kahneman's dual-system theory: System 2 (slow thinking) is responsible for planning while System 1 (fast execution) handles real-time reaction, balancing planning rationality with execution timeliness [19][21].
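The three training stages mentioned above can be summarized as an ordered schedule. The sketch below encodes that schedule as plain data; the stage names and their purposes are paraphrased from the summary, while the `Stage` class and the specific data sources listed are illustrative assumptions rather than details from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    name: str
    data_source: str
    purpose: str


def g0_training_schedule() -> List[Stage]:
    """The three stages, in the order they are applied."""
    return [
        Stage("cross-embodiment pre-training",
              "mixed data from other robot embodiments",
              "acquire broad visuomotor priors"),
        Stage("single-embodiment pre-training",
              "Galaxea Open-World Dataset (R1 Lite only)",
              "adapt to the target robot and avoid negative transfer"),
        Stage("task-specific fine-tuning",
              "small per-task demonstration sets",
              "specialize for benchmark tasks such as desktop organization"),
    ]


if __name__ == "__main__":
    for i, stage in enumerate(g0_training_schedule(), start=1):
        print(f"stage {i}: {stage.name} -> {stage.purpose}")
```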