Core Insights
- The article introduces GEN-0, a new class of embodied foundation model built for multimodal training directly on high-fidelity physical interaction data, with the aim of improving robotic intelligence through real-world data [5][9].

Group 1: Model Characteristics
- GEN-0 is designed to capture human-level reflexes and physical common sense. Its core feature, "harmonic reasoning", trains the model to think and act together seamlessly [5].
- The model has crossed a critical threshold of 7 billion parameters. Around this size a phase transition appears: smaller models stagnate, while larger models continue to improve [6][11].
- GEN-0 exhibits a strong scaling law: more pre-training data and compute predictably improve the model's performance across multiple tasks [6][11].

Group 2: Data Utilization
- The model is pre-trained on over 270,000 hours of heterogeneous real-world manipulation data, and the dataset is growing by more than 10,000 hours per week [22].
- The data is collected from diverse operating scenarios across thousands of households, warehouses, and workplaces, with the goal of covering every conceivable manipulation task [24].

Group 3: Implications for Robotics
- GEN-0 marks a new era for embodied foundation models, in which capabilities grow predictably with real physical interaction data rather than relying solely on text, images, or simulated data [9].
- The findings show that smaller models struggle to absorb complex sensorimotor data during pre-training, while models with over 7 billion parameters can internalize large-scale pre-training data and adapt quickly to downstream tasks with minimal fine-tuning [15][11].
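To make the data-growth figures and the idea of a "predictable" scaling law concrete, here is a minimal Python sketch. It makes two assumptions that are not in the article: that the dataset grows linearly at the stated lower-bound rate, and that the scaling relation is a power law of the form y = a * x^(-alpha), a common way to summarize scaling laws. The summary above does not give GEN-0's actual functional form or any measured values, so the function names and the data points below are purely illustrative.

```python
import math

# Figures from the summary above; both are stated as lower bounds ("over").
INITIAL_HOURS = 270_000
WEEKLY_GROWTH_HOURS = 10_000


def project_dataset_hours(weeks: int) -> int:
    """Linear lower-bound projection of dataset size after `weeks` weeks.

    Assumes the weekly growth rate stays constant (an assumption, not a
    claim made by the article).
    """
    return INITIAL_HOURS + WEEKLY_GROWTH_HOURS * weeks


def fit_power_law(xs, ys):
    """Fit y = a * x**(-alpha) by ordinary least squares in log-log space.

    Returns the tuple (a, alpha).
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from y = 2 * x**(-0.5).
# These are NOT GEN-0 measurements; they only exercise the fit.
hours = [1_000, 10_000, 100_000]
errors = [2.0 * h ** -0.5 for h in hours]
a, alpha = fit_power_law(hours, errors)

print(project_dataset_hours(52))  # projected lower bound after one year
print(a, alpha)                   # recovered power-law parameters
```

A power law appears as a straight line on log-log axes, which is why a simple linear fit recovers its parameters. That is the sense in which more data or compute can "predictably" improve performance: once the line is fitted, it can be extrapolated.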
The largest and most diverse real-world manipulation dataset to date! The Scaling Law for embodied AI has arrived
具身智能之心·2025-11-09 14:08