Core Insights
- The article introduces GEN-0, a new class of embodied foundation model built for multimodal training directly on high-fidelity physical interaction data, with the aim of advancing robotic intelligence through real-world data [5][9].

Group 1: GEN-0 Model Features
- GEN-0 inherits the strengths of vision-language models while going beyond them, for example by capturing human-level reflexes and physical common sense [5].
- The model follows strong scaling laws: more pre-training data and compute predictably improve performance across multiple tasks [6][11].
- The "harmonic reasoning" mechanism trains the model to think and act simultaneously, so it can scale without relying on a dual-system architecture [6][11].

Group 2: Data and Training Insights
- GEN-0 has been pre-trained on more than 270,000 hours of heterogeneous real-world manipulation data, and the dataset is growing by more than 10,000 hours per week [20][22].
- Smaller models exhibit a "solidification" phenomenon when overloaded with data, while larger models continue to improve, which points to a marked "phase transition" in a model's capacity for intelligence [11][13].
- Downstream performance tracks the amount of pre-training data according to a power law, so performance gains from additional data can be predicted [15][18].

Group 3: Future Directions
- The Generalist AI team is building the largest and most diverse real-world manipulation dataset to extend GEN-0's capabilities, covering a wide range of tasks across many environments [22].
- The article stresses the model's ability to adapt to new tasks with minimal fine-tuning, which suggests it could be deployed quickly across diverse robotic applications [6][11].
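
The power-law relationship described under Group 2 can be illustrated with a short sketch. It assumes the relationship takes the form error ≈ a · hours^(-b), which becomes a straight line in log-log space and can be fitted by ordinary least squares. The function names, the data points, and the constants a = 2.0 and b = 0.25 are all hypothetical and chosen only for illustration; they are not figures from the article or from GEN-0.

```python
import math


def fit_power_law(data_hours, errors):
    """Fit error ~= a * hours**(-b) by least squares in log-log space.

    Returns the pair (a, b). Both inputs must contain positive numbers.
    """
    xs = [math.log(h) for h in data_hours]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -b
    intercept = mean_y - slope * mean_x  # equals log(a)
    return math.exp(intercept), -slope


def predict_error(a, b, hours):
    """Extrapolate the fitted power law to a new amount of data."""
    return a * hours ** (-b)


# Synthetic points generated from a made-up law (a=2.0, b=0.25).
# These are NOT measurements reported in the article.
hours = [1_000, 10_000, 100_000]
errors = [2.0 * h ** -0.25 for h in hours]

a, b = fit_power_law(hours, errors)

# Extrapolate to the 270,000-hour scale mentioned in the summary.
predicted = predict_error(a, b, 270_000)
print(f"a={a:.3f}, b={b:.3f}, predicted error at 270k hours={predicted:.4f}")
```

Fitting in log-log space keeps the example dependency-free; with real, noisy measurements a nonlinear fit or a form with an irreducible-error term may be more appropriate.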
GEN-0: The Largest and Most Diverse Embodied Real-World Manipulation Dataset Ever!
自动驾驶之心·2025-11-11 00:00