ByteDance releases an all-in-one large model for robots, led by Li Hang
量子位·2025-09-06 04:21

Core Viewpoint
- ByteDance's Seed team has introduced Robix, a single model that integrates reasoning, task planning, and natural language interaction for robots, eliminating the need for multiple separate modules [1][4][27].

Group 1: Robix Model Overview
- Robix handles high-level cognitive tasks, while a lower-level vision-language-action (VLA) system executes the commands Robix issues [6][9].
- It is a single vision-language model that processes images and language together, which streamlines communication and decision-making [10][11].
- It uses chain-of-thought reasoning and a three-stage training strategy to strengthen its capabilities [11][12].

Group 2: Training Methodology
- Training consists of three phases [19][20]:
  1. Continued pre-training on extensive robot-related data, so the model learns to understand 3D space and to relate language to what it sees.
  2. Supervised fine-tuning on real-world scenarios, which teaches task handling and basic conversation skills.
  3. Reinforcement learning, which uses a reward system to correct discrepancies between the model's thoughts and its actions.

Group 3: Performance Metrics
- In foundational ability tests, Robix outperformed Qwen2.5-VL on 7 of 8 spatial understanding tasks and achieved higher average accuracy [21].
- Across the benchmarks reported, Robix surpassed closed-source models such as GPT-4o and Gemini 2.5 Pro in most tests [21][22].
- In real-world interaction tests, Robix-32B achieved an average task progress of 92.5%, exceeding Gemini 2.5 Pro and GPT-4o by 4.3 and 28.1 percentage points, respectively [25].

Group 4: Leadership and Development
- The project is led by Dr. Li Hang, who has a significant background in AI and robotics and previously served as head of Huawei's Noah's Ark Lab [28][30].
- Despite rumors of retirement, Dr. Li continues to contribute to the project in a consulting capacity [31].
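
The division of labor described in Group 1, where one high-level model reasons, plans, and talks while a lower-level VLA system carries out commands, can be sketched as a simple control loop. This is an illustrative sketch only, not Robix's actual interface: the names `Decision`, `run_episode`, `toy_planner`, and `toy_controller` are hypothetical, the planner is a hand-written rule standing in for the model, and observations are plain strings rather than camera images.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Decision:
    """One high-level step (hypothetical structure, not Robix's real output format)."""
    thought: str                    # chain-of-thought text, kept internal
    command: Optional[str] = None   # command handed to the low-level controller
    reply: Optional[str] = None     # natural-language response to the user


# Assumed signatures: the planner sees (observation, instruction, history);
# the controller executes one command and returns a status string.
Planner = Callable[[str, str, List[Decision]], Decision]
Controller = Callable[[str], str]


def run_episode(planner: Planner, controller: Controller,
                observations: Iterable[str], instruction: str) -> List[str]:
    """Alternate between high-level decisions and low-level execution."""
    history: List[Decision] = []
    log: List[str] = []
    for obs in observations:
        decision = planner(obs, instruction, history)
        history.append(decision)
        if decision.command is not None:
            log.append("exec:" + controller(decision.command))
        if decision.reply is not None:
            log.append("say:" + decision.reply)
    return log


def toy_planner(obs: str, instruction: str, history: List[Decision]) -> Decision:
    # Rule-based stand-in for the single high-level model, for illustration only.
    if obs == "cup on table":
        return Decision("The cup is visible; pick it up.", command="pick up cup")
    return Decision("Nothing left to do.", reply="Done.")


def toy_controller(command: str) -> str:
    # Stand-in for the VLA system: pretends every command succeeds.
    return command + " -> ok"


log = run_episode(toy_planner, toy_controller,
                  ["cup on table", "cup in gripper"], "hand me the cup")
print(log)
```

The point of the sketch is that a single decision object carries the thought, the command, and the reply, so no separate planning or dialogue module sits between the user and the low-level executor.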