Robix
ByteDance team's latest, Robix: an all-round large model in which a single model handles robot reasoning, task planning, and interaction
具身智能之心 · 2025-09-08 00:03
Core Viewpoint - The article discusses Robix, a unified vision-language model from ByteDance that aims to address the limitations of existing hierarchical robotic systems in understanding and executing tasks in dynamic environments [2][3][4].

Group 1: Problem Identification
- Current hierarchical robotic systems suffer from fragmented capabilities: they rely heavily on large language models (LLMs) or vision-language models (VLMs) for task decomposition and neglect human-robot interaction and embodied reasoning [3][4].
- Modular interaction and planning frameworks are rigid and lack robustness, making it difficult for robots to adapt to real-time environmental changes [3][4].

Group 2: Proposed Solution
- Robix serves as the high-level cognitive hub in a hierarchical robotic system, integrating 3D spatial understanding and visual grounding to improve task planning and human interaction [2][5].
- The model is trained in three stages (continued pretraining, supervised fine-tuning, and reinforcement learning) to improve its capabilities systematically [5][13].

Group 3: Key Contributions
- Robix introduces a unified high-level cognitive model that integrates reasoning, long-horizon task planning, and natural language interaction within an end-to-end framework [5][6].
- Extensive experiments show that Robix outperforms commercial baselines such as GPT-4o and Gemini 2.5 Pro across several dimensions [5][24].

Group 4: Architecture and Mechanism
- Robix operates at the high-level cognitive layer, handling multimodal reasoning, adaptive task planning, and human-robot interaction, while lower-level controllers execute the atomic action commands it generates [7][8].
- The model's outputs include atomic action commands, natural language responses, and structured reasoning trajectories that guide decision-making [11][12].

Group 5: Training Strategy
- Continued pretraining uses a dataset of 200 billion tokens and focuses on embodied reasoning, visual grounding, and task-centric reasoning [13][14].
- The supervised fine-tuning phase adapts the pretrained model to high-level cognitive tasks, using diverse human-robot interaction scenarios and high-quality reasoning trajectories [17][18].

Group 6: Performance Evaluation
- Robix outperforms existing models on basic embodied reasoning, offline task planning, and online real-world scenarios, with significant accuracy improvements [22][24][27].
- In online evaluations, Robix achieves an average task progress of 92.6%, surpassing Gemini-2.5-Pro with lower response latency [29][32].

Group 7: Future Directions
- Future work will focus on robustness in dynamic environments and on long-term memory to support complex, extended tasks in real-world settings [36][38].
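The division of labor described in Group 4 can be pictured as a simple control loop. The sketch below is only an illustration under stated assumptions, not ByteDance's implementation: `HighLevelOutput`, `plan_step`, `LowLevelController`, and `run_episode` are hypothetical names, the planner is a hard-coded stub standing in for the vision-language model, and the scene is simulated as a plain list of object names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class HighLevelOutput:
    """One step of high-level output (hypothetical schema)."""
    thought: str                    # structured reasoning trajectory
    action: Optional[str] = None    # atomic action command for the controller
    response: Optional[str] = None  # natural language reply to the user


def plan_step(instruction: str, visible_objects: list[str]) -> HighLevelOutput:
    """Stub planner standing in for the high-level model."""
    if not visible_objects:
        return HighLevelOutput(thought="The table is clear.", response="All done.")
    target = visible_objects[0]
    return HighLevelOutput(
        thought=f"'{target}' is still on the table, so handle it next.",
        action=f"put {target} into the basket",
    )


class LowLevelController:
    """Stub for the low-level controller that executes atomic actions."""

    def __init__(self) -> None:
        self.executed: list[str] = []

    def execute(self, action: str) -> None:
        self.executed.append(action)


def run_episode(instruction: str, objects: list[str], max_steps: int = 10):
    """Alternate high-level planning and low-level execution until done."""
    controller = LowLevelController()
    table = list(objects)  # simulated scene
    responses: list[str] = []
    for _ in range(max_steps):
        out = plan_step(instruction, table)
        if out.response is not None:
            responses.append(out.response)
        if out.action is None:
            break
        controller.execute(out.action)
        table.pop(0)  # simulate the object leaving the table
    return controller.executed, responses


executed, responses = run_episode("clear the table", ["cup", "spoon"])
print(executed)
print(responses)
```

The point of the sketch is the interface: the high-level model emits a thought at every step plus either an action or a user-facing response, and only the action string crosses into the low-level controller.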
Tencent Research Institute AI Express 20250908
Tencent Research Institute · 2025-09-07 16:01
Group 1
- Anthropic has implemented a policy restricting access to its Claude service for entities majority-owned by Chinese capital, citing legal, regulatory, and security risks [1].
- The restriction also applies to entities from countries considered adversaries, such as Russia, Iran, and North Korea, with an expected global revenue impact in the hundreds of millions of dollars [1].

Group 2
- AI Key, an external AI assistant device for iPhone priced at $89, sold out within 7 hours of launch, but is seen as redundant given the existing capabilities of iPhones [2].
- The trend of AI hardware startups is viewed as short-lived, with future value lying in integrating AI as a system attribute rather than a standalone function [2].

Group 3
- Tencent's "Hunyuan Game" platform has launched version 2.0, introducing features such as game-to-video generation and custom model training [3].
- The new AI capabilities let users create high-quality dynamic videos from game images and descriptions, and significantly lower the barrier to custom model training [3].

Group 4
- Alibaba has released the Qwen3-Max-Preview model, with over a trillion parameters, outperforming competitors on various benchmarks [4].
- The model supports over 100 languages and a maximum context of 256k, with tiered pricing based on token usage [4].

Group 5
- ByteDance's Seed team has introduced Robix, a unified "robot brain" that integrates reasoning, task planning, and human-robot interaction [5][6].
- Robix uses a hierarchical architecture that separates high-level decision-making from low-level control, enabling dynamic reasoning and execution [6].

Group 6
- Rokid's AR+AI glasses sold 40,000 units within 5 days of launch, highlighting their lightweight design and user-friendly features [7].
- The product includes customizable audio and translation capabilities, and Rokid has opened its SDK to developers, expanding its global reach [7].

Group 7
- Anthropic has agreed to a $1.5 billion settlement in a copyright lawsuit involving the illegal download of 7 million books, marking a significant moment in AI and copyright disputes [8].
- The settlement covers approximately 500,000 books at an average of $3,000 per book; the financial impact is considered manageable relative to Anthropic's recent funding and revenue [8].

Group 8
- A Sensor Tower report indicates that global downloads of generative AI applications reached nearly 1.7 billion in the first half of 2025, with in-app purchase revenue of $1.9 billion, reflecting 67% quarter-over-quarter growth [10].
- The report highlights a demographic shift, with female users of AI assistants exceeding 30%, and emphasizes the competitive pressure on vertical applications [10].

Group 9
- OpenAI's recent paper defines "hallucination" in AI models and identifies its root causes, suggesting that current evaluation methods encourage guessing rather than acknowledging uncertainty [11].
- The paper proposes a revised evaluation approach that penalizes confident errors more heavily than expressions of uncertainty, aiming to improve the reliability of AI responses [11].
ByteDance Seed launches "robot brain" Robix: teaching robots to think, plan, and interact flexibly
机器之心 · 2025-09-07 05:12
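Recently, ByteDance's Seed team released its latest robotics research, Robix, a "robot brain" designed to improve robots' ability to think, plan, and interact flexibly.

……

Title: Robix: A Unified Model for Robot Interaction, Reasoning and Planning
arXiv: https://arxiv.org/abs/2509.01106
Project page: https://robix-seed.github.io/robix/

According to the report and demo videos, robots equipped with Robix already show a range of complex interaction capabilities that were previously hard to achieve:
- When cooking, it can prepare ingredients from the name of a dish alone (such as "fish-flavored shredded pork"), and it can also notice on its own that an ingredient is missing and ask whether to fetch it.
- When the user changes their mind midway, it can stop the current operation immediately and carry out the new instruction.
- When you doodle casually, it can recognize the object in the drawing and respond naturally with appreciation.

The demo videos below show how Robix works in real interaction scenarios.

Core idea: General-purpose robots have long seemed rigid when handling complex, long-horizon tasks because they rely on designs that stitch "modular" components together. Robix's key feature is its unified architecture: it seamlessly integrates reasoning, task planning, and human-robot interaction into a single end-to-end multi ...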
ByteDance releases an all-round large model for robots, with Li Hang leading the team
量子位 · 2025-09-06 04:21
Core Viewpoint - ByteDance's Seed team has introduced Robix, a single model that integrates reasoning, task planning, and natural language interaction for robots, eliminating the need for multiple modules [1][4][27].

Group 1: Robix Model Overview
- Robix handles high-level cognitive tasks, while a lower-level system (VLA) executes the commands Robix issues [6][9].
- It is a single vision-language model that processes images and language together, streamlining communication and decision-making [10][11].
- It uses chain-of-thought reasoning and a three-stage training strategy to improve its capabilities [11][12].

Group 2: Training Methodology
- The training consists of three phases [19][20]:
  1. Continued pretraining on extensive robot-related data, so the model understands 3D space and can relate language to visuals.
  2. Supervised fine-tuning on real-world scenarios, teaching task handling and basic conversation skills.
  3. Reinforcement learning, which uses a reward system to correct discrepancies between thought and action.

Group 3: Performance Metrics
- In foundational ability tests, Robix outperformed Qwen2.5-VL on 7 of 8 spatial understanding tasks and achieved higher average accuracy [21].
- Across benchmarks, Robix surpasses closed-source models such as GPT-4o and Gemini 2.5 Pro in most tests [21][22].
- In real-world interaction tests, Robix-32B achieved an average task progress of 92.5%, exceeding Gemini 2.5 Pro and GPT-4o by 4.3 and 28.1 percentage points, respectively [25].

Group 4: Leadership and Development
- The project is led by Dr. Li Hang, who has a significant background in AI and robotics and previously headed Huawei's Noah's Ark Lab [28][30].
- Despite rumors of retirement, Dr. Li continues to contribute to the project in a consulting capacity [31].
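The margins quoted in Group 3 imply the baselines' own scores. The short check below works them out, assuming the reported margins are simple differences in average task progress; the derived figures are back-calculated from the summary, not quoted from the paper, and the variable names are illustrative.

```python
# Figures taken from the summary above.
robix_progress = 92.5  # Robix-32B average task progress, in percent
margins = {"Gemini 2.5 Pro": 4.3, "GPT-4o": 28.1}  # percentage points

# Implied baseline score = Robix score minus the reported margin.
implied = {name: round(robix_progress - margin, 1) for name, margin in margins.items()}

for name, score in implied.items():
    print(f"{name}: {score}% (implied)")
```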