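Xiaomi's first-generation robot VLA large model is here! Silky smooth to rival Dove chocolate, inference latency of just 80 ms | Fully open source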
XIAOMI (HK:01810) · QbitAI (量子位) · 2026-02-12 12:42

Core Insights
- The article discusses the rising prominence of embodied intelligence and robotics, highlighting growing interest from companies large and small, along with capital investment and media coverage [2][3]
- Embodied robots are increasingly expected to move beyond demonstrations and become practical tools that raise productivity in real-world applications [3][4]
- Xiaomi's new embodied VLA model, Xiaomi-Robotics-0, targets critical problems in robotic execution such as frequent pauses and slow corrections, aiming for greater autonomy and efficiency [7][8]

Group 1: Industry Trends
- The embodied robotics sector is at a pivotal point: demonstrations of capability are impressive, yet the actual value of these robots in industrial settings is under scrutiny [3][4]
- The industry is undergoing a paradigm shift toward robot autonomy, moving from human-assisted operation to fully autonomous systems [4][6]

Group 2: Xiaomi-Robotics-0 Innovations
- Xiaomi-Robotics-0 features three core technological innovations: architecture design, pre-training strategy, and post-training mechanisms, all aimed at enabling robots to understand complex environments and execute actions continuously and accurately [12][13]
- The model employs a dual-brain architecture: a "brain" handles decision-making, while a "cerebellum" (literally "small brain") generates continuous action chunks, which improves the smoothness and precision of robotic movements [16][21]
- A two-phase pre-training approach preserves the model's visual understanding while it learns to perform actions, so that the robot can interpret complex instructions and plan continuous movements [24][30]

Group 3: Performance Metrics
- Xiaomi-Robotics-0 achieved outstanding results on several benchmarks, surpassing approximately 30 existing models in environments such as LIBERO, CALVIN, and SimplerEnv [44][45]
- The model demonstrated a 100% success rate on the LIBERO-Object task and maintained high throughput in real-world tasks such as towel folding and LEGO disassembly, showcasing its practical capabilities [47][54][57]
- The results indicate that the model does not sacrifice understanding for control: it maintains high scores across multiple evaluation metrics [49][58]

Group 4: Strategic Direction
- Xiaomi's approach to embodied intelligence appears to focus on practical applications rather than on showcasing advanced technology, aiming to address real-world industrial challenges [61][65]
- The company has recently open-sourced its models, including TacRefineNet, which enhances fine-grained control without relying on visual input, indicating a commitment to transparency and collaboration within the industry [74][76]
- This open-source strategy lowers barriers for smaller developers, allowing them to build on Xiaomi's foundational work and contribute specialized robotics applications [78][79]
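
To make the action-chunk idea from Group 2 concrete, the following is a minimal sketch of generic chunked control with prefetching, not Xiaomi's implementation, which the summary does not describe. It assumes a policy that returns several future actions at once and a controller that requests the next chunk before the current one runs out, so that execution never pauses. The names `run_chunked_control`, `toy_policy`, `chunk_size`, and `refill_at` are hypothetical, and inference is called synchronously here, whereas a real system would presumably run it concurrently with execution.

```python
from collections import deque
from typing import Callable, Deque, List

Action = List[float]


def run_chunked_control(
    plan_chunk: Callable[[int], List[Action]],
    total_steps: int,
    chunk_size: int,
    refill_at: int,
) -> List[Action]:
    """Execute one action per step, refilling the queue before it drains."""
    queue: Deque[Action] = deque()
    executed: List[Action] = []
    for step in range(total_steps):
        # Request more actions while some are still queued, so the
        # controller always has an action ready to execute.
        if len(queue) <= refill_at:
            next_start = step + len(queue)  # first step not yet planned
            queue.extend(plan_chunk(next_start)[:chunk_size])
        executed.append(queue.popleft())
    return executed


def toy_policy(start_step: int, chunk_size: int = 4) -> List[Action]:
    # Hypothetical stand-in for the model: emits a ramp so that
    # continuity across chunk boundaries is easy to check.
    return [[0.1 * (start_step + i)] for i in range(chunk_size)]


trajectory = run_chunked_control(
    toy_policy, total_steps=10, chunk_size=4, refill_at=1
)
```

In this toy setup the executed trajectory is one unbroken ramp even though it was planned in three separate chunks. That is the property the "cerebellum" component is described as providing: motion that stays continuous across planning boundaries.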