Wang Zhongyuan, President of the Beijing Academy of Artificial Intelligence (BAAI): Multi-modal Large Models Will Bring New Variables to Embodied Intelligence
Xin Jing Bao · 2025-03-30 10:00

Core Insights
- Embodied intelligence is a major focus at the 2025 Zhongguancun Forum, marked by the introduction of the RoboOS framework and the open-source RoboBrain model [1][3]
- Multi-modal large model technology is expected to raise the intelligence of robots, allowing them to better understand and interact with the physical world [2][3]

Group 1: Multi-modal Large Models
- Multi-modal large models enable AI to perceive and understand the world through diverse data types, such as medical imaging and sensor data, facilitating the transition from digital to physical environments [2]
- Performance gains in large language models have slowed as the supply of available internet text data is exhausted, making the integration of multi-modal capabilities necessary [2]

Group 2: RoboBrain and RoboOS
- RoboBrain and RoboOS are designed to support cross-scenario, multi-task deployment and collaboration among different types of robots, enhancing their general intelligence [3]
- RoboBrain can interpret human commands and visual inputs to generate actionable plans based on real-time feedback, supporting a variety of robotic configurations [3]

Group 3: Industry Development and Challenges
- The open-source approach is viewed as a key driver of rapid development in the AI industry, enabling collaboration among hardware, model, and application vendors [4]
- Despite the potential of humanoid robots, significant challenges remain in their industrial application, with many still in early stages of development [5]
- The realization of Artificial General Intelligence (AGI) is projected to take another 5-10 years, depending on advances in embodiment capabilities and data accumulation [5]