Training Still Has Enormous Room for Scaling! BAAI's Wang Zhongyuan: Video Data Remains Underutilized | MEET2026
Xin Lang Cai Jing·2025-12-24 09:47

Core Insights
- Artificial intelligence is at a critical turning point in its third wave, moving from weak AI toward general AI and from specialized robots (1.0) toward general embodied intelligence (2.0) [1][5][32]
- The "Wujie" series of large models, including Emu3.5, aims to anchor AI's transition from the digital world to the physical world [1][5][28]
- Emu3.5 is a multimodal world model that learns from video data rather than relying solely on text, addressing the underutilization of video data in AI [1][28][35]

Multimodal Learning and Emu3.5
- Emu3.5 uses a unified autoregressive architecture to move from Next-Token Prediction to Next-State Prediction, marking a shift from language learning to multimodal world learning [3][12][39]
- The training data for Emu3.5 grew from 15 years to 790 years of video, and the parameter count rose from 8 billion to 34 billion [38]
- DiDA, a technique developed in-house for Emu3.5, speeds up image generation by approximately 20 times, making the model competitive with top models [38][39]

Open Source and Collaboration
- The institute has open-sourced over 200 models and more than 100 datasets in the past two years, with global downloads exceeding 690 million and 4 million respectively [3][25][50]
- It collaborates with over 30 leading robotics companies to promote the development of embodied intelligence world models [25][50]

RoboBrain and Embodied Intelligence
- The RoboBrain system is designed to address the usability and generality challenges of embodied AI, enabling cross-robot data collection and standardization [22][47]
- RoboBrain2.0 can decompose complex human instructions and allocate tasks to different types of robots based on the environment [22][47]
- The institute has also released RoboBrain-X0, which can drive a variety of real robots to complete complex tasks under few-shot conditions [23][47]