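Interview with Wang Zhongyuan: Beijing Academy of Artificial Intelligence's Multimodal Large Model Published in Nature, with a Group of Young People Behind It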
Xin Jing Bao·2026-02-03 14:17

Core Insights

- The Emu3 multimodal model, developed by the Beijing Academy of Artificial Intelligence, has been published in the journal Nature, a significant achievement for China's research institutions in the field of AI [1][2].

Group 1: Emu3 Model Overview

- Emu3 uses a unified architecture: a single model handles both understanding and generation of text, images, and videos, based on the principle of "predicting the next token" [3][4].
- The design scales well and lowers the barrier to research and development, enabling more researchers and institutions to take part in cutting-edge exploration [3][4].

Group 2: Technological Advancements

- Emu3.5, the subsequent version, has been trained on over 10 trillion tokens, with the total duration of video training data rising from 15 years to 790 years and the parameter count rising from 8 billion to 34 billion [6].
- This version demonstrates the ability to simulate physical world dynamics, marking a transition from "predicting the next word or frame" to "predicting the next state," which is crucial for achieving more general intelligence [6].

Group 3: Team and Innovation

- The Emu3 development team is notably young: the lead developer is only 29 years old, reflecting the institute's philosophy of empowering young people in AI innovation [7][8].
- The team faced significant technical challenges and skepticism from the industry, but ultimately proved the viability of its approach to multimodal AI [8].

Group 4: Future Applications

- Emu3 is positioned as a foundational model for moving AI from the digital realm into the physical world, supporting applications in robotics and autonomous driving by providing a robust understanding of complex environments [5][10].
- The model is expected to give rise to a new generation of native multimodal assistants that can create images and videos from contextual prompts, enhancing human-computer interaction [5].

Group 5: Talent Development and Institutional Support

- The Beijing Academy of Artificial Intelligence judges talent by impactful work rather than credentials, fostering a dynamic environment for young researchers [9][10].
- The institute operates under a flexible funding model that lets researchers focus on valuable scientific work without the pressures of traditional corporate structures [9].
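
The "predicting the next token" principle described under Group 1 can be illustrated with a small sketch. It assumes that images have already been converted into discrete codes by some visual tokenizer, so that text and image tokens can share one flat sequence and one training objective. The vocabulary sizes, the begin/end-of-image markers, and the helper names are all hypothetical and chosen for illustration; none of them are taken from Emu3 itself.

```python
# Illustrative only: the vocabulary sizes, special tokens, and helper names
# below are assumptions, not details taken from Emu3.
TEXT_VOCAB_SIZE = 1000     # hypothetical text vocabulary size
VISUAL_VOCAB_SIZE = 512    # hypothetical visual codebook size
BOI = TEXT_VOCAB_SIZE + VISUAL_VOCAB_SIZE  # begin-of-image marker
EOI = BOI + 1                              # end-of-image marker


def visual_to_unified(codes):
    """Shift visual codebook indices so they do not collide with text IDs."""
    for c in codes:
        if not 0 <= c < VISUAL_VOCAB_SIZE:
            raise ValueError(f"visual code out of range: {c}")
    return [TEXT_VOCAB_SIZE + c for c in codes]


def build_sequence(text_ids, visual_codes):
    """Place text tokens and image tokens in one flat token sequence."""
    return list(text_ids) + [BOI] + visual_to_unified(visual_codes) + [EOI]


def next_token_pairs(sequence):
    """Return (context, target) pairs: each prefix predicts the token after it."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


# A three-token caption followed by a three-code "image".
seq = build_sequence(text_ids=[5, 17, 42], visual_codes=[3, 200, 7])
pairs = next_token_pairs(seq)

for context, target in pairs:
    print(context, "->", target)
```

In this sketch the same (context, target) objective covers both the caption tokens and the image tokens, which is the sense in which one model can serve understanding and generation across modalities. A real system would replace the toy lists with a learned tokenizer and a trained sequence model.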