Multimodal World Learning

Zhiyuan Institute Releases Top Ten AI Technology Trends for 2026: The "Technology Bubble" Is a False Proposition

Xin Jing Bao · 2026-01-09 03:52

Core Insights
- The Beijing Zhiyuan Artificial Intelligence Research Institute has released its predictions for the top ten AI technology trends of 2026, covering foundational models, AI applications, and key industries [1]

Group 1: Foundational Models
- The institute expects world models to become the consensus direction for AGI, because high-quality text data is nearly exhausted. AI must learn not only language but also the rules that govern the physical world, which requires processing multimodal information such as images, sound, time, and space [3]
- In embodied intelligence, more than 230 companies are now active, but many have homogeneous business models, which could lead to an industry shakeout ("clearing"). World models may serve as a crucial technological anchor for the next stage of embodied intelligence [3]

Group 2: Consumer Applications
- Competition in consumer AI applications is becoming clearer. It centers on "super applications" with "All in One" functionality that go beyond single-purpose tools and close the loop from information acquisition to task planning and problem-solving [3]
- Although major players dominate the general market, there is still room for breakthroughs in high-barrier vertical fields such as health and education, where vertical applications show differentiated competitiveness [3]

Group 3: Reasoning Capabilities
- The institute argues that the notion of a "technology bubble" is a false proposition, because reasoning optimization has not yet reached its ceiling. Progress in this area will remain a key factor supporting the large-scale application of AI in 2026 [4]

Artificial Intelligence

Embodied Intelligence

World Models

Multimodal World Learning

Training Still Has Huge Room for Scaling! Zhiyuan Institute's Wang Zhongyuan: Video Data Has Yet to Be Fully Utilized | MEET2026

Xin Lang Cai Jing · 2025-12-24 09:47

Core Insights
- Artificial intelligence is at a critical turning point in its third wave, moving from weak AI to general AI and from specialized robots (1.0) to general embodied intelligence (2.0) [1][5][32]
- The "Wujie" series of large models, including Emu3.5, aims to anchor AI's transition from the digital world to the physical world [1][5][28]
- Emu3.5 is a multimodal world model that learns from video data rather than relying solely on text, addressing the underutilization of video data in AI [1][28][35]

Multimodal Learning and Emu3.5
- Emu3.5 uses a unified autoregressive architecture to move from Next-Token Prediction to Next-State Prediction, marking a shift from language learning to multimodal world learning [3][12][39]
- The training data for Emu3.5 has grown from 15 years to 790 years of video, and the parameter count has risen from 8 billion to 34 billion [38]
- Emu3.5's self-developed DiDA technology speeds up image generation by approximately 20 times, making it competitive with top models [38][39]

Open Source and Collaboration
- The institute has open-sourced over 200 models and more than 100 datasets in the past two years, with global downloads exceeding 690 million and 4 million respectively [3][25][50]
- The institute collaborates with over 30 leading robotics companies to promote the development of embodied intelligence world models [25][50]

RoboBrain and Embodied Intelligence
- The RoboBrain system is designed to address the usability and generality challenges of embodied AI, enabling cross-robot data collection and standardization [22][47]
- RoboBrain2.0 can decompose complex human instructions and allocate tasks to different types of robots based on the environment [22][47]
- The institute has also released RoboBrain-X0, which can drive various real robots to complete complex tasks under few-shot conditions [23][47]
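To put the scale figures quoted in this summary side by side, the short calculation below expresses them as growth ratios. It takes the reported numbers at face value (15 and 790 years of training data, 8 and 34 billion parameters); the variable names and the 365-day year used for the hour conversion are illustrative assumptions, not part of the source.

```python
# Figures as quoted in the summary; taken at face value.
data_years_before, data_years_after = 15, 790
params_before, params_after = 8e9, 34e9

# Growth ratios between the two reported model generations.
data_growth = data_years_after / data_years_before
param_growth = params_after / params_before

# Assumption: a 365-day year, used only to convert years into hours.
data_hours_after = data_years_after * 365 * 24

print(f"training data grew about {data_growth:.1f}x")
print(f"parameter count grew about {param_growth:.2f}x")
print(f"790 years is about {data_hours_after:,} hours")
```

On these numbers the data grew by roughly 53 times while the parameter count grew by a little over 4 times, so most of the reported scaling came from the data side.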

Training Still Has Huge Room for Scaling! Zhiyuan Institute's Wang Zhongyuan: Video Data Has Yet to Be Fully Utilized | MEET2026

QbitAI (量子位) · 2025-12-24 07:20

Core Viewpoint
- The article discusses the transition of artificial intelligence (AI) from the digital world to the physical world, a critical turning point in the third wave of AI development, and the introduction of the "Wujie" series of large models by the Zhiyuan Institute [12][13][14].

Group 1: AI Development and Trends
- AI is at a pivotal moment in which large models are driving the shift from weak AI to general AI, and from specialized robots (1.0) to general embodied intelligence (2.0) [3][13].
- The "Wujie" series of large models aims to bridge the gap between the digital and physical worlds, representing a significant advance in AI capabilities [4][14].
- The Emu3.5 model, part of the Wujie series, uses a unified autoregressive architecture to move from Next-Token Prediction to Next-State Prediction, indicating a new phase in multimodal learning [17][22].

Group 2: Emu3.5 Model Features
- Emu3.5 distinguishes itself by learning from long videos, which contain rich temporal, spatial, and causal information essential for understanding the physical world [18][20].
- The training dataset for Emu3.5 has expanded from 15 years to 790 years of video data, and the model has grown from 8 billion to 34 billion parameters [23].
- Emu3.5's autoregressive architecture allows for rapid image generation, achieving speeds comparable to top models through proprietary DiDA technology [23].

Group 3: Multimodal Learning and Applications
- Emu3.5 is expected to lead AI into a new stage of multimodal world learning, with substantial scaling potential because vast amounts of multimodal data remain underutilized [24].
- The model demonstrates strong multimodal reasoning and visual understanding, as shown by its performance in image generation and editing tasks [25][27].
- Emu3.5 excels in tasks that involve predicting temporal and spatial states, showing a strong understanding of the physical world [29][31].

Group 4: Embodied Intelligence and Technological Advancements
- The Zhiyuan Institute is addressing the challenges of embodied intelligence, which currently suffers from usability and generality issues [34].
- The institute has developed a comprehensive technology stack centered on RoboBrain, enabling cross-robot data collection and standardization [35].
- Recent advances include RoboBrain2.0, which can decompose complex human instructions for execution by various robots, enhancing the practical applications of embodied intelligence [36].

Group 5: Open Source Contributions
- The Zhiyuan Institute has committed to open-source practices, releasing over 200 models and 100 datasets, with global downloads exceeding 690 million and 4 million, respectively [38].
- The institute collaborates with over 30 leading robotics companies to promote the development of embodied intelligence world models [38].
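The shift from Next-Token Prediction to Next-State Prediction described in this summary can be illustrated with a toy sketch of how training targets might differ between the two framings. This is one possible reading, not Emu3.5's actual implementation: the token vocabulary, the idea that one "state" equals the tokens of one text chunk or video frame, and the function names are all hypothetical.

```python
from typing import List, Tuple

Token = str
State = List[Token]  # hypothetical: the tokens of one text chunk or one video frame


def next_token_pairs(tokens: List[Token]) -> List[Tuple[List[Token], Token]]:
    """Language-model style targets: predict each token from the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def next_state_pairs(states: List[State]) -> List[Tuple[List[Token], State]]:
    """State-level targets: predict the whole next state from all earlier states."""
    pairs = []
    for i in range(1, len(states)):
        # Flatten every earlier state into one interleaved context sequence.
        context = [tok for state in states[:i] for tok in state]
        pairs.append((context, states[i]))
    return pairs


# Toy interleaved sequence: a text instruction followed by three video frames.
states = [
    ["<txt>", "pour", "water"],
    ["<img>", "cup_empty", "kettle_up"],
    ["<img>", "cup_half", "kettle_tilted"],
    ["<img>", "cup_full", "kettle_down"],
]
flat = [tok for state in states for tok in state]

token_pairs = next_token_pairs(flat)
state_pairs = next_state_pairs(states)

print(len(token_pairs), "next-token examples")
print(len(state_pairs), "next-state examples")
print("last target state:", state_pairs[-1][1])
```

In this sketch both framings read the same interleaved sequence of text and image tokens, which is what a unified autoregressive setup would allow. The difference is the unit being predicted: a single token in the first case, and a full description of what the scene looks like next in the second.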