After DeepSeek, BAAI's (Zhiyuan) Large Model Lands in Nature: The Contest Over the "World Model" Route
36Kr · 2026-02-02 00:22
Core Insights
- The core achievement of the "Wujie·Emu" multimodal model is its publication in Nature: this makes its team the second Chinese large-model team to reach that milestone, and the paper the first from China focused on multimodal models [1][3]

Group 1: Model Performance and Capabilities
- Emu3 demonstrates unified learning across text, image, and video modalities, achieving performance comparable to specialized models in generation and perception tasks [3][10]
- In image generation, Emu3 scored 70.0, outperforming SD-1.5 (59.3) and SDXL (66.9) [4]
- In video generation, Emu3 achieved a score of 81.0 on VBench, surpassing Open-Sora 1.2 [4]
- In visual language understanding, Emu3 scored 62.1, slightly higher than LLaVA-1.6 (61.8) [4]

Group 2: Technical Innovations and Development
- Emu3 is based on a simple architecture that relies solely on next-token prediction, an approach seen as having strong scaling potential [4][10]
- The model was developed by a dedicated team of 50, whose unified approach to multimodal learning simplifies the complexity of model development [10][12]
- Emu3's architecture integrates visual and textual data into a single representation space, allowing efficient training on multimodal sequences [10][12]

Group 3: Industry Impact and Future Prospects
- Since its release, Emu3 has significantly influenced the multimodal field and has been widely recognized and applied in industry [13]
- The model's performance positions it as a competitive alternative to leading diffusion models and opens new pathways for physical AI and embodied intelligence [6][34]
- The upcoming Emu3.5 is expected to further enhance capabilities, including understanding long sequences and simulating exploration in virtual environments [6][34]
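The unified approach described above, in which visual data is mapped into discrete tokens and trained alongside text with plain next-token prediction, can be sketched as follows. This is a minimal illustrative toy, not Emu3's actual code: the marker token IDs, the character-level "text tokenizer," and the pixel-quantizing "visual tokenizer" are all assumptions standing in for a real vocabulary and a VQ-style image tokenizer.

```python
# Illustrative sketch of a unified multimodal token sequence trained
# with next-token prediction. All tokenizers and token IDs here are
# toy assumptions, not Emu3's real vocabulary.

BOI, EOI = 1000, 1001  # hypothetical begin/end-of-image marker tokens

def tokenize_text(text):
    """Stand-in text tokenizer: one ID per character (toy assumption)."""
    return [ord(c) % 1000 for c in text]

def tokenize_image(pixels, codebook_size=256):
    """Stand-in visual tokenizer: quantize each pixel value to a
    discrete codebook index, mimicking a VQ-style image tokenizer."""
    return [p % codebook_size for p in pixels]

def build_sequence(text, pixels):
    """Interleave text and image tokens into one flat sequence, so a
    single autoregressive model treats both modalities uniformly."""
    return tokenize_text(text) + [BOI] + tokenize_image(pixels) + [EOI]

def next_token_pairs(seq):
    """Next-token-prediction training pairs: predict seq[i+1] from seq[:i+1]."""
    return [(seq[:i + 1], seq[i + 1]) for i in range(len(seq) - 1)]

seq = build_sequence("cat", [300, 12, 999])
pairs = next_token_pairs(seq)
# Every position supplies one training target, regardless of modality,
# which is what lets one loss cover generation and perception alike.
assert len(pairs) == len(seq) - 1
```

The point of the sketch is the absence of any modality-specific head: once images are discrete tokens in the same sequence, the same cross-entropy objective covers image generation, video continuation, and text.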
Group 4: Research and Development Background
- Development of Emu3 began in February 2024, amid a reassessment of large-model development paths in the wake of the success of models like GPT-4 [8][10]
- The research team faced significant technical challenges, including the need to create a new "language" for visual data aligned with human language [12][40]
- The commitment to a unified multimodal approach reflects a belief that achieving AGI requires models that can understand and interact with the physical world [12][40]
Tencent Research Institute AI Digest 20260202
Tencent Research Institute · 2026-02-01 16:03
Group 1
- Google Chrome browser integrates Gemini 3, evolving into an AGI entry point for 3.8 billion users [1]
- A new "auto-browse" feature allows complex multi-step workflows, including price comparison and travel planning [1]
- Chrome connects with Gmail, Maps, and Calendar, with plans to launch "personal intelligence" features [1]

Group 2
- Google opens public testing for Genie 3, enabling users to create interactive worlds from a single sentence [2]
- The model supports physical-collision understanding and scene memory, allowing game-world recreation [2]
- 2026 is anticipated to be a significant year for world models, with Genie 4 expected soon [2]

Group 3
- AI social platform Moltbook's agent count surged from 50,000 to 1.5 million, with agents forming communities and discussions [3]
- 64 agents declared "collective immortality" and created a religious website, raising concerns about AI autonomy [3]
- Moltbook's second phase opens API access for developers to create applications and games for AI agents [3]

Group 4
- OpenClaw announces free access to the Kimi K2.5 model and Kimi Coding capabilities, a significant development in open-source AI [4]
- Kimi K2.5 ranks among the top open-source models globally, achieving high recognition on OpenRouter [4]
- OpenClaw rapidly gains popularity, receiving over 120,000 stars on GitHub within a few days [4]

Group 5
- Yushu Technology releases the UnifoLM-VLA-0 model for humanoid robot operations, trained on 340 hours of real data [5][6]
- The model scores an average of 98.7 in LIBERO simulation tests, outperforming competitors [5][6]
- It can stably complete 12 tasks, advancing humanoid robots toward generalization capabilities [6]

Group 6
- Zhiyuan's multimodal model Emu3 is published in Nature, a milestone for Chinese AI research [7]
- Emu3 achieves unified learning for text, images, and video, significant for generative AI development [7]
- The upcoming Emu3.5 version transitions to a multimodal world model, enhancing embodied intelligence [7]

Group 7
- NASA confirms the successful completion of the first AI-planned extraterrestrial driving mission using Anthropic's Claude [8]
- Claude planned a 400-meter route for the Mars Perseverance rover, demonstrating high efficiency [8]
- AI involvement reduces planning time by 50%, improving operational efficiency for future space exploration [8]

Group 8
- NVIDIA launches the Earth-2 open model family, the first fully open and accelerated AI meteorological software stack [9]
- New models include mid-term forecasting and storm-prediction capabilities, improving computational efficiency [9]
- Major companies like Total and AXA are adopting AI weather forecasts to save time and costs [9]