Emu3.5
Interview with Wang Zhongyuan: BAAI's Multimodal Large Model Lands in Nature, with a Group of Young People Behind It
Xin Jing Bao· 2026-02-03 14:17
Core Insights
- The Emu3 multimodal model developed by the Beijing Academy of Artificial Intelligence has been published in the prestigious journal Nature, marking a significant achievement for China's research institutions in the field of AI [1][2].

Group 1: Emu3 Model Overview
- Emu3 represents a unified architecture that simplifies the understanding and generation of various types of information, including text, images, and videos, by using a single model based on the principle of "predicting the next token" [3][4].
- The model's design allows for significant scalability and lower research and development barriers, enabling more researchers and institutions to engage in cutting-edge exploration [3][4].

Group 2: Technological Advancements
- Emu3.5, the subsequent version, has been trained on over 10 trillion tokens, with video training duration increased from 15 years to 790 years, and the parameter count rising from 8 billion to 34 billion [6].
- This version demonstrates the ability to simulate physical world dynamics, marking a transition from "predicting the next word or frame" to "predicting the next state," which is crucial for achieving more general intelligence [6].

Group 3: Team and Innovation
- The Emu3 development team is notably young, with the lead developer being only 29 years old, reflecting the institute's philosophy of empowering youth in AI innovation [7][8].
- The team faced significant technical challenges and skepticism from the industry but ultimately succeeded in proving the viability of their innovative approach to multimodal AI [8].

Group 4: Future Applications
- Emu3 is positioned as a foundational model for advancing AI from the digital realm to the physical world, enabling applications in robotics and autonomous driving by providing a robust understanding of complex environments [5][10].
- The model is expected to give rise to a new generation of native multimodal assistants capable of creating images and videos based on contextual prompts, enhancing human-computer interaction [5].

Group 5: Talent Development and Institutional Support
- The Beijing Academy of Artificial Intelligence emphasizes talent based on impactful work rather than credentials, fostering a dynamic environment for young researchers [9][10].
- The institute operates under a flexible funding model that allows researchers to focus on valuable scientific work without the pressures of traditional corporate structures [9].
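As a quick check on the Emu3.5 scale-up figures quoted in this summary (15 to 790 years of video, 8 billion to 34 billion parameters), the growth factors can be worked out as below. The variable names are illustrative only; the numbers are the ones reported in the summary.

```python
# Figures as reported in the summary; variable names are illustrative.
video_years_before, video_years_after = 15, 790
params_before_b, params_after_b = 8, 34  # billions of parameters

# Growth factor = new value divided by old value.
video_ratio = video_years_after / video_years_before
param_ratio = params_after_b / params_before_b

print(f"Video training duration grew about {video_ratio:.1f}x")
print(f"Parameter count grew {param_ratio:.2f}x")
```

In other words, the reported video data grew by roughly a factor of 50, while the parameter count grew by a factor of a little over 4.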
BAAI's Multimodal Large Model Emu3 Makes Its Debut in Nature
Ke Ji Ri Bao· 2026-02-02 05:23
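On January 28, "Emu3", a multimodal large model led by the Beijing Academy of Artificial Intelligence (BAAI, also known as the Zhiyuan Research Institute), was published online in the main edition of the leading international journal Nature (the print edition is expected on February 12). It is the first time a large-model result led by a Chinese research institution has appeared in the journal, marking a major breakthrough for China in original AI innovation. Previously, large language models achieved major breakthroughs via the autoregressive route of "next-token prediction (NTP)", but multimodal models still relied on specialized routes such as contrastive learning and diffusion models, and whether autoregression could become a general route for multimodality had remained an open question in the field. The Emu3 model proposed by the Zhiyuan team discretizes text, images, and video into a single shared representation space and trains a single Transformer architecture jointly from scratch, unifying multimodal generation and perception through "next-token prediction" alone. Experiments show that Emu3 matches diffusion models on text-to-image tasks, rivals approaches that combine CLIP with a large language model on vision-language understanding, and can generate high-fidelity video in a purely autoregressive manner, supporting diverse tasks such as video extension, interleaved image-text generation, and modeling of robot manipulation. Nature's editors commented that the work is of great significance for building scalable, unified multimodal intelligent systems. Notably, the team validated scaling laws for multimodal learning through large-scale ablation experiments and confirmed that direct preference optimization (DPO) adapts seamlessly to autoregressive visual generation. The subsequent iteration, Emu3.5, goes further, with a capability leap to "predicting the next state" ...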
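The idea of discretizing text, images, and video into one representation space and training on next-token prediction can be sketched roughly as follows. This is a toy illustration, not Emu3's actual tokenizer: the vocabulary sizes, the `BOI`/`EOI` marker tokens, and both function names are assumptions invented for the example.

```python
# Hypothetical vocabulary layout: text ids first, then visual codebook ids,
# then two special markers. All sizes are assumed, not Emu3's real values.
TEXT_VOCAB_SIZE = 1000
VISUAL_CODEBOOK_SIZE = 512
BOI = TEXT_VOCAB_SIZE + VISUAL_CODEBOOK_SIZE  # begin-of-image marker
EOI = BOI + 1                                 # end-of-image marker


def to_unified_sequence(text_ids, visual_codes):
    """Put text ids and offset visual codes into one flat token sequence."""
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text id out of range")
    if any(not 0 <= c < VISUAL_CODEBOOK_SIZE for c in visual_codes):
        raise ValueError("visual code out of range")
    # Shift visual codes past the text range so the two never collide.
    shifted = [TEXT_VOCAB_SIZE + c for c in visual_codes]
    return list(text_ids) + [BOI] + shifted + [EOI]


def next_token_pairs(sequence):
    """Return (context, target) pairs for next-token prediction."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


seq = to_unified_sequence([5, 7], [0, 3])
pairs = next_token_pairs(seq)
print(seq)
print(len(pairs), "training pairs; first pair:", pairs[0])
```

Once every modality lives in the same id space, a single autoregressive model can be trained on the pairs without modality-specific heads, which is the simplification the article attributes to Emu3.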
Published in Nature! BAAI Unveils Emu3, an AI All-Rounder That Unifies Multimodal Learning
生物世界· 2026-01-31 03:05
Core Viewpoint
- The article discusses the introduction of Emu3, a multimodal large model developed by the Beijing Academy of Artificial Intelligence, which aims to unify the learning of text, images, and videos through next-token prediction, potentially transforming the AI landscape [2][3].

Multimodal Learning
- Multimodal learning refers to the ability of AI to process various types of information simultaneously, akin to human sensory perception. Achieving a unified algorithm for learning and generating content from multiple modalities has been a long-standing challenge in the AI field [6].

Emu3's Mechanism
- Emu3 employs a simple yet effective approach: it converts data from all modalities into discrete tokens and uses a Transformer model to predict the next token, the same mechanism that underpins the success of the GPT series of language models [6][7].

Training Process
- The training of Emu3 consists of three stages:
  1. Pre-training with large-scale multimodal data, balancing the loss weights of text and visual tokens so that visual tokens do not dominate [10].
  2. Post-training for quality fine-tuning on generation tasks, incorporating human preference optimization [10].
  3. Inference supporting classifier-free guidance for low-latency and high-throughput generation [11].

Performance Comparison
- Emu3 has demonstrated performance that matches or exceeds specialized models across various tasks:
  - In image generation, it achieved a human preference score of 70.0, surpassing Stable Diffusion v1.5 (59.3) and SDXL (66.9) [13].
  - In video generation, it scored 81.0 in VBench evaluation, comparable to mainstream diffusion models [13].
  - In visual language understanding, it averaged 62.1 across 12 benchmark tests, rivaling models like LLaVA-1.6 [13].
  - In robotic operations, it achieved a success rate of 87.0% in a simulated environment [13].

Significance of the Research
- The significance of Emu3 lies not only in its performance improvements but also in its simplification of paradigms. It demonstrates that next-token prediction can serve as a core paradigm for multimodal models, paving the way for the development of more powerful "world models" that integrate perception, language, and action [15][17].

Future Developments
- Following Emu3, the research team has introduced Emu3.5, which enhances the model's capabilities through large-scale long-sequence video training and improves its ability to model physical world dynamics. The team also observed how multimodal capabilities trend as model and data scale increase [15].
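Two of the training and inference steps described in this summary, weighting text and visual tokens differently in the loss and applying classifier-free guidance at inference, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Emu3's implementation: the function names, the `visual_weight=0.5` default, and the guidance formula (the commonly used `uncond + scale * (cond - uncond)` form) are all illustrative choices not taken from the article.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def weighted_next_token_loss(logits_per_step, targets, is_visual, visual_weight=0.5):
    """Weighted mean cross-entropy over a token sequence.

    Text targets get weight 1.0 and visual targets get `visual_weight`
    (an assumed value), so long runs of visual tokens do not dominate.
    """
    total, weight_sum = 0.0, 0.0
    for logits, target, visual in zip(logits_per_step, targets, is_visual):
        w = visual_weight if visual else 1.0
        total += -w * log_softmax(logits)[target]
        weight_sum += w
    return total / weight_sum


def cfg_logits(cond, uncond, scale):
    """Classifier-free guidance in its common form: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# One text step followed by one visual step, two-token vocabulary.
loss = weighted_next_token_loss([[2.0, 0.0], [0.0, 1.0]], [0, 1], [False, True])
guided = cfg_logits([2.0, 0.0], [1.0, 0.0], scale=3.0)
print(f"weighted loss: {loss:.4f}")
print("guided logits:", guided)
```

With `scale=1.0` the guided logits equal the conditional logits, and larger scales push the prediction further toward the prompt-conditioned distribution.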
Where Is the Next Breakthrough for AI Applications?
Bei Jing Shang Bao· 2025-12-10 15:44
Core Insights
- The report indicates that AI is transitioning from the "tool era" to the "partner era" by 2025, with a clearer development trend for 2026 [1].

Infrastructure
- The report emphasizes the importance of computing power and chips, identifying the computing economy as the primary engine of the intelligent industry, with unprecedented global demand for AI computing power driving the construction of large-scale data centers [3].
- These data centers are evolving from traditional server hosting into AI-company-led facilities that integrate massive computing, storage, and network resources [3].
- Cloud computing vendors are shifting investments from general computing resources to dedicated computing infrastructure that meets AI demands, leading to strategic partnerships with AI companies [3].
- The rise of AI-native demands is reshaping chip innovation, with GPU dominance being challenged by NPUs and by growth in ASIC and FPGA technologies [3].
- China is accelerating the construction of a self-controlled computing ecosystem, with domestic "chip + SDK + framework" solutions validated in trillion-level model training [3].

Model Innovation
- Pre-training determines the hierarchy of large models, while architectural innovation influences pre-training levels, with hybrid expert models becoming mainstream under computational constraints [4].
- In 2025, large models will enter the "inference time" phase, with breakthroughs in multi-modal deep inference and adaptive reasoning [4].
- The report highlights a surge in research and development in physical AI and embodied intelligence, with world models and VLA frameworks becoming focal points [4].
- UBTECH announced a new order partnership involving humanoid robot sales exceeding 50 million yuan, showcasing integration with AI large models [4].

Application Landscape
- AI is reshaping traffic entry points, transitioning from "people finding services" to "services finding people," marking a new interaction paradigm [7].
- AI agents are developing closed-loop capabilities in perception, planning, decision-making, and execution, gradually replacing traditional apps [7].
- The new generation of AI systems can process and understand multiple information types simultaneously, enhancing performance in complex scenarios and opening new possibilities for creative content generation and intelligent interaction [7].
- The report predicts that as technology matures in the next 2-3 years, AI will become a standard tool across various industries, transitioning from a competitive advantage to a necessity [7].
- The AI hardware sector is also gaining attention, with lightweight models and edge computing technologies bringing AI capabilities to mobile devices, cars, and IoT devices [7].
- More smart devices are gaining local AI processing capabilities, addressing data privacy, network latency, and cost efficiency issues [7].
Even 10 Billion Isn't Enough to Burn! Robotics CEOs Offer a New Verdict: Embodied Intelligence Can No Longer Copy LLMs
Sou Hu Cai Jing· 2025-11-22 02:41
Core Insights
- The event highlighted the latest advancements in embodied intelligence by the Zhiyuan Research Institute, focusing on the importance of world models and the development of a comprehensive embodied brain system [2][3].

Group 1: Zhiyuan's Full-Stack Layout
- Zhiyuan introduced the native multimodal world model Emu3.5, which expanded training data from 15 years of video to 790 years and increased parameter size from 8 billion to 34 billion, enhancing video and image generation speed [5].
- The institute is constructing an embodied intelligence system that spans heterogeneous robot bodies, including RoboBrain, RoboOS, and RoboBrain-0, deployed across various robotic forms for tasks ranging from navigation to complex interactions [5].

Group 2: Key Elements of Embodied Intelligence
- The role of world models in embodied intelligence was debated, with experts emphasizing the need for models that predict the next state based on the robot's form and goals, rather than merely generating videos [7][10].
- There is a consensus that embodied intelligence should not follow the current language-first paradigm but rather adopt a structure centered on action and perception [10][12].
- The importance of real data was highlighted, with discussions on the necessity of combining real, simulated, and video data for effective learning in robots [15][17].

Group 3: Investment Priorities
- When asked how to allocate 10 billion, experts prioritized talent acquisition, computational power, and data engines as key investment areas [19][21].
- There were differing views on the importance of infrastructure versus model development, with some advocating for a focus on creating a comprehensive data engine for continuous digitalization [21][22].

Group 4: Humanoid Robots and Hardware Limitations
- The debate on whether humanoid robots represent the ultimate form of embodied intelligence concluded that neither the model nor the hardware defines the other; rather, the specific application scenarios dictate the requirements [22][24].
- Experts suggested that a layered structure for embodied intelligence should be adopted, where higher-level models can be reused across different robotic forms, but lower-level models must be tailored to specific hardware [23][24].

Conclusion
- The discussions at the event signaled a proactive search for solutions to achieve a closed-loop system in embodied intelligence, emphasizing the need for models, hardware, and scaling to evolve together [24].
Altman Denies OpenAI Will Go Public Next Year; China Mobile Transfers 41.98 Million Shares at Zero Cost
21 Shi Ji Jing Ji Bao Dao· 2025-11-04 03:27
Group 1: OpenAI Developments
- OpenAI CEO Altman denied rumors of the company going public next year, stating that there is no specific date or decision from the board regarding an IPO, but he believes it will eventually happen [2].
- OpenAI's annual revenue significantly exceeds the rumored $13 billion [2].
- OpenAI signed a $38 billion computing power procurement agreement with Amazon Web Services (AWS), marking its first collaboration with a global cloud infrastructure leader outside of Microsoft [5].

Group 2: Corporate Actions and Financial Moves
- China Mobile announced a non-cash transfer of 41.98 million shares to China National Petroleum Corporation, reducing its stake from 69.05% to 68.85% [3].
- Boeing completed the sale of part of its digital aviation solutions business for $10.55 billion to Thoma Bravo, optimizing its capital structure and allowing a focus on core business [8].
- Wuhan Weinan Battery Asset Co., Ltd. completed a C-round financing of 670 million yuan, with participation from NIO and CATL, to support battery asset-related business and technology development [12].

Group 3: Technology and Innovation
- Microsoft CEO Nadella indicated the company may restart hiring in the next year, contingent on existing employees learning to collaborate with AI [4].
- Xiaopeng Motors' CEO He Xiaopeng announced plans to mass-produce robots by 2026, emphasizing the importance of integration and overcoming challenges in cost, safety, and consistency [6].
- The Zhiyuan Research Institute released the Emu3.5 multimodal world model, significantly enhancing training data and inference speed, marking a new era in multimodal AI [13].

Group 4: Market Trends and Strategic Moves
- Elon Musk announced the upcoming launch of a new encrypted communication platform, XChat, which will integrate with the existing X social platform [7].
- Qualcomm and MediaTek are accelerating their adoption of TSMC's N2P process technology to compete with Apple in chip production [11].
- Tesla's AI team is progressing on the AI 5 chip for smart assisted driving, with future versions AI 6 and AI 7 expected to follow [10].
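AI-Faked Jensen Huang Livestream Draws Over 5x the Viewers of Nvidia's Official Stream; OpenAI Plans 2027 IPO at a Valuation of Up to $1 Trillion | Weekly AI News Roundup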
36氪· 2025-11-01 09:45
Group 1
- Adobe launched its advanced image generation and editing model Firefly Image 5, supporting native 4-megapixel output and introducing new generative AI tools for applications like Photoshop and Premiere Pro [2][3].
- Zhiyuan Research Institute released the Emu3.5 multimodal model, trained on over 10 trillion tokens, with video training duration increasing from 15 years to 790 years and parameter count rising from 8 billion to 34 billion [2].

Group 2
- Figma acquired AI generation company Weavy to create a new "node-based" AI design paradigm, enhancing creative control for designers [6].
- OpenAI plans to go public in 2027 with a potential valuation of $1 trillion, expecting revenue to double this year to $12.7 billion and continue growing rapidly [6][9].
- YouTube is undergoing restructuring focused on AI applications, offering voluntary buyout options to employees considering leaving [7].

Group 3
- Google Labs introduced Pomelli, an AI marketing tool designed to help small businesses quickly create social media campaigns by extracting brand information from their websites [4].
- Synthesia, a UK-based AI video generation unicorn, completed a $200 million funding round, achieving a valuation of $4 billion and serving around 60,000 enterprises [9].
- Ant Group's AI health application AQ ranked 7th in China's AI native application list, with a compound growth rate of 83.4%, significantly outpacing the industry average of 13.5% [8].
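Post-90s Mathematician Wang Hong Wins a Major Prize; Chen Tianqiao to Invest $1 Billion in Computing Power to Support Discovery-Oriented Intelligence; Pop Mart Opens Its First Middle East Store; OpenAI Responds on IPO Preparations | Bang Morning Briefing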
创业邦· 2025-10-31 00:08
Group 1
- The 2025 Hurun Women Entrepreneurs List was released, with Zhong Huijuan from Hansoh Pharmaceutical becoming China's richest woman for the first time, with a wealth of 141 billion yuan [1].
- Young mathematician Wang Hong from Guangxi won the 2025 Salem Prize, which is considered a precursor to the Fields Medal, and was also awarded at the World Chinese Mathematicians Conference [1].
- OpenAI is reportedly preparing for an IPO, with a potential valuation of up to $1 trillion, which could be one of the largest IPOs in history [2].

Group 2
- Li Cao from Leap Motor clarified that the company focuses on self-research of core technologies and respects Huawei as a benchmark for China's technological independence [2].
- Xiaomi's "Giant Energy Saving" series was clarified by executives as a product line name rather than a performance metric, with energy efficiency exceeding national standards [4].
- JD.com launched a promotional campaign offering free food delivery as part of its 11.11 shopping festival, with a total of 1 million free orders available [6].

Group 3
- JD.com founder Liu Qiangdong treated 150,000 full-time delivery riders to KFC as a reward for their hard work during the 11.11 sales event [8].
- Chen Tianqiao announced a $1 billion investment in computing power to support innovative AI research, emphasizing the importance of discovery in AI [8].
- Giant Network responded to the departure of its former CEO, stating that the company is focused on reducing internal conflicts and improving decision-making efficiency [10].

Group 4
- Didi announced a freight payment guarantee, committing to fully cover drivers' unpaid earnings if not received within seven days after order completion [10].
- Pop Mart opened its first store in the Middle East, which operates 24 hours a day, marking a significant expansion for the brand [10].
- Taobao is set to launch a "Taobao Convenience Store" project, offering a wide range of products online with a focus on quality and service standards [13].

Group 5
- The skincare brand "LAN" responded to consumer concerns about compliance with regulatory standards, stating that their product registrations are valid [13].
- Apple CEO Tim Cook avoided questions regarding iPhone Air production cuts during a recent earnings call, maintaining the company's policy of not disclosing specific model sales [13].
- The NBA approved Mark Walter as the new owner of the Los Angeles Lakers, with a total valuation of $10 billion for the team [14].

Group 6
- Ford announced an additional investment of $170 million in Argentina for the production of hybrid Ranger vehicles, set to begin in 2027 [14].
- Wikipedia subtly criticized Elon Musk's AI-driven encyclopedia GrokiPedia, emphasizing its human-operated nature in a fundraising announcement [14].
- Tesla is recalling 6,197 Cybertruck vehicles in the U.S. due to potential issues with the installation of off-road light bars [17].

Group 7
- YouTube is undergoing a restructuring focused on AI applications, offering voluntary buyout options to employees considering leaving the company [17].
- Volkswagen reported a net loss of €1.072 billion in Q3 2025, with a significant decline in profits attributed to increased electric vehicle production and additional costs [18].
- Nvidia plans to invest up to $1 billion in AI startup Poolside, potentially increasing its valuation significantly [18].

Group 8
- Intel is in preliminary talks to acquire AI chip startup SambaNova Systems, with potential valuation lower than its previous funding round [18].
- Shunwei Capital led a multi-million yuan angel round investment in Zhefei Aviation Technology, indicating continued interest in the aviation sector [18].
- Pyromind Dynamics completed a $10 million seed round financing to expand its team and product development in the reinforcement learning sector [18].