Multimodal World Model
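Shenzhen University team gets robots to understand instructions and navigate precisely! Success rate up to 72.5%, inference efficiency up 40% | AAAI 2026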
QbitAI · 2025-12-10 04:26
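Contributed by the UNeMo team. QbitAI | WeChat official account QbitAI

Getting robots to understand instructions: precise navigation takes another step up!

Professor Li Jianqiang's team at Shenzhen University, together with Shenzhen MSU-BIT University (北京理工莫斯科大学) and other institutions, recently proposed a new vision-language navigation (VLN) framework: UNeMo. Using a multimodal world model and a hierarchical prediction-feedback mechanism, it lets a navigation agent not only see its current environment but also predict what it is likely to see next, and make smarter decisions on that basis.

Compared with mainstream methods, UNeMo substantially reduces resource consumption and reaches a navigation success rate of up to 72.5% in unseen environments, performing especially well on long-trajectory navigation. The paper has been accepted to AAAI 2026. More details follow.

The "disconnect dilemma" between language reasoning and visual navigation

As one of the core tasks of Embodied AI, vision-language navigation requires an agent to reach a target autonomously in an unknown environment, relying only on visual images and natural-language instructions. With the rise of large language models (LLMs), LLM-based navigation methods have made progress, but they still face two key bottlenecks:

Conflicting optimization objectives: the reasoning module and the navigation policy are trained separately, so the two fit together poorly and cannot be optimized jointly and dynamically, which caps performance.

Two cooperating modules form a "predict + decide" closed loop

The research team therefore proposed the UNeMo framework, whose core breakthrough is the construction of a "Multimodal World Model (MWM) + hierarchical prediction ...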
Altman denies OpenAI will go public next year; China Mobile transfers 41.98 million shares at zero cost
21 Shi Ji Jing Ji Bao Dao · 2025-11-04 03:27
Group 1: OpenAI Developments
- OpenAI CEO Altman denied rumors of the company going public next year, stating that there is no specific date or decision from the board regarding an IPO, but he believes it will eventually happen [2]
- OpenAI's annual revenue significantly exceeds the rumored $13 billion [2]
- OpenAI signed a $38 billion computing power procurement agreement with Amazon Web Services (AWS), marking its first collaboration with a global cloud infrastructure leader outside of Microsoft [5]

Group 2: Corporate Actions and Financial Moves
- China Mobile announced a non-cash transfer of 41.98 million shares to China National Petroleum Corporation, reducing its stake from 69.05% to 68.85% [3]
- Boeing completed the sale of part of its digital aviation solutions business for $10.55 billion to Thoma Bravo, optimizing its capital structure and allowing a focus on core business [8]
- Wuhan Weinan Battery Asset Co., Ltd. completed a C-round financing of 670 million yuan, with participation from NIO and CATL, to support battery asset-related business and technology development [12]

Group 3: Technology and Innovation
- Microsoft CEO Nadella indicated the company may restart hiring in the next year, contingent on existing employees learning to collaborate with AI [4]
- Xiaopeng Motors' CEO He Xiaopeng announced plans to mass-produce robots by 2026, emphasizing the importance of integration and overcoming challenges in cost, safety, and consistency [6]
- The Zhiyuan Research Institute released the Emu3.5 multimodal world model, significantly enhancing training data and inference speed, marking a new era in multimodal AI [13]

Group 4: Market Trends and Strategic Moves
- Elon Musk announced the upcoming launch of a new encrypted communication platform, XChat, which will integrate with the existing X social platform [7]
- Qualcomm and MediaTek are accelerating their adoption of TSMC's N2P process technology to compete with Apple in chip production [11]
- Tesla's AI team is progressing on the AI 5 chip for smart assisted driving, with future versions AI 6 and AI 7 expected to follow [10]
Zhiyuan Research Institute releases the "Wujie" series of large models: letting AI see and understand the physical world
Jing Ji Guan Cha Wang· 2025-06-07 02:55
Core Insights
- The Beijing Zhiyuan Conference showcased the latest developments in AI, including the release of the "Wujie" series of models by the Zhiyuan Research Institute, which aims to advance AI's understanding of the physical world [2][4]
- The director of Zhiyuan, Wang Zhongyuan, emphasized that the next phase of AI development requires moving beyond language models to multi-modal world models that can perceive and interact with the physical environment [4][5]

Model Releases
- The "Wujie" series includes four models: Emu3, Brainμ, RoboOS 2.0, and RoboBrain 2.0, each designed to enhance AI's capabilities in understanding and interacting with the physical world [2][3]
- Emu3 utilizes a new visual tokenizer technology to unify the representation of text, images, and videos, allowing AI to process them in a cohesive manner [3]
- Brainμ aims to serve as a new engine for neuroscience research and clinical applications, integrating over one million neural signal data units [3]
- RoboOS 2.0 improves performance by 30% compared to its predecessor, enabling faster integration of developer plugins and enhancing real-time response capabilities [3]
- OpenComplex2 targets life sciences by simulating molecular movements at atomic resolution, potentially accelerating drug development and biological research [3]

Strategic Partnerships and Goals
- Zhiyuan has signed a strategic cooperation agreement with Hong Kong Investment Management Company to foster talent, technology, and capital collaboration [6]
- The organization is committed to open-source and international collaboration, having already open-sourced 200 models with a total of 640 million downloads [7]
- Wang Zhongyuan highlighted the importance of patience and sustained capital investment for long-term goals, despite short-term commercialization challenges [5][6]