Multimodal World Model
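Shenzhen University team gets robots to understand instructions and navigate precisely! Success rate reaches 72.5%, inference efficiency up 40% | AAAI 2026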
QbitAI (量子位) · 2025-12-10 04:26
Core Insights
- The article introduces UNeMo, a new framework for vision-language navigation (VLN) developed by a team led by Professor Li Jianqiang of Shenzhen University in collaboration with other institutions [1][4].

Group 1: Framework Overview
- UNeMo combines a multimodal world model (MWM) with a hierarchical predictive feedback navigator (HPFN), so that the agent can predict future visual states and use those predictions when choosing its next action [3][11].
- The framework addresses the disconnect between language reasoning and visual navigation, a persistent weakness of existing methods [8][9].

Group 2: Performance Metrics
- UNeMo reaches a navigation success rate of 72.5% in unseen environments, ahead of the previous method NavGPT2 at 71% [4][26].
- It is also lighter on resources: GPU memory usage falls by 56%, from 27GB to 12GB, and inference speed improves by 40% [24].

Group 3: Robustness in Complex Scenarios
- UNeMo's advantage is largest on long paths: the success rate rises by 5.6% for paths longer than 7 units, compared with 1.2% for shorter paths [28][29].
- This gap indicates that UNeMo mitigates the cumulative errors that build up in long-distance navigation tasks [30].

Group 4: Scalability and Adaptability
- The framework has been tested with several navigation baselines and datasets, showing that it adapts and scales beyond LLM-based systems [31][33].
- UNeMo's collaborative training architecture lets it perform well across diverse task scenarios, which broadens its practical value [34].
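The resource and accuracy figures in Group 2 can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the summary; the helper name `relative_reduction` is introduced here for illustration and does not come from the article or the UNeMo code.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction when a quantity drops from `before` to `after`."""
    return (before - after) / before


# GPU memory in GB, as quoted in the summary (27GB -> 12GB).
mem_reduction = relative_reduction(27.0, 12.0)

# Success rates in percent, as quoted (UNeMo 72.5 vs. NavGPT2 71).
success_gain_points = 72.5 - 71.0

print(f"GPU memory reduction: {mem_reduction:.1%}")
print(f"Success-rate gain over NavGPT2: {success_gain_points:.1f} percentage points")
```

The memory figure works out to about 55.6%, which matches the reported 56% after rounding, and the gain over NavGPT2 is 1.5 percentage points.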
Altman denies OpenAI will go public next year; China Mobile transfers 41.98 million shares at zero cost
21st Century Business Herald (21 Shi Ji Jing Ji Bao Dao) · 2025-11-04 03:27
Group 1: OpenAI Developments
- OpenAI CEO Sam Altman denied rumors that the company will go public next year, saying the board has set no date and made no decision on an IPO, though he believes one will eventually happen [2].
- OpenAI's annual revenue significantly exceeds the rumored $13 billion [2].
- OpenAI signed a $38 billion computing power procurement agreement with Amazon Web Services (AWS), its first collaboration with a global cloud infrastructure leader other than Microsoft [5].

Group 2: Corporate Actions and Financial Moves
- China Mobile announced a non-cash transfer of 41.98 million shares to China National Petroleum Corporation, reducing its stake from 69.05% to 68.85% [3].
- Boeing completed the sale of part of its digital aviation solutions business to Thoma Bravo for $10.55 billion, optimizing its capital structure and allowing it to focus on its core business [8].
- Wuhan Weineng Battery Asset Co., Ltd. completed a C-round financing of 670 million yuan, with participation from NIO and CATL, to support battery-asset business and technology development [12].

Group 3: Technology and Innovation
- Microsoft CEO Satya Nadella indicated that the company may resume hiring next year, contingent on existing employees learning to collaborate with AI [4].
- XPeng Motors CEO He Xiaopeng announced plans to mass-produce robots by 2026, emphasizing integration and the need to overcome challenges in cost, safety, and consistency [6].
- The Zhiyuan Research Institute released the Emu3.5 multimodal world model, with significant gains in training data and inference speed, which it presents as the start of a new era in multimodal AI [13].

Group 4: Market Trends and Strategic Moves
- Elon Musk announced the upcoming launch of XChat, a new encrypted communication platform that will integrate with the existing X social platform [7].
- Qualcomm and MediaTek are accelerating their adoption of TSMC's N2P process technology to compete with Apple in chip production [11].
- Tesla's AI team is progressing on the AI 5 chip for smart assisted driving, with future versions AI 6 and AI 7 expected to follow [10].
Zhiyuan Research Institute releases the "Wujie" series of large models: letting AI see and understand the physical world
Jing Ji Guan Cha Wang · 2025-06-07 02:55
Core Insights
- The Beijing Zhiyuan Conference showcased the latest developments in AI, including the release of the "Wujie" series of models by the Zhiyuan Research Institute, which aims to advance AI's understanding of the physical world [2][4].
- Zhiyuan's director, Wang Zhongyuan, emphasized that the next phase of AI development requires moving beyond language models to multimodal world models that can perceive and interact with the physical environment [4][5].

Model Releases
- The "Wujie" series includes four models: Emu3, Brainμ, RoboOS 2.0, and RoboBrain 2.0, each designed to enhance AI's ability to understand and interact with the physical world [2][3].
- Emu3 uses a new visual tokenizer to unify the representation of text, images, and videos, allowing AI to process them in a cohesive manner [3].
- Brainμ aims to serve as a new engine for neuroscience research and clinical applications, integrating over one million neural signal data units [3].
- RoboOS 2.0 improves performance by 30% over its predecessor, enabling faster integration of developer plugins and better real-time response [3].
- OpenComplex2 targets the life sciences by simulating molecular movements at atomic resolution, potentially accelerating drug development and biological research [3].

Strategic Partnerships and Goals
- Zhiyuan has signed a strategic cooperation agreement with Hong Kong Investment Management Company to foster collaboration on talent, technology, and capital [6].
- The organization is committed to open source and international collaboration, having already open-sourced 200 models with a total of 640 million downloads [7].
- Wang Zhongyuan highlighted the importance of patience and sustained capital investment for long-term goals, despite short-term commercialization challenges [5][6].