Shenzhen University Team Gets Robots to Understand Instructions and Navigate Precisely! Success Rate Reaches 72.5%, Inference Efficiency Up 40% | AAAI 2026
量子位 · 2025-12-10 04:26
Core Insights
- The article covers UNeMo, a new framework for vision-language navigation (VLN) developed by a team led by Professor Li Jianqiang at Shenzhen University in collaboration with other institutions [1][4].

Group 1: Framework Overview
- UNeMo pairs a multi-modal world model (MWM) with a hierarchical predictive feedback navigator (HPFN): the MWM lets the agent predict future visual states, and the HPFN feeds those predictions back into action selection, as sketched in the code after this summary [3][11].
- This design addresses the disconnect between language reasoning and visual navigation that has hampered existing methods [8][9].

Group 2: Performance Metrics
- UNeMo reaches a 72.5% navigation success rate in unseen environments, surpassing the prior method NavGPT2 at 71% [4][26].
- The model is also markedly more resource-efficient: GPU memory usage drops from 27 GB to 12 GB, a 56% reduction ((27 − 12) / 27 ≈ 55.6%), and inference speed improves by 40% [24].

Group 3: Robustness in Complex Scenarios
- UNeMo shows its largest gains on long-path navigation: success rate rises by 5.6% on paths longer than 7 units, versus only 1.2% on shorter paths (see the evaluation sketch below) [28][29].
- This pattern indicates that UNeMo effectively mitigates the cumulative errors that build up in long-distance navigation tasks [30].

Group 4: Scalability and Adaptability
- The framework has been tested across multiple navigation baselines and datasets, demonstrating adaptability and scalability beyond LLM-based systems [31][33].
- UNeMo's collaborative training architecture lets it perform well across diverse task scenarios, broadening its practical value [34].
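The article describes the MWM/HPFN interaction only at a high level, so the following is a minimal sketch of what such a predict-feedback-act loop could look like. All class names, tensor dimensions, and interfaces here (`MultimodalWorldModel`, `HierarchicalPredictiveFeedbackNavigator`, `select_action`) are illustrative assumptions, not the paper's actual implementation.

```python
"""Sketch of a predict-feedback-act loop in the spirit of UNeMo's
MWM + HPFN design. Everything below is a hypothetical stand-in;
the real architecture is defined in the AAAI 2026 paper."""

import torch
import torch.nn as nn


class MultimodalWorldModel(nn.Module):
    """Assumed MWM role: given the current visual feature, the instruction
    embedding, and a candidate action embedding, predict the visual
    feature the agent would observe after taking that action."""

    def __init__(self, vis_dim=512, txt_dim=512, act_dim=32):
        super().__init__()
        self.predictor = nn.Sequential(
            nn.Linear(vis_dim + txt_dim + act_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, vis_dim),
        )

    def forward(self, vis, txt, act):
        return self.predictor(torch.cat([vis, txt, act], dim=-1))


class HierarchicalPredictiveFeedbackNavigator(nn.Module):
    """Assumed HPFN role: score each candidate action using both the
    current observation and the MWM's predicted next observation, so
    prediction feeds back into action selection."""

    def __init__(self, vis_dim=512, txt_dim=512):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(2 * vis_dim + txt_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, vis, txt, predicted_vis):
        return self.scorer(torch.cat([vis, txt, predicted_vis], dim=-1)).squeeze(-1)


def select_action(mwm, hpfn, vis, txt, candidate_actions):
    """For each candidate action, imagine the next view with the MWM,
    then let the HPFN score the (current, imagined) pair; act greedily."""
    scores = []
    for act in candidate_actions:
        predicted = mwm(vis, txt, act)
        scores.append(hpfn(vis, txt, predicted))
    return int(torch.stack(scores).argmax())


if __name__ == "__main__":
    mwm = MultimodalWorldModel()
    hpfn = HierarchicalPredictiveFeedbackNavigator()
    vis = torch.randn(512)                            # current visual feature
    txt = torch.randn(512)                            # encoded instruction
    candidates = [torch.randn(32) for _ in range(4)]  # 4 navigable directions
    print("chosen action:", select_action(mwm, hpfn, vis, txt, candidates))
```

The design idea mirrored here is that the navigator scores each candidate action against an imagined future observation rather than the current view alone, which is one plausible way a world model's predictions can inform decision-making.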
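For the long- vs. short-path comparison in Group 3, bucketing evaluation episodes by ground-truth path length is the standard approach; the `Episode` fields and the reading of "7 units" as a threshold on path length below are assumptions for illustration, not the paper's evaluation code.

```python
# Hypothetical evaluation helper: split episodes at a path-length
# threshold and compute the success rate per bucket, mirroring the
# article's long-path vs. short-path comparison.

from dataclasses import dataclass


@dataclass
class Episode:
    path_length: float  # ground-truth path length, in the dataset's units
    success: bool       # did the agent stop within the success radius?


def rate(outcomes):
    """Fraction of successful episodes, NaN if the bucket is empty."""
    return sum(outcomes) / len(outcomes) if outcomes else float("nan")


def success_rate_by_length(episodes, threshold=7.0):
    """Return (short_sr, long_sr), split at `threshold`."""
    short = [e.success for e in episodes if e.path_length <= threshold]
    long_ = [e.success for e in episodes if e.path_length > threshold]
    return rate(short), rate(long_)


if __name__ == "__main__":
    demo = [Episode(5.2, True), Episode(9.1, False),
            Episode(8.4, True), Episode(6.0, True)]
    short_sr, long_sr = success_rate_by_length(demo)
    print(f"short-path SR: {short_sr:.2f}, long-path SR: {long_sr:.2f}")
```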