DyNaVLM

Search documents
上海交大最新!DyNaVLM:零样本、端到端导航框架
具身智能之心· 2025-06-22 10:56
Core Viewpoint - The article discusses the development of DyNaVLM, a zero-shot, end-to-end navigation framework that integrates vision-language models (VLM) to enhance navigation capabilities in dynamic environments, overcoming limitations of traditional methods [4][5]. Group 1: Introduction and Optimization Goals - Navigation is a fundamental capability in autonomous agents, requiring spatial reasoning, real-time decision-making, and adaptability to dynamic environments. Traditional methods face challenges in generalization and scalability due to their modular design [4]. - The advancement of VLMs offers new possibilities for navigation by integrating perception and reasoning within a single framework, although their application in embodied navigation is limited by spatial granularity and contextual reasoning capabilities [4]. Group 2: Core Innovations of DyNaVLM - **Dynamic Action Space Construction**: DyNaVLM introduces a dynamic action space that allows robots to determine navigation goals based on visual information and language instructions, enhancing movement flexibility in complex environments [6]. - **Collaborative Graph Memory Mechanism**: Inspired by retrieval-augmented generation (RAG), this mechanism enhances memory management for better navigation performance [8]. - **No-Training Deployment Mode**: DyNaVLM can be deployed without task-specific fine-tuning, reducing deployment costs and improving generalization across different environments and tasks [8]. Group 3: System Architecture and Methodology - **Problem Formalization**: The system takes inputs such as target descriptions and RGB-D observations to determine appropriate actions, maintaining a memory function to extract spatial features [11]. - **Memory Manager**: This component connects VLM and graph-structured memory, capturing spatial relationships and semantic object information [12]. - **Action Proposer and Selector**: The action proposer simplifies continuous search space into discrete candidates, while the selector generates final navigation actions based on geometric candidates and contextual memory [14][15]. Group 4: Experimental Evaluation - **Simulation Environment Evaluation**: DyNaVLM achieved a success rate (SR) of 45.0% and a path length weighted success rate (SPL) of 0.232 in ObjectNav benchmarks, outperforming previous VLM frameworks [19][22]. - **Real-World Evaluation**: DyNaVLM demonstrated superior performance in real-world settings, particularly in tasks requiring the identification of multiple targets, showcasing its robustness and efficiency in dynamic environments [27].