Goal Navigation

Harbin Institute of Technology Proposes UAV-ON: An Open-World Object Navigation Benchmark for Aerial Agents
具身智能之心· 2025-08-05 00:03
Research Background and Motivation
- Drones are increasingly applied in fields such as cargo transport, emergency rescue, and environmental monitoring, which demands autonomous navigation in complex, dynamic environments [2]
- Existing research relies mainly on visual-language navigation (VLN) methods that require detailed step-by-step language instructions, limiting scalability and autonomy in open-world scenarios [2]
- Object navigation (ObjectNav) is proposed as an alternative that localizes targets from semantic cues without dense instruction sequences, yet it remains underexplored in large-scale, unstructured outdoor environments [2]

UAV-ON Benchmark Overview
- UAV-ON is the first large-scale benchmark for instance-level object navigation by drones in open-world settings, built on Unreal Engine with diverse urban, forest, mountain, and aquatic scenes covering roughly 9 million square units in total [4]
- The benchmark defines 1,270 annotated targets, each paired with an instance-level semantic instruction, introducing real-world ambiguity and reasoning challenges for drones [4]

Task Setup
- In each episode the drone is placed randomly in the environment and must navigate using only RGB-D sensor data, performing obstacle avoidance and path planning autonomously without global maps or external information [6]
- An episode terminates when the drone issues a stop command, collides with an obstacle, or reaches the 150-step limit; success is defined as stopping within 20 units of the target [6]

Sensors and Action Space
- The drone carries four synchronized RGB-D cameras capturing images from different orientations and relies entirely on first-person perception and memory for navigation [9]
- The action space consists of parameterized continuous translation and rotation actions that must be physically executed, making the benchmark more realistic than existing ones [9]

Dataset and Evaluation Metrics
- The training set contains 10 environments and 10,000 navigation episodes; the test set contains 1,000 episodes spanning both seen and unseen environments to assess generalization [10]
- Evaluation metrics include success rate (SR), oracle success rate (OSR), distance to success (DTS), and success weighted by path length (SPL) [10]

Baseline Methods and Experimental Results
- Four baselines were implemented to compare different strategies, including a random policy, CLIP-based heuristic exploration, and aerial object navigation agents (AOA) [11][13]
- Results show large performance gaps: AOA-V achieves the highest OSR (26.30%) but low SR (4.20%) and SPL (0.87%), highlighting the difficulty of coupling semantic understanding with motion planning [14][16]
- Collision rates exceed 30% for all methods, indicating weaknesses in obstacle avoidance and robust control [15]

Conclusion
- The UAV-ON benchmark provides a comprehensive framework for advancing drone navigation research in open-world environments, addressing limitations of existing methods and paving the way for future work on autonomous navigation technologies [2][4][10]
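The metrics above can be made concrete with a small sketch. This follows the standard ObjectNav definitions of SR, OSR, and SPL (success weighted by path efficiency); the field names and the way episodes are recorded here are illustrative assumptions, not the benchmark's actual code, though the 20-unit success radius is taken from the task setup.

```python
from dataclasses import dataclass

SUCCESS_RADIUS = 20.0  # success threshold (units), per the benchmark's task setup

@dataclass
class Episode:
    final_dist: float    # distance to target when the agent stopped
    min_dist: float      # closest distance to target reached at any step
    path_len: float      # length of the path the agent actually flew
    shortest_len: float  # shortest-path length from start to target

def evaluate(episodes):
    """Compute (SR, OSR, SPL) over a set of navigation episodes."""
    n = len(episodes)
    # SR: fraction of episodes that stopped within the success radius.
    sr = sum(e.final_dist <= SUCCESS_RADIUS for e in episodes) / n
    # OSR (oracle success): the agent was *ever* within the success radius,
    # regardless of where it finally stopped.
    osr = sum(e.min_dist <= SUCCESS_RADIUS for e in episodes) / n
    # SPL: each success weighted by path efficiency (shortest / actual).
    spl = sum(
        (e.final_dist <= SUCCESS_RADIUS)
        * e.shortest_len / max(e.path_len, e.shortest_len)
        for e in episodes
    ) / n
    return sr, osr, spl

# Two toy episodes: one success with a detour, one near-miss.
metrics = evaluate([
    Episode(final_dist=10, min_dist=5, path_len=100, shortest_len=50),
    Episode(final_dist=40, min_dist=15, path_len=80, shortest_len=60),
])
# metrics -> (0.5, 1.0, 0.25)
```

Note how the second episode counts toward OSR but not SR: the drone passed within 15 units of the target but stopped 40 units away, exactly the gap between AOA-V's high OSR and low SR reported above.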
Let's Break It Down: What Is the Difference Between Visual Language Navigation and Goal Navigation in Embodied AI?
具身智能之心· 2025-08-01 10:30
Core Viewpoint
- The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation; VLN focuses on following instructions, while goal navigation emphasizes autonomous exploration and pathfinding based on environmental understanding [1][5]

Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally an instruction-following task that involves understanding language commands, perceiving the environment, and planning movement strategies; a VLN robot system consists of a visual language encoder, a historical environment representation, and an action strategy module [2][4]
- Learning the strategy network has shifted from extracting patterns from labeled datasets to distilling effective planning information from large language models (LLMs) [4]
- Because a VLN robot accumulates visual observations and executes actions in a loop, determining the current task stage is crucial for informed decision-making [4]

Group 2: Goal Navigation
- Goal navigation extends VLN by enabling agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [5][7]
- Unlike traditional VLN, goal-driven navigation systems must move from understanding commands to independently interpreting the environment and making decisions, integrating computer vision, reinforcement learning, and 3D semantic understanding [7]

Group 3: Commercial Applications and Demand
- Goal-driven navigation has been deployed in several verticals, such as last-mile delivery, where it is combined with social navigation algorithms to handle dynamic environments and human interaction [9]
- Companies such as Meituan and Starship Technologies have deployed delivery robots in complex urban settings, while others such as Aethon have built service robots for the medical and hospitality sectors, improving service efficiency [9][10]
- The rise of humanoid robots has increased the focus on adapting navigation technology to home services, healthcare, and industrial logistics, creating significant job demand in the navigation field [10]

Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation require knowledge spanning natural language processing, computer vision, reinforcement learning, and graph neural networks, making it hard for newcomers to build comprehensive expertise [11]
- The fragmented nature of knowledge in these fields makes learning difficult and often causes people to give up before reaching a solid understanding [11]
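The encoder/history/policy loop described in Group 1 can be sketched as a minimal control flow. All class and function names here are illustrative stand-ins, not from any specific VLN system; the toy policy simply stops after three steps to show how accumulated history lets the agent judge task progress.

```python
# Minimal sketch of the three-module VLN loop: visual language encoder,
# historical environment representation, and action strategy.
from typing import Any, List

class VLNAgent:
    def __init__(self, encoder, policy):
        self.encoder = encoder        # fuses instruction + observation
        self.policy = policy          # action-strategy module
        self.history: List[Any] = []  # historical environment representation

    def step(self, instruction: str, observation) -> str:
        # 1. Encode the current observation together with the instruction.
        state = self.encoder(instruction, observation)
        # 2. Append to history so the policy can infer the current task stage.
        self.history.append(state)
        # 3. Plan the next action from the accumulated context.
        return self.policy(self.history)

# Toy stand-ins to exercise the control flow.
def toy_encoder(instruction, obs):
    return (instruction, obs)

def toy_policy(history):
    return "STOP" if len(history) >= 3 else "FORWARD"

agent = VLNAgent(toy_encoder, toy_policy)
actions = [agent.step("go to the kitchen", obs) for obs in ["o1", "o2", "o3"]]
# actions -> ["FORWARD", "FORWARD", "STOP"]
```

In a real system the encoder would be a pre-trained visual language model and the policy a learned network; the loop structure, observe, update history, act, is the part that carries over.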
How Does Embodied Goal Navigation Find the Target and Navigate to It?
具身智能之心· 2025-07-13 04:13
Core Viewpoint
- The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation; VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4]

Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally an instruction-following task involving understanding language commands, perceiving the environment, and planning movement strategies; a VLN robot system consists of a visual language encoder, an environmental history representation, and an action strategy module [2]
- The key challenge in VLN is how to effectively compress information from visual and language inputs; current trends favor large-scale pre-trained visual language models and LLMs for instruction breakdown and task segmentation [2][3]
- Learning the strategy network has shifted from pattern extraction on labeled datasets to distilling effective planning information from LLMs, a significant research focus [3]

Group 2: Goal Navigation
- Goal navigation extends VLN by requiring agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [4]
- Unlike traditional VLN, goal-driven navigation systems must move from "understanding instructions" to "finding paths" by autonomously parsing semantics, modeling the environment, and making dynamic decisions [6]

Group 3: Commercial Applications and Demand
- Goal-driven navigation has been industrialized in several verticals, such as last-mile delivery, where it is combined with social navigation algorithms to handle dynamic environments; examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8]
- In sectors such as healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, improving service efficiency [8]
- The development of humanoid robots has increased the focus on adapting navigation technology to home services, care, and industrial logistics, creating significant job demand in the navigation field [9]

Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation require knowledge across natural language processing, computer vision, reinforcement learning, and graph neural networks, making the learning path challenging for newcomers [10]
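The "finding paths autonomously" step in Group 2 is often realized by scoring candidate exploration frontiers against the target description. The sketch below shows that pattern under stated assumptions: the similarity scores, weights, and frontier names are all hypothetical, and a real system would obtain the semantic similarity from a vision-language model rather than a lookup table.

```python
# Hedged sketch: pick the next exploration frontier by trading off
# semantic relevance to the target description against travel distance.
# alpha/beta weights and all inputs here are illustrative assumptions.

def score_frontier(semantic_sim: float, distance: float,
                   alpha: float = 1.0, beta: float = 0.1) -> float:
    """Higher when the frontier looks semantically relevant and is nearby."""
    return alpha * semantic_sim - beta * distance

def pick_frontier(frontiers, target_sim):
    """frontiers: list of (name, distance); target_sim: name -> similarity."""
    return max(frontiers, key=lambda f: score_frontier(target_sim[f[0]], f[1]))

# Target: "the sink in the kitchen". A VLM would rate the kitchen door
# as semantically closest; here the ratings are hard-coded for the demo.
frontiers = [("hallway", 2.0), ("kitchen_door", 5.0), ("window", 1.0)]
sim = {"hallway": 0.2, "kitchen_door": 0.9, "window": 0.1}
best = pick_frontier(frontiers, sim)  # drives exploration toward the kitchen
```

This is the essence of "parsing semantics, modeling the environment, and making dynamic decisions": the map supplies frontiers, the semantic model supplies relevance, and a simple utility function arbitrates.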
Two Modules of Robot Navigation: What Is the Difference Between Visual Language Navigation and Goal Navigation?
具身智能之心· 2025-07-02 10:18
Core Viewpoint
- The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation; VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4]

Summary by Sections

Visual Language Navigation (VLN)
- VLN is fundamentally an instruction-following task involving understanding language commands, perceiving the environment, and planning movement strategies; a VLN robot system consists of three main modules: a visual language encoder, an environmental history representation, and an action strategy [2]
- The robot processes language commands and visual observations, requiring effective information compression by the visual language encoder; key issues include the choice of encoder and whether to project visual and language representations into a common space [2]
- Learning the strategy network has shifted from extracting patterns from labeled datasets to distilling effective planning information from large language models (LLMs) [3]

Goal Navigation
- Goal navigation extends VLN by enabling agents to explore unfamiliar 3D environments and plan paths based solely on target descriptions, such as coordinates or images [4]
- Unlike traditional VLN, goal-driven navigation requires an autonomous transition from "understanding instructions" to "finding paths", involving semantic parsing, environmental modeling, and dynamic decision-making [6]

Commercial Application and Demand
- Goal-driven navigation has been implemented in several verticals, such as last-mile delivery, where it is combined with social navigation algorithms to handle dynamic environments; examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8]
- In sectors such as healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, improving service efficiency [8]
- The development of humanoid robots has increased the focus on adapting navigation technology, with companies like Unitree and Tesla showcasing advanced capabilities [9]
- Growth in this sector has created significant job demand, particularly in navigation roles, which are recognized as one of the first technology subfields to reach practical application [9]

Knowledge and Learning Challenges
- Both VLN and goal navigation span a wide range of knowledge areas, including natural language processing, computer vision, reinforcement learning, and graph neural networks; this breadth is a challenge for learners seeking to build interdisciplinary skills [10]
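The "instruction breakdown and task segmentation" that the VLN section attributes to LLMs can be illustrated at the interface level. The rule-based splitter below is a toy stand-in for an LLM call, not a method from the article: it only shows what the downstream policy receives, an ordered list of sub-goals it can track one stage at a time.

```python
# Toy sketch of instruction breakdown for VLN: splitting a long command
# into ordered sub-goals. Real systems delegate this to an LLM; this
# rule-based splitter is a hypothetical stand-in to show the interface.
import re

def decompose(instruction: str):
    """Split an instruction into sub-goals on common sequencing words."""
    parts = re.split(r",\s*(?:then|and)\s+|\.\s*", instruction.strip())
    return [p.strip() for p in parts if p.strip()]

subgoals = decompose("Exit the room, then turn left, and stop at the sofa")
# -> ["Exit the room", "turn left", "stop at the sofa"]
```

Tracking which sub-goal is active is exactly the "current task stage" problem the VLN loop must solve from its accumulated history.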