Harbin Institute of Technology Proposes UAV-ON: An Open-World Object-Goal Navigation Benchmark for Aerial Agents

具身智能之心 · 2025-08-05 00:03

Research Background and Motivation

- The use of drones in fields such as cargo transport, emergency rescue, and environmental monitoring keeps growing, creating a need for autonomous navigation in complex, dynamic environments [2]
- Existing research relies mainly on vision-and-language navigation (VLN), which requires detailed step-by-step language instructions, limiting scalability and autonomy in open-world scenarios [2]
- Object navigation (ObjectNav) is proposed as an alternative: it localizes targets from semantic cues rather than dense instruction sequences, yet it remains underexplored in large-scale, unstructured outdoor environments [2]

UAV-ON Benchmark Overview

- UAV-ON is the first large-scale benchmark for instance-level object navigation by drones in open-world settings; its environments are built on Unreal Engine and span urban, forest, mountainous, and aquatic scenes, with a total area of roughly 9 million square units [4]
- The benchmark defines 1,270 annotated targets, each paired with an instance-level semantic instruction, introducing real-world ambiguity and reasoning challenges for the agent [4]

Task Setup

- In each episode the drone is placed at a random location and must navigate using only RGB-D sensor data, performing obstacle avoidance and path planning autonomously, without global maps or external information [6]
- An episode ends when the drone issues a stop command, collides with an obstacle, or reaches the 150-step limit; it counts as a success only if the drone stops within 20 units of the target [6]

Sensor and Action Space

- The drone carries four synchronized RGB-D cameras facing different orientations and relies entirely on first-person perception and memory for navigation [9]
- The action space consists of parameterized continuous actions for translation and rotation that must be physically executed, making the benchmark more realistic than existing ones; a hedged illustration of such an action interface appears at the end of this article [9]

Dataset and Evaluation Metrics

- The training set contains 10 environments and 10,000 navigation episodes; the test set contains 1,000 episodes split across seen and unseen environments to assess generalization [10]
- Evaluation metrics include success rate (SR), oracle success rate (OSR), distance to success (DTS), and success weighted by path length (SPL); a metric-computation sketch is given at the end of this article [10]

Baseline Methods and Experimental Results

- Four baseline methods were implemented to compare different strategies, including a random policy, CLIP-based heuristic exploration, and aerial object navigation agents (AOA) [11][13]
- Results show large performance gaps among the methods: AOA-V achieves the highest OSR (26.30%) but low SR (4.20%) and SPL (0.87%), highlighting the difficulty of coupling semantic understanding with motion planning [14][16]
- Collision rates exceed 30% for all methods, revealing deficiencies in obstacle avoidance and robust control [15]

Conclusion

- The UAV-ON benchmark provides a comprehensive framework for advancing drone navigation research in open-world environments, addressing the limitations of existing methods and laying the groundwork for future developments in autonomous navigation [2][4][10]
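
To make the parameterized continuous action space mentioned above concrete, here is a minimal Python sketch. All names (`UAVAction`, its fields, and the value ranges) are hypothetical illustrations, not the benchmark's actual API; UAV-ON's real interface may parameterize motion differently.

```python
from dataclasses import dataclass


@dataclass
class UAVAction:
    """Hypothetical parameterized action: the agent outputs continuous
    displacement and rotation values that the simulator then executes
    physically (names and units are illustrative, not the benchmark API)."""
    dx: float = 0.0    # forward/backward translation, in scene units
    dy: float = 0.0    # left/right translation, in scene units
    dz: float = 0.0    # vertical translation, in scene units
    dyaw: float = 0.0  # rotation about the vertical axis, in degrees
    stop: bool = False # issue the stop command to end the episode


# Example: move 5 units forward while turning 15 degrees, then stop near the target.
step_1 = UAVAction(dx=5.0, dyaw=15.0)
step_2 = UAVAction(stop=True)
```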
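
Below is a minimal sketch of how the four reported metrics could be computed from per-episode logs, following the standard definitions used in the navigation literature (SPL as in Anderson et al., 2018). The `Episode` record, its field names, and the exact definition of DTS are assumptions for illustration and may differ from UAV-ON's official evaluation protocol.

```python
from dataclasses import dataclass
from typing import List

SUCCESS_RADIUS = 20.0  # success threshold reported in the article, in scene units


@dataclass
class Episode:
    final_dist: float     # distance to the target when the episode ended
    min_dist: float       # closest distance to the target reached at any step
    path_length: float    # length of the path actually flown
    shortest_path: float  # shortest feasible path length from start to target
    stopped: bool         # whether the agent itself issued the stop command


def evaluate(episodes: List[Episode]) -> dict:
    n = len(episodes)
    success = [e.stopped and e.final_dist <= SUCCESS_RADIUS for e in episodes]

    # SR: fraction of episodes where the agent stopped inside the success radius.
    sr = sum(success) / n
    # OSR (oracle success rate): the agent passed within the success radius at
    # some point, even if it failed to stop there.
    osr = sum(e.min_dist <= SUCCESS_RADIUS for e in episodes) / n
    # DTS: mean final distance to the target (some benchmarks subtract the
    # success radius instead; treated here as the plain final distance).
    dts = sum(e.final_dist for e in episodes) / n
    # SPL: success weighted by the ratio of shortest to actual path length.
    spl = sum(
        s * e.shortest_path / max(e.path_length, e.shortest_path)
        for s, e in zip(success, episodes)
    ) / n
    return {"SR": sr, "OSR": osr, "DTS": dts, "SPL": spl}
```

Under these assumptions, an episode that stops 5 units from the target after flying 300 units, when the shortest path was 150 units, counts as a success and contributes 150 / 300 = 0.5 to SPL.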