HIT Shenzhen Proposes UAV-ON: An Open-World Object-Goal Navigation Benchmark for Aerial Agents
具身智能之心·2025-08-19 01:54

Core Viewpoint
- The article presents UAV-ON, the first large-scale benchmark for open-world object goal navigation with aerial agents. It defines over 11,000 navigation tasks across 14 high-fidelity outdoor scenes and emphasizes the need for drones to navigate complex environments autonomously [2][5].

Group 1: Research Background
- UAV-ON aims to enhance drone navigation capabilities in diverse real-world environments, addressing the limitation that existing navigation studies rely heavily on detailed language instructions [2].
- The benchmark ships with a set of baseline strategies for drone navigation: a random policy, a CLIP-based semantic heuristic, and the proposed Aerial Object Navigation Agent (AOA) [2].

Group 2: Environment and Task Definition
- UAV-ON defines an instance-level object navigation task in which drones must navigate to target objects based on semantic instructions [5].
- Drones are equipped with multi-view RGB-D cameras and rely solely on onboard perception, without any global positioning signals [6][12].

Group 3: Action Space and Success Conditions
- The drone's action space consists of parameterized movements (translation, rotation, and stop), with each action linked to continuous control parameters [11][14].
- An episode counts as successful if the drone ends it within a specified distance of the target object [7].

Group 4: Dataset Analysis and Environment Diversity
- The UAV-ON dataset comprises 14 high-fidelity outdoor environments featuring a variety of natural and man-made landscapes, with a total of 1,270 unique target objects distributed across approximately 9 million square units [15].
- The training set spans 10 diverse outdoor environments generating 10,000 navigation episodes, while the test set consists of 1,000 episodes used to evaluate generalization [15].
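The parameterized action space and distance-based success condition summarized above can be sketched roughly as follows. This is an illustrative assumption of how such a benchmark might model actions, not code from the paper; the names (`Action`, `is_success`) and the 5-unit success radius are hypothetical.

```python
import math
from dataclasses import dataclass

SUCCESS_RADIUS = 5.0  # assumed success distance threshold, in scene units (hypothetical)

@dataclass
class Action:
    """A parameterized action: a discrete type plus a continuous control parameter."""
    kind: str          # "translate", "rotate", or "stop"
    magnitude: float   # e.g. meters for translation, degrees for rotation; 0 for stop

def is_success(agent_pos, target_pos, stopped):
    """An episode succeeds only if the agent has stopped within
    SUCCESS_RADIUS of the target object's position."""
    if not stopped:
        return False
    dx, dy, dz = (a - t for a, t in zip(agent_pos, target_pos))
    return math.sqrt(dx * dx + dy * dy + dz * dz) <= SUCCESS_RADIUS
```

The key point the benchmark makes is that the agent itself must decide both *which* action to take and *how much* of it to execute, since each action carries a continuous parameter rather than a fixed step size.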
Group 5: Experimental Results and Baseline Methods
- Several baselines were evaluated, including Random, CLIP-H, AOA-F, and AOA-V. AOA-V achieved the best Oracle success rate but scored lower on success rate and SPL [16][17].
- All methods exhibit collision rates above 30%, highlighting a significant gap between current navigation strategies and the safety requirements of real-world drone operations [20].

Group 6: Conclusion and Future Work
- UAV-ON serves as a comprehensive benchmark covering the semantic reasoning, obstacle perception, and target localization challenges in drone navigation [24].
- Future work will focus on enhancing multi-modal perception, prompt-based control, and developing safer, more reliable navigation strategies for autonomous drone operation in complex environments [24].
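Among the metrics reported above, SPL (Success weighted by Path Length) penalizes successful episodes that take inefficient routes. A minimal sketch of the standard definition, assuming the usual formulation (success indicator times shortest-path length over the longer of taken and shortest path, averaged over episodes); the function name and input format are illustrative:

```python
def spl(episodes):
    """Compute Success weighted by Path Length.

    episodes: iterable of (success, shortest_path_len, taken_path_len) tuples.
    Failed episodes contribute 0; successful ones contribute
    shortest / max(taken, shortest), so a perfectly efficient success scores 1.
    """
    episodes = list(episodes)
    total = 0.0
    for success, shortest, taken in episodes:
        if success:
            total += shortest / max(taken, shortest)
    return total / len(episodes)
```

This explains why a method like AOA-V can lead on Oracle success rate (reaching the target's vicinity at some point) while trailing on SPL: wandering routes and failures to stop in time are both penalized.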