HKUST and Beijing Humanoid Propose LOVON: A New Paradigm for Open-World, Full-Range Target Tracking on Legged Robots

Core Viewpoint
- The LOVON framework represents a significant advancement in the field of robotics, enabling legged robots to autonomously navigate complex, dynamic environments by integrating large language models, open vocabulary visual detection, and precise language-motion mapping [2][5][20].

Group 1: Introduction to LOVON
- The LOVON framework addresses the challenges of long-range multi-target navigation in open environments, overcoming limitations of traditional methods that struggle with real-time visual disturbances and target loss [1][5].
- It combines task planning capabilities of large language models with open vocabulary visual detection and a language-motion mapping model, allowing for efficient navigation in dynamic, unstructured settings [2][5].

Group 2: Core Modules of LOVON
- LOVON integrates three core modules to create a closed loop of language, vision, and motion, enhancing the robot's navigation capabilities [9].
- The framework employs Laplacian variance filtering technology to stabilize visual processing, improving the detection rate of clear frames by 25% during robot movement [11][12].
- An adaptive execution logic allows robots to respond to unexpected situations, such as target loss or external disturbances, by switching to search mode or seamlessly executing new commands [13][15].

Group 3: Performance Metrics
- In simulation environments like GymUnreal, LOVON achieved a success rate of 1.00, significantly outperforming traditional methods, which had a success rate of 0.94 [18].
- The training efficiency of LOVON is remarkable, requiring only 1.5 hours compared to 360 hours for the best competing model, indicating a 240-fold improvement [18].

Group 4: Real-World Applications
- LOVON has been successfully deployed on various legged robot platforms, including Unitree Go2, B2, and H1-2, showcasing its plug-and-play capability without the need for extensive customization [19].
- The framework is poised to transform applications in smart homes, industrial inspections, and field research, providing robust support for diverse tasks [20][21].

Group 5: Key Features
- LOVON demonstrates exceptional open-world adaptability, enabling robots to recognize a wide range of objects in unfamiliar environments [23].
- It excels in multi-target long-range tracking, executing complex tasks smoothly and without interruption [23].
- The framework exhibits strong robustness in dynamic environments, maintaining stable tracking of moving targets across various terrains [23].
- LOVON's anti-interference capabilities allow it to quickly reacquire targets and continue tasks despite disruptions [23].
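
The Laplacian variance filtering mentioned under Group 2 can be illustrated with a minimal sketch. The idea is standard: a sharp image has strong edges, so the variance of its Laplacian response is high, while motion blur flattens that response. The code below is not LOVON's implementation. The function names, the threshold of 100.0, and the fallback policy (reuse the most recent sharp frame when the current one is blurry) are all assumptions made for illustration. It uses only the Python standard library; a real pipeline would more likely compute the same quantity with an image library such as OpenCV.

```python
from statistics import pvariance
from typing import Iterable, Iterator, List, Optional

# A grayscale image as rows of pixel intensities (hypothetical representation).
Frame = List[List[float]]


def laplacian_variance(gray: Frame) -> float:
    """Variance of the 4-neighbour discrete Laplacian over interior pixels.

    Expects an image of at least 3x3 pixels.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return float(pvariance(responses))


def is_sharp(gray: Frame, threshold: float = 100.0) -> bool:
    """Treat a frame as sharp when its Laplacian variance reaches the threshold.

    The default threshold is an illustrative assumption, not a value from LOVON.
    """
    return laplacian_variance(gray) >= threshold


def stabilize(frames: Iterable[Frame], threshold: float = 100.0) -> Iterator[Frame]:
    """Yield one frame per input once a sharp frame has been seen.

    Blurry frames are replaced by the most recent sharp frame (assumed policy);
    blurry frames that arrive before any sharp frame are dropped.
    """
    last_sharp: Optional[Frame] = None
    for frame in frames:
        if is_sharp(frame, threshold):
            last_sharp = frame
        if last_sharp is not None:
            yield last_sharp


if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]  # no edges at all
    checker = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]  # all edges
    print(is_sharp(flat), is_sharp(checker))
```

In practice the threshold would need tuning per camera and scene, since the absolute Laplacian variance depends on resolution, exposure, and texture.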