Galbot (银河通用) Unveils New Model That Unifies Robot Navigation Tasks; 7B-Parameter Model Supports Real-Time Deployment
QbitAI (量子位) · 2025-11-09 07:01

Core Viewpoint
- The article discusses NavFoM, a foundation model for embodied navigation that aims to unify navigation tasks across different robots and scenarios, moving navigation from specialized to general-purpose capabilities [1][20].

Group 1: Unified Navigation Paradigm
- NavFoM is built on one core idea: casting the navigation tasks of different robots into a common paradigm in which the robot's streaming video input, combined with a natural-language navigation instruction, is used to predict an action trajectory [3][21].
- The model supports multiple tasks, including vision-language navigation, target search, target following, and autonomous driving, in both indoor and outdoor environments, and applies to different robot types such as quadrupeds, wheeled robots, humanoids, drones, and cars [3][21].

Group 2: Model Structure and Features
- The model structure includes TVI Tokens, which give the model a scalable way to interpret images across different tasks and camera configurations [5].
- NavFoM employs a Budget-Aware Token Sampling Strategy (BATS) to adaptively sample key frames during navigation, allowing the 7B-parameter model to be deployed in real time while maintaining performance [6][11].

Group 3: Training Data and Performance
- The team collected 8 million navigation data samples spanning vision-language navigation, target navigation, target tracking, and autonomous driving, covering a variety of robot types and scenarios [12][21].
- Without any task-specific fine-tuning, NavFoM achieved state-of-the-art (SOTA) or near-SOTA results on multiple public benchmarks, demonstrating its versatility and effectiveness [16][21].

Group 4: Future Implications
- NavFoM marks a significant step toward general-purpose embodied navigation models, enabling navigation technology that can scale across industries [20][21].
- The team aims to attract more attention to embodied navigation research and stimulate the emergence of new technologies, datasets, and benchmarks, facilitating innovation in intelligent services [21].
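The unified paradigm in Group 1 (streaming video plus a natural-language instruction in, an action trajectory out) can be pictured as one data schema shared by every task and robot type. The sketch below is illustrative only: the names `NavInput`, `NavOutput`, and `as_unified_record`, the `(x, y, yaw)` waypoint format, and the example instructions are hypothetical assumptions, not NavFoM's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Frame = bytes                          # placeholder for an encoded camera image
Waypoint = Tuple[float, float, float]  # assumed (x, y, yaw) in the robot's frame


@dataclass
class NavInput:
    """Hypothetical common input: a video stream plus a language instruction."""
    embodiment: str                    # e.g. "quadruped", "drone", "car"
    instruction: str                   # natural-language navigation instruction
    camera_azimuths_deg: List[float]   # one entry per onboard camera
    frames: List[List[Frame]] = field(default_factory=list)  # frames[t][camera]

    def push(self, views: List[Frame]) -> None:
        """Append one time step of the stream: exactly one image per camera."""
        if len(views) != len(self.camera_azimuths_deg):
            raise ValueError("expected one view per camera")
        self.frames.append(list(views))


@dataclass
class NavOutput:
    """Hypothetical common output: the predicted action trajectory."""
    waypoints: List[Waypoint]


def as_unified_record(inp: NavInput, out: NavOutput) -> Dict[str, object]:
    """Flatten any task (VLN, search, following, driving) into one record shape."""
    return {
        "embodiment": inp.embodiment,
        "instruction": inp.instruction,
        "num_steps": len(inp.frames),
        "num_cameras": len(inp.camera_azimuths_deg),
        "trajectory": list(out.waypoints),
    }


if __name__ == "__main__":
    dog = NavInput("quadruped", "Walk to the kitchen and stop by the fridge.", [0.0])
    dog.push([b"cam0"])
    car = NavInput("car", "Turn left at the next intersection.",
                   [0.0, 90.0, 180.0, 270.0])
    car.push([b"cam0", b"cam1", b"cam2", b"cam3"])
    for inp in (dog, car):
        record = as_unified_record(inp, NavOutput([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
        print(record["embodiment"], record["num_cameras"], record["num_steps"])
```

The point of the shared shape is that a single-camera quadruped and a four-camera car produce the same kind of record; only the number of views per step differs.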
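Group 2 describes TVI Tokens only as a scalable way to interpret images under different tasks and camera settings. One hypothetical way to get that property is to place an indicator vector before each image's tokens that encodes the time step and the camera's azimuth with fixed sinusoidal features, so a robot with a new camera layout needs no new parameters. The functions `time_encoding`, `view_encoding`, `indicator`, and `build_sequence` and the additive design are assumptions for illustration, not NavFoM's published TVI definition.

```python
import math
from typing import List, Optional, Tuple

Token = Tuple[str, int, int, Optional[List[float]]]  # (kind, step, camera, vector)


def time_encoding(step: int, dim: int) -> List[float]:
    """Transformer-style sinusoidal encoding of a time step (sin half, then cos half)."""
    half = dim // 2
    freqs = [10000.0 ** (-i / half) for i in range(half)]
    return [math.sin(step * f) for f in freqs] + [math.cos(step * f) for f in freqs]


def view_encoding(azimuth_deg: float, dim: int) -> List[float]:
    """Periodic encoding of a camera azimuth using harmonics k = 1..dim/2 of the angle."""
    theta = math.radians(azimuth_deg)
    half = dim // 2
    return ([math.sin(k * theta) for k in range(1, half + 1)]
            + [math.cos(k * theta) for k in range(1, half + 1)])


def indicator(step: int, azimuth_deg: float, dim: int = 8) -> List[float]:
    """Hypothetical indicator vector: element-wise sum of time and view encodings."""
    if dim <= 0 or dim % 2:
        raise ValueError("dim must be a positive even number")
    t, v = time_encoding(step, dim), view_encoding(azimuth_deg, dim)
    return [a + b for a, b in zip(t, v)]


def build_sequence(num_steps: int, azimuths_deg: List[float],
                   tokens_per_image: int, dim: int = 8) -> List[Token]:
    """Place one indicator before each image's tokens (image tokens are placeholders)."""
    seq: List[Token] = []
    for t in range(num_steps):
        for cam, az in enumerate(azimuths_deg):
            seq.append(("indicator", t, cam, indicator(t, az, dim)))
            seq.extend(("image", t, cam, None) for _ in range(tokens_per_image))
    return seq
```

For example, `build_sequence(2, [0.0, 90.0, 180.0, 270.0], 16)` yields 2 × 4 × (1 + 16) = 136 entries. Because the view term is periodic, 0° and 360° map to the same indicator; whether NavFoM uses fixed or learned encodings is not stated in this summary.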
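Group 2 says BATS adaptively samples key frames so the 7B-parameter model can run in real time. The summary does not give the sampling rule, so the sketch below substitutes a simple deterministic stand-in: compute how many frames fit in a token budget, always keep the newest frame, and space the remaining picks more densely toward the present. The function `budget_aware_sample`, the `gamma` parameter, and the per-image token count are hypothetical.

```python
from typing import List


def budget_aware_sample(num_frames: int, num_cameras: int, tokens_per_image: int,
                        token_budget: int, gamma: float = 2.0) -> List[int]:
    """Pick frame indices whose tokens fit the budget, denser toward the newest frame.

    Hypothetical stand-in for BATS, not the rule NavFoM actually uses.
    gamma > 1 concentrates picks near the present; gamma = 1 spaces them evenly.
    """
    tokens_per_frame = num_cameras * tokens_per_image
    k = token_budget // tokens_per_frame          # how many frames fit in the budget
    if k < 1:
        raise ValueError("budget too small for even one frame")
    if num_frames <= k:                            # everything fits: keep all frames
        return list(range(num_frames))
    if k == 1:                                     # only room for the current frame
        return [num_frames - 1]
    picked: List[int] = []
    prev = num_frames                              # sentinel just above the newest index
    for i in range(k):
        # Offset back from the newest frame grows like (i / (k - 1)) ** gamma.
        offset = round((num_frames - 1) * (i / (k - 1)) ** gamma)
        idx = num_frames - 1 - offset
        idx = min(idx, prev - 1)                   # strictly older than the last pick
        idx = max(idx, k - 1 - i)                  # leave room for the remaining picks
        picked.append(idx)
        prev = idx
    return sorted(picked)
```

With 100 frames, one camera, 64 tokens per image, and a 640-token budget, 10 frames fit; the picks always include the newest and oldest frames and cluster toward the present. A fixed budget keeps the per-step token count, and therefore latency, bounded no matter how long the episode runs.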