Core Insights
- The article discusses the development of NavFoM, a foundational model for embodied navigation that aims to unify navigation tasks across different robots and scenarios, marking a significant technological leap from specialized to general-purpose navigation [1][29].

Group 1: Unified Navigation Paradigm
- NavFoM is based on a fundamental idea of unifying different robot navigation tasks into a common paradigm: streaming video input from robots combined with natural language navigation instructions to predict action trajectories [3].
- The model supports multiple tasks such as visual language navigation, target search, target following, and autonomous driving, across various environments including indoor and outdoor settings, and is applicable to different types of robots like quadrupeds, wheeled robots, humanoids, drones, and cars [3][29].

Group 2: Model Structure and Efficiency
- The model features TVI Tokens, which provide a scalable method for understanding images under different tasks and camera settings, enhancing the model's adaptability [5].
- To enable real-time deployment of the 7B parameter navigation model, the team introduced the Budget-Aware Token Sampling Strategy (BATS), which adaptively samples key frames under computational constraints to maintain performance while ensuring efficient operation on real robots [6][11].

Group 3: Training Data and Performance
- The team trained NavFoM on 8 million navigation data entries, including various tasks and robot types, as well as 4 million entries of open-world question-answering data, effectively doubling the training volume compared to previous works [12][15].
- NavFoM achieved state-of-the-art (SOTA) and SOTA-comparable results across multiple public benchmarks without requiring task-specific fine-tuning, demonstrating its versatility and effectiveness [16][29].

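The budget-aware key-frame sampling mentioned under Group 2 can be illustrated with a rough sketch. This is not the authors' BATS implementation: the article only says that key frames are sampled adaptively under a computational constraint. The function name `sample_frames`, the fixed `tokens_per_frame` cost, and the quadratic spacing that favors recent frames are all assumptions made for illustration.

```python
from typing import List


def sample_frames(num_frames: int, tokens_per_frame: int, token_budget: int) -> List[int]:
    """Pick frame indices whose total visual tokens fit within token_budget.

    Hypothetical sketch: assumes every frame costs the same number of tokens
    and that recent frames matter more than old ones.
    """
    if tokens_per_frame <= 0:
        raise ValueError("tokens_per_frame must be positive")
    if num_frames <= 0:
        return []

    # How many frames the budget can pay for.
    max_frames = token_budget // tokens_per_frame
    if max_frames <= 0:
        return []

    # Short history: everything fits, so keep every frame.
    if num_frames <= max_frames:
        return list(range(num_frames))

    latest = num_frames - 1
    if max_frames == 1:
        return [latest]

    # Long history: always keep the latest frame, then step back in time with
    # quadratically growing gaps, so recent frames are sampled more densely.
    kept = set()
    for i in range(max_frames):
        frac = i / (max_frames - 1)        # 0.0 (now) .. 1.0 (oldest frame)
        age = round(frac * frac * latest)  # assumed spacing rule
        kept.add(latest - age)
    return sorted(kept)


if __name__ == "__main__":
    # 100 buffered frames, 64 tokens each, budget of 640 tokens (at most 10 frames).
    print(sample_frames(100, 64, 640))
```

Because rounding can map two steps to the same frame, the sketch may return fewer frames than the budget allows, but it never exceeds the budget. A learned or probabilistic sampler could replace the fixed spacing rule without changing the interface.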
Group 4: Future Implications
- The development of NavFoM signifies a move towards generalization in embodied navigation models, enabling cross-industry applications and fostering further research in intelligent navigation technologies [29].
- The team aims to inspire new technologies, datasets, and benchmarks in the field of embodied navigation, accelerating innovation in intelligent services and production capabilities [29].

Galbot (银河通用) releases a new model that unifies robot navigation tasks; the 7B-parameter model supports real-time deployment
Source: 具身智能之心 · 2025-11-10 00:02