How to Build a General-Purpose Embodied Navigation Large Model?
具身智能之心· 2025-11-20 00:03
Core Insights
- The article surveys advances in general navigation models for embodied intelligence, highlighting the transition from task-specific navigation systems to universal models that can handle a variety of tasks and environments [2][5].

Group 1: Navigation Models
- The Uni-NaVid model is a cross-task navigation framework that extends navigation capabilities beyond individual, specialized tasks [5][6].
- The NavFoM model is a cross-embodiment navigation framework that further broadens the application of navigation algorithms to real-world scenarios such as visual obstacle avoidance and urban micro-mobility [2][5].

Group 2: Applications and Challenges
- Current navigation systems struggle in unstructured, dynamic environments and with complex tasks that require language understanding, which traditional pipelines cannot adequately address [2][5].
- Navigation large models are seen as a pathway to embodied intelligence, broadening navigation algorithms from specialized capabilities to general intelligent mobility [2][5].

Group 3: Event Details
- A live session featuring Zhang Jiazhao, a PhD student at Peking University, will take place on November 20 from 19:30 to 20:30, focusing on the exploration of general navigation models [5][6].
- The session will cover concrete applications of these navigation models, including TrackVLA++, UrbanVLA, and MM-Nav, and showcase their practical implementations [6].
Multi-Task, All-Scenario, Cross-Embodiment General Mobility: Galaxy General Robotics Releases a Panoramic Navigation Foundation Model
具身智能之心· 2025-11-06 00:03
Core Viewpoint
- The article covers the launch of NavFoM (Navigation Foundation Model) by Galaxy General Robotics, a significant leap in robotic navigation that enables more autonomous, adaptable robots across diverse environments [3][9][27].

Group 1: Technological Advancements
- NavFoM is presented as the world's first cross-embodiment panoramic navigation foundation model, unifying tasks such as Vision-and-Language Navigation, Object-goal Navigation, Visual Tracking, and Autonomous Driving in a single framework [3][9].
- NavFoM allows robots to autonomously perceive their environment and make navigation decisions in unknown settings, moving beyond simple following behaviors [9][10].
- The model employs a unified learning paradigm that shares knowledge across different tasks and robot forms, improving the efficiency of both training and deployment [13][14].

Group 2: Key Features
- NavFoM supports both indoor and outdoor scenarios, operates zero-shot without mapping or additional training data, and adapts to various robot types, including quadrupeds, wheeled humanoids, drones, and cars [11][12].
- The model incorporates two key innovations: TVI Tokens, which encode time and direction, and the BATS strategy, which samples video data efficiently so the model can respond in real time while conserving computational resources [17][19].
- The training dataset comprises over 8 million cross-task navigation samples and 4 million open-ended question-answer pairs, significantly enhancing its learning capabilities [21][23].

Group 3: Application and Impact
- NavFoM reports state-of-the-art results on multiple international benchmarks, generalizing across tasks and environments without task-specific fine-tuning [25].
- The model has successfully driven various robot forms through complex tasks, marking a significant step toward embodied intelligence in navigation systems [25][27].
- NavFoM is positioned as the foundation of a comprehensive navigation stack supporting applications from indoor navigation to urban environments, substantially transforming robotic capabilities [29][30].
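The article describes the BATS strategy only at a high level: sampling video frames efficiently under a compute budget so the model can respond in real time. Purely as an illustration of that general idea (this is not NavFoM's actual algorithm; the function name, the half-recent/half-history split, and the budget value are all invented here), a newest-biased frame sampler under a fixed budget might look like:

```python
# Hypothetical sketch of budget-aware video frame sampling.
# NOT the BATS algorithm from NavFoM; names and the sampling policy
# are invented for illustration only. The idea: under a fixed frame
# budget, keep recent frames densely and older frames sparsely, so
# the token count stays bounded as the episode grows.

def sample_frames(num_frames: int, budget: int) -> list[int]:
    """Return sorted frame indices to keep, newest-biased, within budget."""
    if num_frames <= budget:
        return list(range(num_frames))  # everything fits: keep all frames
    # Reserve half the budget for a dense window of the newest frames.
    recent = budget // 2
    dense = list(range(num_frames - recent, num_frames))
    # Spread the remaining budget evenly over the older history.
    sparse_budget = budget - recent
    stride = (num_frames - recent) / sparse_budget
    sparse = [int(i * stride) for i in range(sparse_budget)]
    return sorted(set(sparse + dense))
```

The design choice this sketch illustrates is that navigation decisions depend most on recent observations, so older history can be thinned aggressively without discarding it entirely, keeping per-step inference cost constant regardless of episode length.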