UrbanVLA
How to build a general-purpose embodied navigation foundation model?
具身智能之心 · 2025-11-20 00:03
Tonight we have invited Zhang Jiazhao, a PhD student at Peking University, to join 具身智能之心 for a live talk on his team's line of frontier work on general-purpose navigation foundation models. Navigation research in embodied AI is still largely confined to specific tasks and robot platforms; to break through this limitation, the team's work has progressed from the cross-task navigation foundation model Uni-NaVid to the cross-embodiment foundation model NavFoM, which has been successfully applied to real-world scenarios such as visual obstacle avoidance, urban micro-mobility, and intelligent following.
Highlights:
1. Cross-task navigation foundation model: Uni-NaVid
2. Cross-task, cross-embodiment navigation foundation model: NavFoM
3. Applications of navigation foundation models: TrackVLA++, UrbanVLA, MM-Nav
Traditional navigation systems struggle with unstructured, highly dynamic environments and complex tasks that require language understanding. Navigation foundation models extend the scope of navigation algorithms from specialized capabilities to general intelligent mobility, opening a new path toward deployable embodied intelligence. Everyone is welcome to attend and discuss the future of general-purpose navigation.
References:
Uni-NaVid: https://pku-epic.github.io/Uni-NaVid/
NavFoM: https://pku-ep ...
Multi-task, all-scenario, cross-embodiment general mobility: Galaxy General Robotics releases its panoramic navigation foundation model
具身智能之心 · 2025-11-06 00:03
Core Viewpoint
- The article covers the launch of NavFoM (Navigation Foundation Model) by Galaxy General Robotics, presented as a significant leap in robotic navigation that makes robots more autonomous and adaptable across diverse environments [3][9][27].
Group 1: Technological Advancements
- NavFoM is billed as the world's first cross-embodiment panoramic navigation foundation model, unifying tasks such as Vision-and-Language Navigation, Object-goal Navigation, Visual Tracking, and Autonomous Driving in a single framework [3][9].
- It lets robots autonomously perceive their environment and make navigation decisions in unknown settings, moving beyond simple following tasks [9][10].
- A unified learning paradigm shares knowledge across different tasks and robot embodiments, improving the efficiency of both training and deployment [13][14].
Group 2: Key Features
- NavFoM supports both indoor and outdoor scenarios, operates zero-shot without mapping or additional training data, and adapts to various robot types, including quadrupeds, wheeled humanoids, drones, and cars [11][12].
- The model incorporates two key innovations: TVI Tokens, which encode time and viewing direction, and the BATS strategy for budget-aware sampling of video frames, enabling real-time responses within a bounded compute budget (a toy illustration follows this summary) [17][19].
- The training dataset comprises over 8 million cross-task navigation samples and 4 million open-ended question-answer pairs, significantly enhancing the model's learning capabilities [21][23].
Group 3: Application and Impact
- NavFoM achieves state-of-the-art results on several international benchmarks, generalizing across tasks and environments without task-specific fine-tuning [25].
- It has driven multiple robot embodiments through complex tasks, marking a significant step toward embodied intelligence in navigation systems [25][27].
- NavFoM is positioned as the foundation of a comprehensive navigation stack supporting applications from indoor navigation to urban environments [29][30].
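The article names TVI Tokens and the BATS sampling strategy but gives no implementation details, so the following Python sketch is purely illustrative: it shows one plausible way a fixed frame budget could be spent densely on recent history and sparsely on older history, and how a learned indicator embedding keyed by (time bucket, camera direction) could be added to a frame's visual tokens. Every name, shape, and formula here (budget_aware_frame_indices, TVIEmbedding, the log-spaced schedule) is a hypothetical reading, not NavFoM's actual API or algorithm.

```python
# A toy sketch, NOT NavFoM's implementation: all names and formulas below
# are assumptions made for illustration only.
import math
import torch


def budget_aware_frame_indices(num_frames: int, budget: int) -> list[int]:
    """Choose at most `budget` frame indices from a growing video history,
    keeping recent frames dense and older frames progressively sparser so
    the token count stays bounded as the episode gets longer."""
    if num_frames <= budget:
        return list(range(num_frames))
    if budget == 1:
        return [num_frames - 1]
    # Log-spaced offsets measured backward from the newest frame.
    indices = set()
    for i in range(budget):
        offset = math.expm1(math.log1p(num_frames - 1) * i / (budget - 1))
        indices.add(num_frames - 1 - int(offset))
    return sorted(indices)


class TVIEmbedding(torch.nn.Module):
    """Hypothetical temporal-viewpoint indicator: one learned vector per
    (time bucket, camera direction) pair, added to that frame's visual
    tokens so the model can tell when and from where a token was seen."""

    def __init__(self, num_time_buckets: int, num_cameras: int, dim: int):
        super().__init__()
        self.num_cameras = num_cameras
        self.table = torch.nn.Embedding(num_time_buckets * num_cameras, dim)

    def forward(self, frame_tokens: torch.Tensor,
                time_bucket: int, camera: int) -> torch.Tensor:
        # frame_tokens: (tokens_per_frame, dim); the indicator vector
        # broadcasts across all tokens of the frame.
        idx = torch.tensor(time_bucket * self.num_cameras + camera)
        return frame_tokens + self.table(idx)


# Usage: keep a 4-camera rig's 500-frame history under a 64-frame budget,
# then tag one frame's patch tokens with its time/viewpoint indicator.
kept = budget_aware_frame_indices(num_frames=500, budget=64)
tvi = TVIEmbedding(num_time_buckets=64, num_cameras=4, dim=256)
tokens = torch.randn(49, 256)  # e.g. 7x7 patch tokens for one frame
tagged = tvi(tokens, time_bucket=len(kept) - 1, camera=0)
```

The design intuition, under these assumptions, is that a panoramic multi-camera robot accumulates far more frames than a fixed-context model can attend to, so a sublinear sampling schedule plus explicit time/direction tags preserves both recency and spatial grounding at constant cost.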