UrbanVLA
How to build a general-purpose embodied navigation foundation model?
具身智能之心 · 2025-11-20 00:03
Tonight we have invited Zhang Jiazhao, a PhD student at Peking University, to join 具身智能之心 for a live talk on his team's line of frontier work on general-purpose navigation foundation models. Navigation research in embodied AI is still largely confined to specific tasks and robot platforms; to break through this limitation, the team's work has progressed from the cross-task navigation foundation model Uni-NaVid to the cross-embodiment foundation model NavFoM, which has been successfully applied to real-world scenarios such as visual obstacle avoidance, urban micro-mobility, and intelligent following.
Highlights:
1. Cross-task navigation foundation model: Uni-NaVid
2. Cross-task, cross-embodiment navigation foundation model: NavFoM
3. Applications of navigation foundation models: TrackVLA++, UrbanVLA, MM-Nav
Traditional navigation systems struggle with unstructured, highly dynamic environments and complex tasks that require language understanding. Navigation foundation models extend the scope of navigation algorithms from specialized capabilities to general intelligent mobility, opening a new path toward deployable embodied intelligence. Everyone is welcome to attend and discuss the future of general-purpose navigation.
References:
Uni-NaVid: https://pku-epic.github.io/Uni-NaVid/
NavFoM: https://pku-ep ...
Multi-task, all-scenario, cross-embodiment general mobility: Galaxy General Robotics releases its panoramic navigation foundation model
具身智能之心 · 2025-11-06 00:03
Core Viewpoint
- The article covers the launch of NavFoM (Navigation Foundation Model) by Galaxy General Robotics, presented as a significant leap in robotic navigation that makes robots more autonomous and adaptable across diverse environments [3][9][27].
Group 1: Technological Advancements
- NavFoM is billed as the world's first cross-embodiment panoramic navigation foundation model, unifying tasks such as Vision-and-Language Navigation, Object-goal Navigation, Visual Tracking, and Autonomous Driving in a single framework [3][9].
- It lets robots autonomously perceive their environment and make navigation decisions in unknown settings, moving beyond simple following tasks [9][10].
- A unified learning paradigm shares knowledge across different tasks and robot embodiments, improving the efficiency of both training and deployment [13][14].
Group 2: Key Features
- NavFoM supports both indoor and outdoor scenarios, operates zero-shot without mapping or additional training data, and adapts to various robot types, including quadrupeds, wheeled humanoids, drones, and cars [11][12].
- The model incorporates two key innovations: TVI Tokens, which encode time and viewing direction, and the BATS strategy for budget-aware sampling of video frames, enabling real-time responses within a bounded compute budget (a toy illustration follows this summary) [17][19].
- The training dataset comprises over 8 million cross-task navigation samples and 4 million open-ended question-answer pairs, significantly enhancing the model's learning capabilities [21][23].
Group 3: Application and Impact
- NavFoM achieves state-of-the-art results on several international benchmarks, generalizing across tasks and environments without task-specific fine-tuning [25].
- It has driven multiple robot embodiments through complex tasks, marking a significant step toward embodied intelligence in navigation systems [25][27].
- NavFoM is positioned as the foundation of a comprehensive navigation stack supporting applications from indoor navigation to urban environments [29][30].
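The article names TVI Tokens and the BATS sampling strategy but gives no implementation details, so the following Python sketch is purely illustrative: it shows one plausible way a fixed frame budget could be spent densely on recent history and sparsely on older history, and how a learned indicator embedding keyed by (time bucket, camera direction) could be added to a frame's visual tokens. Every name, shape, and formula here (budget_aware_frame_indices, TVIEmbedding, the log-spaced schedule) is a hypothetical reading, not NavFoM's actual API or algorithm.

```python
# A toy sketch, NOT NavFoM's implementation: all names and formulas below
# are assumptions made for illustration only.
import math
import torch


def budget_aware_frame_indices(num_frames: int, budget: int) -> list[int]:
    """Choose at most `budget` frame indices from a growing video history,
    keeping recent frames dense and older frames progressively sparser so
    the token count stays bounded as the episode gets longer."""
    if num_frames <= budget:
        return list(range(num_frames))
    if budget == 1:
        return [num_frames - 1]
    # Log-spaced offsets measured backward from the newest frame.
    indices = set()
    for i in range(budget):
        offset = math.expm1(math.log1p(num_frames - 1) * i / (budget - 1))
        indices.add(num_frames - 1 - int(offset))
    return sorted(indices)


class TVIEmbedding(torch.nn.Module):
    """Hypothetical temporal-viewpoint indicator: one learned vector per
    (time bucket, camera direction) pair, added to that frame's visual
    tokens so the model can tell when and from where a token was seen."""

    def __init__(self, num_time_buckets: int, num_cameras: int, dim: int):
        super().__init__()
        self.num_cameras = num_cameras
        self.table = torch.nn.Embedding(num_time_buckets * num_cameras, dim)

    def forward(self, frame_tokens: torch.Tensor,
                time_bucket: int, camera: int) -> torch.Tensor:
        # frame_tokens: (tokens_per_frame, dim); the indicator vector
        # broadcasts across all tokens of the frame.
        idx = torch.tensor(time_bucket * self.num_cameras + camera)
        return frame_tokens + self.table(idx)


# Usage: keep a 4-camera rig's 500-frame history under a 64-frame budget,
# then tag one frame's patch tokens with its time/viewpoint indicator.
kept = budget_aware_frame_indices(num_frames=500, budget=64)
tvi = TVIEmbedding(num_time_buckets=64, num_cameras=4, dim=256)
tokens = torch.randn(49, 256)  # e.g. 7x7 patch tokens for one frame
tagged = tvi(tokens, time_bucket=len(kept) - 1, camera=0)
```

The design intuition, under these assumptions, is that a panoramic multi-camera robot accumulates far more frames than a fixed-context model can attend to, so a sublinear sampling schedule plus explicit time/direction tags preserves both recency and spatial grounding at constant cost.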