Vision-Language Navigation
A Roundup of Work on Embodied Goal Navigation, Vision-Language Navigation, and Point Navigation
具身智能之心· 2025-08-12 07:04
Editor: 具身智能之心. Several readers recently asked us about work on embodied navigation, so today we trace the development path and methodology of the field over the past few years. Worth bookmarking. Point-goal navigation roundup: Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation. RobustNav: Towards Benchmarking Robustness in Embodied Navigation. Venue/year: CoRL, 2022. Paper: https://openreview.net/pdf?id=2s92OhjT4L Code: https://github.com/yimengli46/bellman_point_goal Project page: ht ...
In Plain Terms: What Is the Difference Between Vision-Language Navigation and Goal Navigation in Embodied AI?
具身智能之心· 2025-08-01 10:30
Core Viewpoint - The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes autonomous exploration and pathfinding based on environmental understanding [1][5].
Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally a task of following instructions, which involves understanding language commands, perceiving the environment, and planning movement strategies. The VLN robot system consists of a visual language encoder, historical environmental representation, and action strategy modules [2][4].
- The learning process for the strategy network has shifted from extracting patterns from labeled datasets to leveraging large language models (LLMs) for effective planning information extraction [4].
- The architecture of VLN robots requires them to accumulate visual observations and execute actions in a loop, making it crucial to determine the current task stage for informed decision-making [4].
Group 2: Goal Navigation
- Goal navigation extends VLN by enabling agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [5][7].
- Unlike traditional VLN, goal-driven navigation systems must transition from understanding commands to independently interpreting the environment and making decisions, integrating computer vision, reinforcement learning, and 3D semantic understanding [7].
Group 3: Commercial Applications and Demand
- Goal-driven navigation technology has been successfully implemented in various verticals, such as terminal delivery, where it combines with social navigation algorithms to handle dynamic environments and human interactions [9].
- Companies like Meituan and Starship Technologies have deployed delivery robots in complex urban settings, while others like Aethon have developed service robots for medical and hospitality sectors, enhancing service efficiency [9][10].
- The growth of humanoid robots has led to an increased focus on adapting navigation technology for applications in home services, healthcare, and industrial logistics, creating significant job demand in the navigation sector [10].
Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation require knowledge across multiple domains, including natural language processing, computer vision, reinforcement learning, and graph neural networks, making it challenging for newcomers to gain comprehensive expertise [11].
- The fragmented nature of knowledge in these fields can lead to difficulties in learning, often causing individuals to abandon their studies before achieving a solid understanding [11].
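The three-module VLN structure described in the summary above (encoder, history representation, action strategy, run in an observe-and-act loop) can be pictured with a small sketch. This is only an illustration under stated assumptions: the class `VLNAgent`, the function `run_episode`, the string-valued "features", and the toy encoder and policy are all hypothetical stand-ins, not anything taken from the article or from a real VLN library.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VLNAgent:
    """Hypothetical three-module agent: encoder, history, action policy."""
    encode: Callable[[str, str], str]    # (instruction, observation) -> feature
    policy: Callable[[List[str]], str]   # accumulated history -> next action
    history: List[str] = field(default_factory=list)

    def step(self, instruction: str, observation: str) -> str:
        # Encode the new observation together with the instruction,
        # store it in the history, then choose an action from the history.
        self.history.append(self.encode(instruction, observation))
        return self.policy(self.history)


def run_episode(agent: VLNAgent, instruction: str,
                observations: List[str], max_steps: int = 10) -> List[str]:
    """Feed observations one at a time until the agent emits 'stop'."""
    actions = []
    for observation in observations[:max_steps]:
        action = agent.step(instruction, observation)
        actions.append(action)
        if action == "stop":
            break
    return actions


# Toy stand-ins (assumptions): the "feature" is the raw observation label,
# and the policy stops as soon as the latest observation is the door.
def toy_encode(instruction: str, observation: str) -> str:
    return observation


def toy_policy(history: List[str]) -> str:
    return "stop" if history[-1] == "door" else "forward"


agent = VLNAgent(encode=toy_encode, policy=toy_policy)
actions = run_episode(agent, "walk to the door", ["hall", "hall", "door", "room"])
```

Because the policy receives the whole history rather than only the latest frame, it has what it needs to judge which stage of the instruction the robot has reached, which is the point the summary stresses.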
How Does Embodied Goal Navigation Find the Target and Navigate to It?
具身智能之心· 2025-07-13 04:13
Core Viewpoint - The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4].
Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally a task of following instructions, which involves understanding language commands, perceiving the environment, and planning movement strategies. The VLN robot system consists of a visual language encoder, environmental history representation, and action strategy modules [2].
- The key challenge in VLN is how to effectively compress information from visual and language inputs, with current trends favoring the use of large-scale pre-trained visual language models and LLMs for instruction breakdown and task segmentation [2][3].
- The learning of strategy networks has shifted from pattern extraction from labeled datasets to distilling effective planning information from LLMs, marking a significant research focus [3].
Group 2: Goal Navigation
- Goal navigation extends VLN by requiring agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [4].
- Unlike traditional VLN, goal-driven navigation systems must transition from "understanding instructions to finding paths" by autonomously parsing semantics, modeling environments, and making dynamic decisions [6].
Group 3: Commercial Applications and Demand
- Goal-driven navigation technology has been industrialized in various verticals, such as terminal delivery, where it combines with social navigation algorithms to handle dynamic environments. Examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8].
- In sectors like healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, enhancing service efficiency [8].
- The development of humanoid robots has led to an increased focus on adapting navigation technology for home services, care, and industrial logistics, creating significant job demand in the navigation sector [9].
Group 4: Learning and Knowledge Challenges
- Both VLN and goal navigation require knowledge across multiple domains, including natural language processing, computer vision, reinforcement learning, and graph neural networks, making the learning path challenging for newcomers [10].
What Exactly Is the Difference Between Traditional Navigation and Embodied Goal Navigation?
具身智能之心· 2025-07-04 09:48
Core Viewpoint - The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4].
Group 1: Visual Language Navigation (VLN)
- VLN is fundamentally a task of following instructions, which involves understanding language commands, perceiving the environment, and planning movement strategies. The VLN robot system consists of a visual language encoder, environmental history representation, and action strategy modules [2].
- The key challenge in VLN is how to effectively compress information from visual and language inputs, with current trends favoring the use of large-scale pre-trained visual language models and LLMs for instruction breakdown and task segmentation [2][3].
- The learning of the strategy network has shifted from extracting patterns from labeled datasets to distilling effective planning information from LLMs, which has become a recent research focus [3].
Group 2: Goal Navigation
- Goal navigation extends VLN by requiring agents to autonomously explore and plan paths in unfamiliar 3D environments based solely on target descriptions, such as coordinates or images [4].
- Unlike traditional VLN that relies on explicit instructions, goal-driven navigation systems must transition from "understanding commands to finding paths" by autonomously parsing semantics, modeling environments, and making dynamic decisions [6].
Group 3: Commercial Applications and Demand
- Goal-driven navigation technology has been industrialized in various verticals, such as terminal delivery, where it combines with social navigation algorithms to handle dynamic environments and human interactions. Examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8].
- In sectors like healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, enhancing service response efficiency [8].
- The development of humanoid robots has led to an increased focus on the adaptability of navigation technology, with companies like Unitree and Tesla showcasing advanced navigation capabilities [9].
Group 4: Knowledge and Learning Challenges
- Both VLN and goal navigation require knowledge across multiple domains, including natural language processing, computer vision, reinforcement learning, and graph neural networks, making it a challenging learning path for newcomers [10].
VLN-R1: HKU's Reinforcement-Learning-Driven Method for Embodied Navigation in Continuous Environments
具身智能之心· 2025-07-04 09:48
Core Viewpoint - The article presents the VLN-R1 framework, which utilizes large vision-language models (LVLM) for continuous navigation in real-world environments, addressing limitations of previous discrete navigation methods [5][15].
Research Background
- The VLN-R1 framework processes first-person video streams to generate continuous navigation actions, enhancing the realism of navigation tasks [5].
- The VLN-Ego dataset is constructed using the Habitat simulator, providing rich visual and language information for training LVLMs [5][6].
- The importance of visual-language navigation (VLN) is emphasized as a core challenge in embodied AI, requiring real-time decision-making based on natural language instructions [5].
Methodology
- The VLN-Ego dataset includes natural language navigation instructions, historical frames, and future action sequences, designed to balance local details and overall context [6].
- The training method consists of two phases: supervised fine-tuning (SFT) to align action predictions with expert demonstrations, followed by reinforcement fine-tuning (RFT) to optimize model performance [7][9].
Experimental Results
- In the R2R task, VLN-R1 achieved a success rate (SR) of 30.2% with the 7B model, significantly outperforming traditional models without depth maps or navigation maps [11].
- The model demonstrated strong cross-domain adaptability, outperforming fully supervised models in the RxR task with only 10K samples used for RFT [12].
- The design of predicting future actions was found to be crucial for performance, with the best results obtained by predicting six future actions [14].
Conclusion and Future Work
- VLN-R1 integrates LVLM and reinforcement learning fine-tuning, achieving state-of-the-art performance in simulated environments and showing potential for small models to match larger ones [15].
- Future research will focus on validating the model's generalization capabilities in real-world settings and exploring applications in other embodied AI tasks [15].
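To make the sample layout described under Methodology more concrete (an instruction, a window of historical frames, and a short sequence of future actions, with six future actions reported as the best setting), here is a minimal sketch of how such samples could be cut from one expert trajectory. Everything beyond those three ingredients is an assumption made for illustration: the names `EgoSample` and `build_samples`, the history window of 4 frames, the action labels, and the choice to pad short tails with `STOP` are hypothetical and not the VLN-Ego format itself.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EgoSample:
    """Hypothetical training sample: instruction, past frames, future actions."""
    instruction: str
    history_frames: List[int]   # indices of frames seen before the current step
    current_frame: int          # index of the frame the model acts from
    future_actions: List[str]   # the next `horizon` expert actions


def build_samples(instruction: str, actions: Sequence[str],
                  history_len: int = 4, horizon: int = 6,
                  pad_action: str = "STOP") -> List[EgoSample]:
    """Slice one expert trajectory into per-step samples.

    history_len and pad_action are illustrative assumptions; horizon=6
    follows the six-action setting mentioned in the summary.
    """
    samples = []
    for t in range(len(actions)):
        # Keep at most `history_len` frames immediately before step t.
        history = list(range(max(0, t - history_len), t))
        # Take the next `horizon` expert actions, padding near the end.
        future = list(actions[t:t + horizon])
        future += [pad_action] * (horizon - len(future))
        samples.append(EgoSample(instruction, history, t, future))
    return samples


expert_actions = ["FORWARD", "LEFT", "FORWARD", "STOP"]
samples = build_samples("go past the sofa and stop", expert_actions)
```

In a two-phase setup like the one summarized above, targets of this shape would serve as the supervised labels in the SFT phase; how the RFT phase scores predicted action sequences is not specified in the summary, so it is left out here.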
Two Modules of Robot Navigation: What Is the Difference Between Vision-Language Navigation and Goal Navigation?
具身智能之心· 2025-07-02 10:18
Core Viewpoint - The article discusses the evolution of robot navigation technology from traditional mapping and localization to large-model-based navigation, which includes visual language navigation (VLN) and goal navigation. VLN focuses on following instructions, while goal navigation emphasizes understanding the environment to find paths independently [1][4].
Summary by Sections
Visual Language Navigation (VLN)
- VLN is fundamentally a task of following instructions, which involves understanding language commands, perceiving the environment, and planning movement strategies. The VLN robot system consists of three main modules: visual language encoder, environmental history representation, and action strategy [2].
- The robot processes language commands and visual observations, requiring effective information compression through a visual language encoder. Key issues include the choice of encoder and whether to project visual and language representations into a common space [2].
- The learning of the strategy network has shifted from extracting patterns from labeled datasets to distilling effective planning information from large language models (LLMs) [3].
Goal Navigation
- Goal navigation extends VLN by enabling agents to explore unfamiliar 3D environments and plan paths based solely on target descriptions, such as coordinates or images [4].
- Unlike traditional VLN, goal-driven navigation requires a transition from "understanding instructions to finding paths" autonomously, involving semantic parsing, environmental modeling, and dynamic decision-making [6].
Commercial Application and Demand
- Goal-driven navigation technology has been implemented in various verticals, such as terminal delivery, where it combines with social navigation algorithms to handle dynamic environments. Examples include Meituan's delivery robots and Starship Technologies' campus delivery robots [8].
- In sectors like healthcare, hospitality, and food service, companies like 嘉楠科技, 云迹科技, and Aethon have deployed service robots for autonomous delivery, enhancing service efficiency [8].
- The development of humanoid robots has led to an increased focus on adapting navigation technology, with companies like Unitree and Tesla showcasing advanced capabilities [9].
- The growth in this sector has created significant job demand, particularly in navigation roles, which are recognized as one of the first technology subfields to achieve practical application [9].
Knowledge and Learning Challenges
- Both VLN and goal navigation encompass a wide range of knowledge areas, including natural language processing, computer vision, reinforcement learning, and graph neural networks. This complexity presents challenges for learners seeking to enhance their interdisciplinary skills [10].
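The encoder question raised in the VLN section above, whether to project visual and language representations into a common space, can be illustrated with a tiny numeric sketch. It assumes two linear projections with hand-picked weights and cosine similarity as the comparison; in a real system the projection weights would be learned and the features would come from pretrained encoders. The names `project` and `cosine` and all the numbers are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, features: Vector) -> Vector:
    """Apply a linear projection (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Illustrative features of different sizes: 3-d visual, 2-d language.
visual_features = [1.0, 0.0, 5.0]
text_features = [2.0, 0.0]

# Hand-picked projection weights into a shared 2-d space (assumption).
visual_to_shared = [[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]]
text_to_shared = [[1.0, 0.0],
                  [0.0, 1.0]]

visual_shared = project(visual_to_shared, visual_features)
text_shared = project(text_to_shared, text_features)
similarity = cosine(visual_shared, text_shared)
```

Once both modalities live in the same space, a single similarity score can tell the policy how well the current view matches the instruction; the alternative is to keep the representations separate and let a later fusion module relate them.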
How Should You Approach Your First Paper in Embodied AI?
具身智能之心· 2025-06-27 09:41
Core Viewpoint - The article promotes a comprehensive tutoring service for students facing challenges in research paper writing, particularly in cutting-edge fields such as multimodal large models, embodied intelligence, and robotics [2][3][4].
Group 1: Tutoring Services Offered
- The service includes one-on-one customized guidance in various advanced research areas, including multimodal large models, visual-language navigation, and robot navigation [3][4].
- The tutoring team consists of PhD researchers from prestigious institutions like CMU, Stanford, and MIT, with experience in top-tier conference reviews [4].
- The tutoring process covers the entire research paper lifecycle, from topic selection to experimental design, coding, writing, and submission strategies [4].
Group 2: Target Audience and Benefits
- The service targets students struggling with research topics, data modeling, and feedback from advisors, offering a solution to enhance their academic performance [2][5].
- The first 50 students to consult can receive a free matching with a dedicated tutor for in-depth analysis and tailored advice on conference and journal submissions [5].
- The focus is not only on publishing papers but also on the practical application and value of research outcomes in industrial and academic contexts [4].