ExploreVLM: A Closed-Loop Robot Exploration Task Planning Framework Based on Vision-Language Models
具身智能之心 · 2025-08-20 00:03
Research Background and Core Issues
- As embodied intelligence develops, robots are entering daily life as human assistants. This requires them to interpret high-level instructions, perceive dynamic environments, and adjust their plans in real time [3]
- Vision-Language Models (VLMs) have become a major direction for robot task planning, but existing methods have three limitations [6]:
  1. Insufficient interactive exploration in scenarios that require actively retrieving information [6]
  2. Limited perception accuracy when capturing spatial relationships between objects and how those relationships change [6]
  3. Poor planning adaptability: most methods rely on open-loop, static planning, which can fail in complex environments [6]

Proposed Framework
- ExploreVLM integrates perception, planning, and execution verification in a closed-loop design to address these limitations [5]

Core Framework Design
- ExploreVLM runs a "perception-planning-execution-verification" closed loop built from three modules: a goal-centric spatial relation graph for scene perception, a dual-stage self-reflective planner, and an execution validator [8]
- The two-stage planner generates sub-goals and action sequences for the exploration and completion phases and refines them through self-reflection [8]
- The execution validator compares pre- and post-execution states to generate feedback, then dynamically adjusts the plan until the task is complete [8]

Key Module Analysis
1. **Goal-Centric Spatial Relation Graph (Scene Perception)**
   - Builds a structured graph representation that supports complex reasoning by extracting object categories, attributes, and spatial relationships from the initial RGB image and the task goal [8]
2. **Dual-Stage Self-Reflective Planner**
   - Separates the need to explore unknown information from the need to achieve the goal, and uses a self-reflection mechanism to correct plans and fix logical errors [10]
   - The exploration phase generates sub-goals for information retrieval. The completion phase then generates action sequences based on the exploration results [10]
3. **Execution Validator**
   - Validates each step so that real-time feedback flows back into the closed loop, supporting dynamic plan adjustment [14]

Experimental Validation
1. **Experimental Setup**
   - Five tasks of increasing complexity were run on a real robot platform and compared against the baselines ReplanVLM and VILA. A 50% action failure rate was injected to test robustness [15]
2. **Core Results**
   - ExploreVLM achieved an average success rate of 94%, well above ReplanVLM (22%) and VILA (30%) [16]
   - The framework's action validation and logical consistency checks worked as intended, ensuring that task goals were met [17]
3. **Ablation Studies**
   - Removing core modules caused a significant drop in performance, which shows that the three modules depend on working together [19]

Comparison with Related Work
- ExploreVLM addresses the limitations of existing methods through structured perception, dual-stage planning, and step-wise closed-loop verification, improving task execution and adaptability [20]
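The Goal-Centric Spatial Relation Graph is described above only in terms of "object categories, attributes, and spatial relationships". The Python sketch below shows one plausible data structure for it, under our own assumptions:

- Relations are stored as (subject, predicate, object) triples.
- "Goal-centric" is read as keeping only the part of the graph connected to the goal objects.

`SpatialRelationGraph`, `goal_subgraph`, and `to_prompt` are hypothetical names. The graph is also filled in by hand here, rather than being extracted from an RGB image by a VLM.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A directed spatial relation such as ("cup", "on", "table")."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SpatialRelationGraph:
    """Hypothetical scene graph: object nodes with attributes, plus relation edges."""
    objects: dict[str, dict[str, str]] = field(default_factory=dict)
    relations: set[Relation] = field(default_factory=set)

    def add_object(self, name: str, category: str, **attrs: str) -> None:
        # Store the category alongside free-form attributes (color, state, ...).
        self.objects[name] = {"category": category, **attrs}

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.add(Relation(subject, predicate, obj))

    def goal_subgraph(self, goal_objects: set[str]) -> set[Relation]:
        """Keep only relations connected (transitively) to the goal objects."""
        keep = set(goal_objects)
        changed = True
        while changed:
            changed = False
            for r in self.relations:
                # Grow the kept set whenever an edge touches it on exactly one end.
                if (r.subject in keep) != (r.obj in keep):
                    keep |= {r.subject, r.obj}
                    changed = True
        return {r for r in self.relations if r.subject in keep and r.obj in keep}

    def to_prompt(self, relations: set[Relation]) -> str:
        """Serialize relations as sorted text lines, e.g. for a VLM prompt."""
        return "\n".join(sorted(f"{r.subject} {r.predicate} {r.obj}" for r in relations))


g = SpatialRelationGraph()
g.add_object("apple", "fruit", color="red")
g.add_object("drawer", "container", state="closed")
g.add_object("cabinet", "furniture")
g.add_object("cup", "container")
g.add_object("table", "furniture")
g.relate("apple", "inside", "drawer")
g.relate("drawer", "part_of", "cabinet")
g.relate("cup", "on", "table")
print(g.to_prompt(g.goal_subgraph({"apple"})))
```

For the goal "apple", the cup and table are pruned. That leaves the two lines `apple inside drawer` and `drawer part_of cabinet`, which is the context a planner would need to realize that the drawer must be opened first.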
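In ExploreVLM, the self-reflection step of the Dual-Stage Self-Reflective Planner is performed by the VLM itself, and the summary above does not give its details. The sketch below swaps in a simple symbolic stand-in so the idea of "correcting plans and fixing logical errors" can be seen concretely. It works in three steps:

1. Simulate the draft plan over a set of facts.
2. Find the first action whose preconditions fail.
3. Patch the plan by inserting an action that achieves the missing fact.

`Action`, `check_plan`, and `reflect` are hypothetical. The STRIPS-style precondition/effect encoding is our assumption, not ExploreVLM's representation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical STRIPS-style action: preconditions plus add/delete effects over string facts."""
    name: str
    pre: frozenset[str]
    add: frozenset[str]
    delete: frozenset[str] = frozenset()


def check_plan(plan: list[Action], state: set[str]) -> tuple[int, str] | None:
    """Simulate the plan; return (step index, missing fact) at the first violated precondition."""
    facts = set(state)
    for i, act in enumerate(plan):
        missing = sorted(act.pre - facts)
        if missing:
            return i, missing[0]
        facts = (facts - act.delete) | act.add
    return None


def reflect(plan: list[Action], state: set[str], library: list[Action], max_rounds: int = 5) -> list[Action]:
    """Stand-in for self-reflection: repeatedly check the plan and patch the first error."""
    plan = list(plan)
    for _ in range(max_rounds):
        error = check_plan(plan, state)
        if error is None:
            return plan
        step, fact = error
        # Repair: insert the first library action whose effects supply the missing fact.
        fixes = [a for a in library if fact in a.add]
        if not fixes:
            raise ValueError(f"no action achieves {fact!r} needed at step {step}")
        plan.insert(step, fixes[0])
    raise RuntimeError("plan still inconsistent after the allowed reflection rounds")


open_drawer = Action("open(drawer)", frozenset({"hand_empty"}), frozenset({"drawer_open"}))
pick_apple = Action("pick(apple)", frozenset({"drawer_open", "hand_empty"}),
                    frozenset({"holding_apple"}), frozenset({"hand_empty"}))
place_apple = Action("place(apple, plate)", frozenset({"holding_apple"}),
                     frozenset({"apple_on_plate", "hand_empty"}), frozenset({"holding_apple"}))

draft = [pick_apple, place_apple]  # a draft plan that forgot to open the drawer
fixed = reflect(draft, {"hand_empty"}, [open_drawer, pick_apple, place_apple])
print([a.name for a in fixed])
```

A VLM-based planner would more likely receive the `(step, fact)` diagnosis back as text and regenerate the plan. Greedy insertion is used here only to keep the sketch self-contained.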
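The Execution Validator compares pre- and post-execution states and turns the result into feedback. The minimal sketch below makes two assumptions: the scene is summarized as a set of relation triples before and after each action, and each action comes with its intended effect. Under those assumptions, validation reduces to set differences. `validate_step` and its feedback wording are hypothetical.

```python
from __future__ import annotations

Triple = tuple[str, str, str]  # (subject, predicate, object)


def validate_step(before: set[Triple], after: set[Triple],
                  expect_true: set[Triple], expect_false: set[Triple]) -> tuple[bool, str]:
    """Hypothetical validator: check an action's intended effect against the observed scene."""
    not_achieved = expect_true - after  # intended relations that do not hold
    not_cleared = expect_false & after  # relations that should have disappeared
    side_effects = (before ^ after) - expect_true - expect_false  # any other change
    feedback = [f"expected but missing: {' '.join(r)}" for r in sorted(not_achieved)]
    feedback += [f"expected to be gone: {' '.join(r)}" for r in sorted(not_cleared)]
    feedback += [f"unexpected change: {' '.join(r)}" for r in sorted(side_effects)]
    ok = not not_achieved and not not_cleared
    return ok, ("\n".join(feedback) if feedback else "step verified")


before = {("apple", "inside", "drawer"), ("cup", "on", "table")}
expect_true = {("gripper", "holding", "apple")}
expect_false = {("apple", "inside", "drawer")}

# A failed grasp: the scene did not change at all.
print(validate_step(before, before, expect_true, expect_false))
# A successful grasp that also knocked the cup off the table.
after = {("gripper", "holding", "apple"), ("cup", "on", "floor")}
print(validate_step(before, after, expect_true, expect_false))
```

In this sketch, unexpected side effects are reported but do not fail the step. Whether they should trigger replanning is a design choice the summary does not settle.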
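The toy simulation below illustrates why step-wise verification matters under the 50% action failure rate used in the experiments. It runs a two-stage loop: first it explores drawers until the apple is seen, then it picks the apple and places it on the plate. Every action fails with probability 0.5, and each step is verified and retried until its expected effect is observed.

Everything here is an assumption made for illustration. `ToyScene`, the retry limit, and the naive open-loop executor used for contrast are all hypothetical, and neither loop models ReplanVLM, VILA, or a real VLM.

```python
from __future__ import annotations

import random


class ToyScene:
    """Hypothetical scene: an apple hidden in one of several closed drawers, plus a plate."""

    def __init__(self, drawers: list[str], apple_in: str, rng: random.Random, fail_rate: float = 0.5):
        self.is_open = {d: False for d in drawers}
        self.apple_at = apple_in
        self.rng = rng
        self.fail_rate = fail_rate

    def execute(self, action: str, target: str) -> None:
        if self.rng.random() < self.fail_rate:
            return  # injected failure: the action silently has no effect
        if action == "open":
            self.is_open[target] = True
        elif action == "pick" and self.apple_at == target and self.is_open[target]:
            self.apple_at = "gripper"
        elif action == "place" and self.apple_at == "gripper":
            self.apple_at = target

    def observe(self) -> tuple[dict[str, bool], str | None]:
        """Return which drawers are open and where the apple is, if it is visible."""
        visible = self.apple_at in ("gripper", "plate") or self.is_open.get(self.apple_at, False)
        return dict(self.is_open), (self.apple_at if visible else None)


def run_closed_loop(scene: ToyScene, drawers: list[str], max_attempts: int = 50) -> bool:
    """Explore, then complete the task, verifying and retrying every step."""

    def act_until(action: str, target: str, done) -> bool:
        for _ in range(max_attempts):
            scene.execute(action, target)
            if done(scene.observe()):  # verify the expected effect before moving on
                return True
        return False

    # Stage 1 (exploration): open drawers one by one until the apple is seen.
    location = None
    for d in drawers:
        if not act_until("open", d, lambda obs: obs[0][d]):
            return False
        if scene.observe()[1] == d:
            location = d
            break
    if location is None:
        return False
    # Stage 2 (completion): plan pick-and-place from what exploration revealed.
    return (act_until("pick", location, lambda obs: obs[1] == "gripper")
            and act_until("place", "plate", lambda obs: obs[1] == "plate"))


def run_open_loop(scene: ToyScene, guess: str) -> bool:
    """Naive contrast: execute a fixed plan once, without exploring or verifying."""
    for action, target in [("open", guess), ("pick", guess), ("place", "plate")]:
        scene.execute(action, target)
    return scene.observe()[1] == "plate"


rng = random.Random(0)
drawers = ["drawer_a", "drawer_b", "drawer_c"]
trials = 1000
closed = sum(run_closed_loop(ToyScene(drawers, rng.choice(drawers), rng), drawers) for _ in range(trials))
opened = sum(run_open_loop(ToyScene(drawers, rng.choice(drawers), rng), "drawer_a") for _ in range(trials))
print(f"closed loop: {closed / trials:.0%}, open loop: {opened / trials:.0%}")
```

With generous retries in a symbolic world, the closed loop succeeds almost every time. The open-loop executor, by contrast, needs the right drawer and three lucky actions in a row, for a success rate of about 1/3 × 1/8 ≈ 4%. These toy numbers are not comparable to the reported 94%, 22%, and 30% success rates.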