VLN-PE: A Physically Realistic VLN Platform Supporting Humanoid, Quadruped, and Wheeled Robots (ICCV'25)
具身智能之心 · 2025-07-21 08:42
Core Insights
- The article introduces VLN-PE, a physically realistic platform for Vision-Language Navigation (VLN) that addresses the gap between simulated models and real-world deployment challenges [3][10][15]
- The study highlights a significant performance drop (34%) when transferring existing VLN models from simulation to physical environments, emphasizing the need for improved adaptability [15][30]
- The research identifies the impact of factors such as robot type, environmental conditions, and the use of physical controllers on model performance [15][32][38]

Background
- VLN has emerged as a critical task in embodied AI, requiring agents to navigate complex environments based on natural language instructions [6][8]
- Previous models relied on idealized simulations that do not account for the physical constraints and challenges faced by real robots [9][10]

VLN-PE Platform
- VLN-PE is built on GRUTopia, supports multiple robot types, and integrates high-quality synthetic and 3D-rendered environments for comprehensive evaluation [10][13]
- The platform allows seamless integration of new scenes, broadening the scope of VLN research and assessment [10][14]

Experimental Findings
- Existing models show a 34% decrease in success rate when transitioning from simulated to physical environments, indicating a significant sim-to-real gap [15][30]
- The study emphasizes the importance of multi-modal robustness: RGB-D models perform better under low-light conditions than RGB-only models [15][38]
- The findings suggest that training on diverse datasets improves the generalization of VLN models across environments [29][39]

Methodologies
- The article evaluates several methodologies, including single-step discrete action classification models and multi-step continuous prediction methods, highlighting the potential of diffusion-based policies in VLN (both interface styles are sketched after this summary) [20][21]
- The research also explores map-based zero-shot large language models (LLMs) for navigation, demonstrating their potential in VLN applications (a prompt-construction sketch follows below) [24][25]

Performance Metrics
- The study employs standard VLN evaluation metrics, including trajectory length, navigation error, and success rate, to assess model performance [18][19]
- Additional metrics account for physical realism, such as fall rate and stuck rate, which are critical for evaluating robot performance in real-world scenarios (see the metric sketch below) [18][19]

Cross-Embodiment Training
- The research indicates that cross-embodiment training can enhance model performance, allowing a unified model to generalize across different robot types (a data-mixing sketch is included below) [36][39]
- Using data from multiple robot types during training leads to improved adaptability and performance across environments [36][39]
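To make the two policy interfaces from the Methodologies section concrete, below is an illustrative PyTorch sketch, not the paper's actual models: a single-step head that classifies the next discrete action, and a multi-step head that regresses a short horizon of continuous waypoints of the kind a diffusion policy would hand to a physical controller. The feature dimension, action set, and horizon length are placeholder assumptions.

```python
import torch
import torch.nn as nn

class DiscreteActionHead(nn.Module):
    """Single-step policy: classify the next discrete action
    (e.g. FORWARD, TURN_LEFT, TURN_RIGHT, STOP) from fused
    vision-language features."""
    def __init__(self, feat_dim: int, num_actions: int = 4):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_actions)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.fc(feat)  # logits over the discrete action set

class WaypointHead(nn.Module):
    """Multi-step policy: regress a short horizon of continuous
    (x, y, yaw) waypoints for a low-level physical controller to track."""
    def __init__(self, feat_dim: int, horizon: int = 8):
        super().__init__()
        self.horizon = horizon
        self.fc = nn.Linear(feat_dim, horizon * 3)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.fc(feat).view(-1, self.horizon, 3)

feat = torch.randn(2, 512)                     # placeholder fused features
action_logits = DiscreteActionHead(512)(feat)  # shape (2, 4)
waypoints = WaypointHead(512)(feat)            # shape (2, 8, 3)
```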
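The map-based zero-shot LLM results suggest framing each navigation step as a text decision over the current map. The snippet below is a hypothetical illustration of that idea, not the paper's pipeline: it renders nearby landmarks from a semantic map as a prompt that any chat LLM could answer; the landmark names, bearings, and prompt wording are invented for the example.

```python
def build_map_prompt(instruction: str, landmarks) -> str:
    """Render a semantic-map snapshot as text so an off-the-shelf LLM
    can pick the next subgoal zero-shot."""
    lines = [f"Instruction: {instruction}",
             "Landmarks visible on the current map:"]
    for name, bearing_deg, dist_m in landmarks:
        lines.append(f"- {name}: bearing {bearing_deg:+.0f} deg, {dist_m:.1f} m away")
    lines.append("Which landmark should the robot move toward next? "
                 "Reply with the landmark name only.")
    return "\n".join(lines)

prompt = build_map_prompt(
    "Walk past the sofa and stop at the kitchen table.",
    [("sofa", -30, 1.8), ("kitchen table", 45, 4.2), ("doorway", 90, 2.5)],
)
print(prompt)  # send to any chat LLM, then parse the returned landmark name
```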
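For reference, here is a minimal sketch of how the reported metrics are typically computed, assuming the common VLN conventions (a 3 m success radius, SPL weighted by shortest-path length) and straightforward per-episode flags for falls and getting stuck; VLN-PE's exact thresholds and definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Episode:
    path_length: float         # distance actually traveled (m)
    geodesic_to_goal: float    # shortest-path distance from start to goal (m)
    final_dist_to_goal: float  # navigation error at episode end (m)
    fell: bool                 # the robot fell over during the episode
    stuck: bool                # the robot made no progress and was stuck

def evaluate(episodes: List[Episode], success_radius: float = 3.0) -> Dict[str, float]:
    n = len(episodes)
    success = [e.final_dist_to_goal <= success_radius for e in episodes]
    spl = sum(
        s * e.geodesic_to_goal / max(e.path_length, e.geodesic_to_goal)
        for s, e in zip(success, episodes)
    ) / n
    return {
        "TL": sum(e.path_length for e in episodes) / n,         # trajectory length
        "NE": sum(e.final_dist_to_goal for e in episodes) / n,  # navigation error
        "SR": sum(success) / n,                                 # success rate
        "SPL": spl,                                             # success weighted by path length
        "FallRate": sum(e.fell for e in episodes) / n,
        "StuckRate": sum(e.stuck for e in episodes) / n,
    }
```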
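Cross-embodiment training in this setting amounts to pooling trajectories collected with different robot bodies into one training stream so a unified policy is exposed to their differing viewpoints and dynamics. The sketch below shows one simple way to do that with PyTorch datasets, keeping an embodiment tag per sample; the sample fields and dummy episodes are placeholders, not VLN-PE's data format.

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class TrajectoryDataset(Dataset):
    """Episodes collected with a single embodiment; every sample keeps an
    embodiment tag so one unified policy can be trained across robot types."""
    def __init__(self, episodes, embodiment: str):
        self.episodes = episodes
        self.embodiment = embodiment

    def __len__(self):
        return len(self.episodes)

    def __getitem__(self, idx):
        instruction, action = self.episodes[idx]
        return {"instruction": instruction,
                "action": action,
                "embodiment": self.embodiment}

# Dummy (instruction, action) pairs standing in for real trajectory data.
dummy = [("go to the kitchen", 0)] * 100
mixed = ConcatDataset([
    TrajectoryDataset(dummy, "humanoid"),
    TrajectoryDataset(dummy, "quadruped"),
    TrajectoryDataset(dummy, "wheeled"),
])
loader = DataLoader(mixed, batch_size=32, shuffle=True)
batch = next(iter(loader))  # batch["embodiment"] mixes all three robot types
```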