Deep Reinforcement Learning (DRL)

X-Nav: An End-to-End Cross-Platform Navigation Framework with a Universal Policy for Zero-Shot Transfer
具身智能之心 · 2025-07-22 06:29
Core Viewpoint - The article presents X-Nav, a framework for end-to-end cross-embodiment navigation of mobile robots. A single universal policy can be deployed across different robot embodiments, including wheeled and quadrupedal robots [3][4].

Group 1: Existing Limitations
- Current navigation methods are usually designed for one specific robot embodiment, which limits how well they generalize across platforms [4].
- Navigation requires a robot to move through complex environments without collisions, relying on visual observations, the goal position, and proprioceptive information; existing methods remain significantly limited on this task [4].

Group 2: X-Nav Architecture
- The X-Nav architecture consists of two core phases: expert policy learning and universal policy refinement [5][8].
- Phase 1 trains multiple expert policies with deep reinforcement learning (DRL) on randomly generated robot embodiments [6].
- Phase 2 refines these expert policies into a single universal policy using the Nav-ACT transformer model [8].

Group 3: Training and Evaluation
- Training uses the Proximal Policy Optimization (PPO) algorithm, with a reward function that combines task rewards and regularization rewards tailored to wheeled and quadrupedal robots [10][16].
- In experiments, X-Nav outperforms the compared methods in success rate (SR) and Success weighted by Path Length (SPL); on the Jackal robot it reaches an SR of 90.4% and an SPL of 0.84 [13].
- Scalability studies show that increasing the number of training embodiments significantly improves adaptation to unseen robots [14].

Group 4: Ablation Studies
- Ablation studies validate the design choices. Replacing MSE loss with L1 loss lowers performance, because L1 penalizes large errors less heavily [21].
- Executing each predicted action chunk in full delays the quadrupeds' adaptation to dynamic changes. Omitting temporal ensembling (TE) produces jerky actions on wheeled robots [21].
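The L1-versus-MSE ablation in Group 4 comes down to how each loss scales with the size of an error:

- L1 (mean |e|) grows linearly with each residual e.
- MSE (mean e²) grows quadratically, so one large action error dominates it.

A minimal sketch in plain Python follows. The residuals are made up purely for illustration and are not X-Nav data.

```python
def l1_loss(pred, target):
    """Mean absolute error: each residual e contributes |e|."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared error: each residual e contributes e**2."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Illustrative action targets and two predictions with the same total absolute error:
# one spreads the error evenly, the other concentrates it in a single large miss.
target = [0.0, 0.0, 0.0, 0.0]
spread = [0.5, 0.5, 0.5, 0.5]   # four small errors
spike = [2.0, 0.0, 0.0, 0.0]    # one large error

print(l1_loss(spread, target), l1_loss(spike, target))    # 0.5 0.5
print(mse_loss(spread, target), mse_loss(spike, target))  # 0.25 1.0
```

Under L1 both predictions score the same. MSE scores the single large miss four times worse. This is the sense in which L1 "under-penalizes" large errors: a policy trained with it has less incentive to eliminate occasional big action mistakes.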
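Group 4 contrasts two ways of consuming the chunk of future actions that Nav-ACT predicts:

- Executing each chunk in full, which runs open-loop for several steps.
- Re-predicting every step and blending overlapping predictions for the current step.

This sketch assumes TE refers to ACT-style temporal ensembling, in which predictions for the same timestep are averaged with weights w_i = exp(-m · i) and i = 0 is the oldest prediction. The class name, scalar actions, `chunk_size`, and `m` are hypothetical illustration choices, not values from X-Nav.

```python
import math
from collections import deque


class TemporalEnsembler:
    """Blend overlapping action-chunk predictions into one action per timestep.

    Hypothetical, ACT-style sketch: the policy emits a chunk of `chunk_size` future
    scalar actions every step; predictions targeting the same timestep are averaged
    with weights exp(-m * i), where i = 0 is the oldest prediction.
    """

    def __init__(self, chunk_size, m=0.1):
        self.k = chunk_size
        self.m = m
        # (start_time, chunk) pairs; chunk[j] is the action predicted for start_time + j.
        # When a chunk is added every step, older chunks cannot cover the current step,
        # so a deque bounded at k entries is enough.
        self.chunks = deque(maxlen=chunk_size)

    def add_chunk(self, t, chunk):
        if len(chunk) != self.k:
            raise ValueError("chunk length must equal chunk_size")
        self.chunks.append((t, list(chunk)))

    def action(self, t):
        # Every stored prediction that targets timestep t, oldest chunk first.
        preds = [chunk[t - start] for start, chunk in self.chunks if 0 <= t - start < self.k]
        if not preds:
            raise ValueError(f"no chunk covers timestep {t}")
        weights = [math.exp(-self.m * i) for i in range(len(preds))]
        return sum(w * a for w, a in zip(weights, preds)) / sum(weights)


# Usage: the policy first predicts "drive at 1.0", then abruptly switches to 0.0.
ens = TemporalEnsembler(chunk_size=3, m=0.1)
ens.add_chunk(0, [1.0, 1.0, 1.0])
first = ens.action(0)    # only one prediction covers t = 0, so this is 1.0
ens.add_chunk(1, [0.0, 0.0, 0.0])
second = ens.action(1)   # blend of 1.0 (older, weight 1) and 0.0 (newer, weight e^-0.1)
print(first, round(second, 3))  # 1.0 0.525
```

With full-chunk execution, the robot would instead play out [1.0, 1.0, 1.0] before reacting to the new prediction. Blending per step gives an intermediate command that moves toward the new prediction more smoothly. This is one plausible reading of both ablation findings, not a description of X-Nav's exact scheduler.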

Group 5: Real-World Testing
- Real-world tests in indoor and outdoor environments achieve a success rate of 85% and an SPL of 0.79, demonstrating that X-Nav generalizes across different sensor configurations [22].
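Groups 3 and 5 report results as SR and SPL. This sketch assumes X-Nav uses the standard embodied-navigation definition:

SPL = (1/N) Σ S_i · ℓ_i / max(p_i, ℓ_i)

The terms are:

- S_i is 1 for a successful episode and 0 otherwise.
- ℓ_i is the shortest-path distance from start to goal.
- p_i is the path length the robot actually travelled.

The `Episode` fields and the episode data below are hypothetical, chosen only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    success: bool          # did the robot reach the goal?
    shortest_path: float   # shortest start-to-goal distance (metres)
    path_length: float     # distance the robot actually travelled (metres)


def success_rate(episodes):
    """Fraction of episodes that reached the goal."""
    return sum(e.success for e in episodes) / len(episodes)


def spl(episodes):
    """Success weighted by Path Length: failures count 0, detours are penalized."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path / max(e.path_length, e.shortest_path)
    return total / len(episodes)


episodes = [
    Episode(True, 10.0, 10.0),   # optimal path: contributes 1.0
    Episode(True, 10.0, 12.5),   # 25% longer path: contributes 0.8
    Episode(False, 8.0, 3.0),    # failure: contributes 0
]
print(round(success_rate(episodes), 3))  # 0.667
print(round(spl(episodes), 3))           # 0.6
```

SPL can never exceed SR, since each successful episode contributes at most 1. The reported pairs (SR 90.4% with SPL 0.84, SR 85% with SPL 0.79) are consistent with that bound.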
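Group 3 states only that expert training uses PPO with task and regularization rewards. The sketch below shows the standard PPO clipped surrogate objective together with one way such a reward could be composed. It makes these assumptions:

- `clip_eps = 0.2` is a common default, not a reported X-Nav value.
- `total_reward`, its weighting, and the example terms are hypothetical.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, negated so it can be minimized.

    loss = -mean( min(r * A, clip(r, 1 - eps, 1 + eps) * A) ), r = pi_new(a|s) / pi_old(a|s)
    """
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)                          # probability ratio r
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)  # clip(r, 1-eps, 1+eps)
        total += min(ratio * adv, clipped * adv)                   # pessimistic (lower) bound
    return -total / len(advantages)


def total_reward(task_terms, reg_terms, reg_weight=1.0):
    """Hypothetical composition: task terms plus weighted regularization terms.

    Regularization terms are assumed to be penalties (<= 0), e.g. on action jerk;
    X-Nav's actual terms and weights differ per embodiment and are not given here.
    """
    return sum(task_terms) + reg_weight * sum(reg_terms)


# Usage: one sample whose probability doubled under the new policy (ratio = 2).
loss_pos = ppo_clip_loss([math.log(2.0)], [0.0], [1.0])   # A > 0: ratio clipped to 1.2
loss_neg = ppo_clip_loss([math.log(2.0)], [0.0], [-1.0])  # A < 0: unclipped ratio 2 is kept
r = total_reward([1.0, 0.5], [-0.2, -0.1], reg_weight=0.5)
print(round(loss_pos, 3), round(loss_neg, 3), round(r, 3))
```

The clipping stops a single update from pushing the probability ratio far beyond 1 ± eps when the advantage is positive. The `min` still fully counts changes that make a bad action more likely. This conservatism is one common reason PPO is used for training locomotion and navigation policies in simulation.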