Task-Aware View Planning (TAVP)

具身智能之心· 2025-08-14 00:03

Research Background and Motivation
- Existing vision-language-action (VLA) models for multi-task robotic manipulation rely on fixed viewpoints and shared visual encoders. This limits 3D perception and causes interference between tasks, which hurts robustness and generalization [2][3]
- Fixed viewpoints are particularly problematic in complex scenes, where occlusion leads to incomplete scene understanding and inaccurate action predictions [2]
- Shared encoders struggle most on tasks that differ significantly in visual and semantic content, which restricts generalization and scalability [2]

Core Method: TAVP Framework
- The Task-Aware View Planning (TAVP) framework integrates active view planning with task-specific representation learning. Its two main components are the TaskMoE module and the MVEP strategy [3]

TaskMoE: Task-Aware Mixture of Experts Module
- Designed to improve multi-task accuracy and generalization through two key innovations [5]

MVEP: Multi-View Exploration Policy
- Selects K viewpoints that capture as much information about the manipulation target as possible, improving action prediction accuracy [6]

Training Strategy
- Training proceeds in three phases:
1. Phase 1: Train the fixed-viewpoint variant of TAVP using three default viewpoints [7]
2. Phase 2: Optimize MVEP on top of the fixed-viewpoint model using the PPO algorithm [8]
3. Phase 3: Fine-tune the full TAVP model except MVEP, using the same loss functions as in Phase 1 [8]

Key Results
- TAVP achieves higher success rates than the fixed-viewpoint dense models (RVT2, ARP, ARP+) across all tasks, with a 56% increase on challenging tasks and an average success rate that rises from 84.9% to 86.7% [13][14]

Ablation Study
- Removing TaskMoE lowers the average success rate from 86.67% to 85.56%, highlighting its importance for multi-task representation learning [15][18]

Sensitivity Analysis
- Increasing the number of viewpoints (K) significantly improves success rates, especially on occlusion-prone tasks [16][17]

Efficiency and Generalization Analysis
- TAVP achieves a higher average success rate (86.67%) than ARP+ (84.90%), at the cost of roughly 10.7% higher inference latency [20]
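
The average success rates quoted above are close together, so it can help to work out each gap explicitly. The following is a minimal sketch that uses only the figures reported in this summary. The constant and function names are illustrative assumptions and do not come from the TAVP paper or its code.

```python
# Average success rates (in percent) as quoted in the summary above.
TAVP_AVG = 86.67             # full TAVP model
ARP_PLUS_AVG = 84.90         # ARP+ fixed-viewpoint baseline
TAVP_NO_TASKMOE_AVG = 85.56  # TAVP with TaskMoE removed (ablation)


def absolute_gain(new: float, old: float) -> float:
    """Difference between two success rates, in percentage points."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Change relative to the old value, as a fraction (0.02 means 2%)."""
    return (new - old) / old


if __name__ == "__main__":
    # Gap between TAVP and the strongest baseline named in the summary.
    print(
        f"TAVP vs ARP+: {absolute_gain(TAVP_AVG, ARP_PLUS_AVG):+.2f} points, "
        f"{relative_gain(TAVP_AVG, ARP_PLUS_AVG):+.2%} relative"
    )
    # Effect of removing TaskMoE from the full model.
    print(
        f"Without TaskMoE: {absolute_gain(TAVP_NO_TASKMOE_AVG, TAVP_AVG):+.2f} points, "
        f"{relative_gain(TAVP_NO_TASKMOE_AVG, TAVP_AVG):+.2%} relative"
    )
```

On these numbers, TAVP gains about 1.77 percentage points over ARP+ (roughly 2.1% relative), and removing TaskMoE costs about 1.11 points. Both gaps are small compared with the reported latency increase of roughly 10.7%, so the per-task results on occlusion-prone tasks are likely more informative than the averages.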