Robotic Manipulation
Learning to See and Act: Task-Aware View Planning for Robotic Manipulation
具身智能之心· 2025-08-14 00:03
Research Background and Motivation
- Existing vision-language-action (VLA) models for multi-task robotic manipulation rely on fixed viewpoints and shared visual encoders, which limits 3D perception and causes task interference, hurting robustness and generalization [2][3]
- Fixed viewpoints are particularly problematic in complex scenes, where occlusion can lead to incomplete scene understanding and inaccurate action predictions [2]
- The limitations of shared encoders are most evident in tasks with large visual and semantic differences, restricting model generalization and scalability [2]

Core Method: TAVP Framework
- The Task-Aware View Planning (TAVP) framework integrates active view planning with task-specific representation learning, built around the TaskMoE module and the MVEP strategy [3]

TaskMoE: Task-Aware Mixture-of-Experts Module
- Designed to enhance multi-task accuracy and generalization through two key innovations; a minimal routing sketch appears after this summary [5]

MVEP: Multi-View Exploration Policy
- Selects K viewpoints that maximize the capture of information relevant to the manipulation target, improving action-prediction accuracy [6]

Training Strategy
- Training proceeds in three phases (see the schedule sketch after this summary):
  1. Phase 1: Train the fixed-viewpoint variant of TAVP using the three default viewpoints [7]
  2. Phase 2: Optimize MVEP on top of the fixed-viewpoint model with the PPO algorithm [8]
  3. Phase 3: Fine-tune the full TAVP model, excluding MVEP, using the same loss functions as in Phase 1 [8]

Key Results
- TAVP outperforms fixed-viewpoint dense models (RVT2, ARP, ARP+) in success rate across all tasks, with a 56% improvement on challenging tasks and an average success rate gain from 84.9% to 86.7% [13][14]

Ablation Study
- Removing TaskMoE lowers the average success rate from 86.67% to 85.56%, underscoring its role in multi-task representation learning [15][18]

Sensitivity Analysis
- Increasing the number of viewpoints K significantly improves success rates, especially on occlusion-prone tasks [16][17]

Efficiency and Generalization Analysis
- TAVP achieves a higher average success rate (86.67%) than ARP+ (84.90%), at the cost of roughly 10.7% additional inference latency [20]
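The article does not give TaskMoE's implementation, so the following is a minimal PyTorch sketch of the general task-aware mixture-of-experts pattern it describes: a gating network conditioned on a task embedding routes features to a sparse set of expert encoders, so visually and semantically different tasks use different representation pathways. All class and parameter names (TaskMoE, num_experts, top_k, etc.) are illustrative assumptions, not the paper's API.

```python
# Hypothetical sketch of task-aware Mixture-of-Experts routing (not the
# paper's actual code): a task embedding drives a gating network that
# sparsely mixes task-specific expert encoders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TaskMoE(nn.Module):
    def __init__(self, feat_dim: int, task_dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        # One lightweight MLP per expert (stand-in for real visual encoders).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.GELU(),
                          nn.Linear(feat_dim, feat_dim))
            for _ in range(num_experts)
        )
        # Gate conditioned on the task embedding rather than the visual input,
        # so routing reflects task identity, not scene appearance.
        self.gate = nn.Linear(task_dim, num_experts)
        self.top_k = top_k

    def forward(self, feats: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim) visual features; task_emb: (B, task_dim)
        logits = self.gate(task_emb)                    # (B, num_experts)
        topv, topi = logits.topk(self.top_k, dim=-1)    # sparse routing: keep top-k experts
        weights = F.softmax(topv, dim=-1)               # (B, top_k)
        expert_out = torch.stack([e(feats) for e in self.experts], dim=1)  # (B, E, D)
        picked = expert_out.gather(1, topi.unsqueeze(-1).expand(-1, -1, feats.size(-1)))
        return (weights.unsqueeze(-1) * picked).sum(dim=1)  # gate-weighted mixture

moe = TaskMoE(feat_dim=256, task_dim=64)
out = moe(torch.randn(8, 256), torch.randn(8, 64))  # -> (8, 256)
```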
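The three-phase schedule maps naturally onto a freeze/unfreeze training loop. The runnable toy below illustrates that pattern under stated assumptions: TAVPStub, its submodules, and the Phase 2 surrogate loss are placeholders invented for illustration (the paper trains MVEP with PPO; the article specifies only the phase outline).

```python
# Hedged, runnable toy of the three-phase TAVP schedule described above.
# TAVPStub and the Phase 2 surrogate objective are placeholders, not the
# paper's code; only the freeze/unfreeze pattern is taken from the summary.
import torch
import torch.nn as nn

class TAVPStub(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(32, 32)    # stands in for encoders + TaskMoE
        self.policy_head = nn.Linear(32, 8)  # stands in for the action decoder
        self.mvep = nn.Linear(32, 5)         # stands in for the view policy

    def action_loss(self, obs, target):
        return nn.functional.mse_loss(self.policy_head(self.backbone(obs)), target)

def set_trainable(module, flag):
    for p in module.parameters():
        p.requires_grad = flag

tavp = TAVPStub()
obs, target = torch.randn(4, 32), torch.randn(4, 8)

# Phase 1: fixed-viewpoint variant; MVEP frozen, supervised action losses.
set_trainable(tavp, True); set_trainable(tavp.mvep, False)
opt = torch.optim.Adam([p for p in tavp.parameters() if p.requires_grad])
tavp.action_loss(obs, target).backward(); opt.step(); opt.zero_grad()

# Phase 2: freeze the model, train only MVEP (PPO in the paper; a toy
# surrogate loss here, since rollout collection is out of scope).
set_trainable(tavp, False); set_trainable(tavp.mvep, True)
opt = torch.optim.Adam(tavp.mvep.parameters())
surrogate = -tavp.mvep(obs).log_softmax(-1).mean()  # stand-in for the PPO objective
surrogate.backward(); opt.step(); opt.zero_grad()

# Phase 3: fine-tune everything except MVEP with the Phase 1 losses.
set_trainable(tavp, True); set_trainable(tavp.mvep, False)
opt = torch.optim.Adam([p for p in tavp.parameters() if p.requires_grad])
tavp.action_loss(obs, target).backward(); opt.step(); opt.zero_grad()
```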
Beyond VLA: A Roundup of Embodied + VA Work
具身智能之心· 2025-07-14 02:21
Core Insights
- The article surveys advances in embodied intelligence and robotic manipulation, highlighting research projects and methodologies aimed at improving robotic capabilities in real-world applications [2][3][4]

Group 1: 2025 Research Initiatives
- Numerous projects are outlined for 2025, including "Steering Your Diffusion Policy with Latent Space Reinforcement Learning" and "Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation", which aim to enhance robotic manipulation through advanced learning techniques [2][3]
- The "BEHAVIOR Robot Suite" is designed to streamline real-world whole-body manipulation for everyday household activities, indicating a focus on practical applications of robotics [2]
- "You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations" emphasizes the potential of efficient one-shot learning methods for robot training [2][3]

Group 2: Methodologies and Techniques
- The article discusses methodologies such as "Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning" and "Learning the RoPEs: Better 2D and 3D Position Encodings with STRING", which aim to improve the adaptability and efficiency of robotic systems [2][3][4]
- "RoboGrasp: A Universal Grasping Policy for Robust Robotic Control" highlights the development of a versatile grasping policy applicable across different robotic platforms [2][3]
- "Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion" showcases advances in the fine motor skills crucial for complex tasks [4]

Group 3: Future Directions
- The research emphasizes integrating visual and tactile feedback in robotic systems, as in "Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation" [7]
- "Zero-Shot Visual Generalization in Robot Manipulation" points to a trend toward robots that generalize learned skills to new, unseen scenarios without additional training [7]
- "Human-to-Robot Data Augmentation for Robot Pre-training from Videos" suggests a shift toward leveraging human demonstrations to enhance robot learning [7]