Core Insights
- The article discusses the limitations of existing robotic manipulation models that primarily rely on 2D images, which often lose critical depth information and 3D geometric structure [2][4]
- The proposed solution, SpatialActor, focuses on "disentanglement," separating semantic information from spatial geometric information to enhance robotic understanding and interaction with 3D environments [4][7]

Methodology and Architecture
- SpatialActor employs a dual-stream architecture that decouples visual and depth encoding, integrating a Semantic-Guided Geometry Module (SGM) and a Spatial Transformer (SPT) to improve robustness and accuracy in robotic tasks [10][11]
- The SGM combines robust geometric priors from a pre-trained depth estimation model with fine-grained but noisy depth features, optimizing the geometric representation while maintaining alignment with semantic cues [11][13]
- The SPT establishes precise 2D-to-3D mappings and integrates multi-modal features, crucial for generating accurate robotic actions [13]

Experimental Results
- SpatialActor achieved an average success rate of 87.4% across various tasks in simulation, outperforming the previous state-of-the-art model RVT-2 by 6.0% [16][19]
- In noise experiments, SpatialActor demonstrated superior robustness, with average success rates improving by 13.9%, 16.9%, and 19.4% under light, medium, and heavy noise conditions, respectively [19][20]
- Real-world experiments showed SpatialActor consistently outperforming RVT-2 by approximately 20% across various tasks, confirming its effectiveness in complex environments [22][24]

Conclusion
- The article concludes that SpatialActor represents a significant advancement in robotic manipulation by effectively decoupling semantic and geometric information, leading to improved robustness and generalization in diverse conditions [24][25]
- The framework highlights the importance of disentangled spatial representations for developing more resilient and adaptable robotic systems [25][26]
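
To make the two ideas under Methodology and Architecture more concrete, here is a minimal Python sketch. It is not SpatialActor's implementation: the summary gives no formulas, so the sigmoid-gated blend standing in for the SGM's combination of prior and raw depth features, the standard pinhole camera model standing in for the SPT's 2D-to-3D mapping, and the names `gated_fuse` and `backproject` are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def gated_fuse(prior: List[float], raw: List[float],
               gate_logits: List[float]) -> List[float]:
    """Blend two feature vectors element-wise (hypothetical helper).

    ASSUMPTION: a sigmoid gate g weights the fine-grained but noisy raw depth
    feature against the robust prior feature: fused = g * raw + (1 - g) * prior.
    The article does not specify how the SGM actually fuses them.
    """
    if not (len(prior) == len(raw) == len(gate_logits)):
        raise ValueError("inputs must have equal length")
    fused = []
    for p, r, a in zip(prior, raw, gate_logits):
        # Sigmoid written via tanh so large |a| cannot overflow.
        g = 0.5 * (1.0 + math.tanh(0.5 * a))
        fused.append(g * r + (1.0 - g) * p)
    return fused


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float
                ) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to camera-frame (x, y, z).

    ASSUMPTION: an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), no lens distortion.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# A gate logit of 0 gives g = 0.5, i.e. the midpoint of prior and raw.
print(gated_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
# A pixel at the principal point lies on the optical axis.
print(backproject(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0))
```

In a learned model the gate logits would be predicted from the features themselves, and the back-projected points would serve as positions for the image features. Both are fixed inputs here to keep the sketch self-contained.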
Farewell to the "2D Illusion": SpatialActor Decouples Semantics and Geometry, Giving Embodied Intelligence Strongly Robust Spatial Genes
机器之心·2025-12-05 03:02