SpatialActor: Decoupling Semantics from Geometry to Inject Robust Spatial Genes into Embodied Intelligence
具身智能之心· 2025-12-05 16:02
Core Insights
- The article discusses SpatialActor, a robust spatial representation framework for robotic manipulation that addresses challenges in precise spatial understanding, sensor noise, and effective interaction [21][24]
- SpatialActor separates semantic information from geometric information, enhancing the robot's ability to understand tasks and accurately perceive its environment [21][6]

Methodology and Architecture
- SpatialActor employs a "dual-stream disentanglement and fusion" architecture, integrating semantic understanding from vision-language models (VLMs) with precise geometric control from 3D representations [6][21]
- The architecture includes independent visual and depth encoders, with a Semantic-Guided Geometry Module (SGM) that adaptively fuses robust geometric priors with fine-grained depth features [9][10]
- A Spatial Transformer (SPT) establishes accurate 2D-to-3D mappings and integrates multi-modal features, which is crucial for generating precise actions [12][9]

Performance Evaluation
- In simulation, SpatialActor achieved an average success rate of 87.4%, outperforming the previous state-of-the-art model RVT-2 by 6.0% [13][19]
- The model demonstrated significant robustness against noise, with gains of 13.9% to 19.4% over RVT-2 across different noise levels [14][19]
- Real-world experiments showed SpatialActor consistently outperforming RVT-2 by approximately 20% across various tasks, confirming its effectiveness in complex environments [19][18]

Conclusion
- The results highlight the importance of disentangled spatial representations for developing more robust and generalizable robotic systems, with SpatialActor showing superior performance in diverse conditions [21][20]
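The adaptive fusion idea behind the SGM can be illustrated with a minimal sketch: a semantic-conditioned gate blends a robust geometric prior with fine-grained but noisy depth features. The function name, the sigmoid gating form, and the scalar features below are assumptions for illustration only; the actual module operates on learned feature maps inside a trained network.

```python
import math

def semantic_guided_fusion(prior_feat, depth_feat, semantic_feat, gate_scale=1.0):
    """Hypothetical sketch of semantic-guided fusion: a per-element gate,
    driven by a semantic cue, decides how much to trust fine-grained but
    noisy depth versus a robust geometric prior."""
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))
    fused = []
    for p, d, s in zip(prior_feat, depth_feat, semantic_feat):
        g = sigmoid(gate_scale * s)          # semantic cue sets trust in raw depth
        fused.append(g * d + (1.0 - g) * p)  # convex blend: noisy detail vs. robust prior
    return fused

# Where the semantic cue is strongly positive, the gate favors the raw depth
# feature; where it is strongly negative, the robust prior dominates.
print(semantic_guided_fusion([1.0, 1.0], [0.0, 2.0], [10.0, -10.0]))
```

The convex blend keeps the fused feature within the range spanned by the two inputs, which is one simple way to retain depth detail without letting sensor noise dominate.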
Farewell to the "2D Illusion": SpatialActor Decouples Semantics from Geometry to Inject Robust Spatial Genes into Embodied Intelligence
机器之心· 2025-12-05 03:02
Core Insights
- The article discusses the limitations of existing robotic manipulation models that rely primarily on 2D images, which often lose critical depth information and 3D geometric structure [2][4]
- The proposed solution, SpatialActor, focuses on "disentanglement," separating semantic information from spatial geometric information to enhance robotic understanding of and interaction with 3D environments [4][7]

Methodology and Architecture
- SpatialActor employs a dual-stream architecture that decouples visual and depth encoding, integrating a Semantic-Guided Geometry Module (SGM) and a Spatial Transformer (SPT) to improve robustness and accuracy in robotic tasks [10][11]
- The SGM combines robust geometric priors from a pre-trained depth estimation model with fine-grained but noisy depth features, optimizing the geometric representation while maintaining alignment with semantic cues [11][13]
- The SPT establishes precise 2D-to-3D mappings and integrates multi-modal features, which is crucial for generating accurate robotic actions [13]

Experimental Results
- SpatialActor achieved an average success rate of 87.4% across various simulated tasks, outperforming the previous state-of-the-art model RVT-2 by 6.0% [16][19]
- In noise experiments, SpatialActor demonstrated superior robustness, with average success rates improving by 13.9%, 16.9%, and 19.4% under light, medium, and heavy noise conditions, respectively [19][20]
- Real-world experiments showed SpatialActor consistently outperforming RVT-2 by approximately 20% across various tasks, confirming its effectiveness in complex environments [22][24]

Conclusion
- The article concludes that SpatialActor represents a significant advancement in robotic manipulation by effectively decoupling semantic and geometric information, leading to improved robustness and generalization in diverse conditions [24][25]
- The framework highlights the importance of disentangled spatial representations for developing more resilient and adaptable robotic systems [25][26]
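As a quick sanity check on the reported figures, assuming the stated margins are absolute percentage points (the summaries do not say so explicitly), the implied RVT-2 baseline and the widening robustness gap work out as follows:

```python
# Figures taken from the summaries above; the absolute-percentage-point
# reading of the margins is an assumption.
spatialactor_avg = 87.4          # average simulation success rate (%)
margin_over_rvt2 = 6.0           # reported lead over RVT-2 (%)
rvt2_avg = spatialactor_avg - margin_over_rvt2  # implied RVT-2 baseline

noise_gain = {"light": 13.9, "medium": 16.9, "heavy": 19.4}
# The gap over RVT-2 grows with noise severity, which is the pattern the
# robustness claim rests on.
widening = noise_gain["heavy"] - noise_gain["light"]
print(rvt2_avg, widening)
```

Under this reading, RVT-2's implied average is about 81.4%, and SpatialActor's advantage grows by roughly 5.5 percentage points from light to heavy noise.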