SpatialActor: Disentangling Semantics and Geometry to Give Embodied Intelligence a Strongly Robust Spatial Gene
具身智能之心 · 2025-12-05 16:02
Core Insights
- The article presents SpatialActor, a robust spatial representation framework for robotic manipulation. It targets three challenges: precise spatial understanding, sensor noise, and effective interaction [21][24]
- SpatialActor separates semantic information from geometric information, which improves the robot's ability both to understand the task and to perceive its environment accurately [21][6]

Methodology and Architecture
- SpatialActor uses a "dual-stream disentanglement and fusion" architecture that combines semantic understanding from vision-language models (VLMs) with precise geometric control from 3D representations [6][21]
- The architecture has independent visual and depth encoders. A Semantic-Guided Geometry Module (SGM) adaptively fuses robust geometric priors with fine-grained depth features [9][10]
- A Spatial Transformer (SPT) establishes accurate 2D-to-3D mappings and integrates multi-modal features, which is crucial for generating precise actions [12][9]

Performance Evaluation
- In simulation, SpatialActor achieved an average success rate of 87.4%, outperforming the previous state-of-the-art model, RVT-2, by 6.0% [13][19]
- The model was markedly more robust to noise, improving on RVT-2 by 13.9% to 19.4% across the tested noise levels [14][19]
- In real-world experiments, SpatialActor outperformed RVT-2 by approximately 20% across a range of tasks, confirming its effectiveness in complex environments [19][18]

Conclusion
- The results highlight the importance of disentangled spatial representations for building more robust and generalizable robotic systems, with SpatialActor performing better than the baseline across diverse conditions [21][20]
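
To make the two architectural ideas under Methodology and Architecture more concrete, here is a minimal sketch in plain Python. It is not SpatialActor's implementation. It assumes that the adaptive fusion in the SGM can be approximated by a single sigmoid gate that blends a geometric-prior feature with a raw-depth feature, and that the SPT's 2D-to-3D mapping can be illustrated with standard pinhole back-projection. The function names `gated_fusion` and `backproject`, the gate weights, and the camera intrinsics are all hypothetical and chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    prior: Sequence[float],
    depth_feat: Sequence[float],
    gate_w: Sequence[float],
    gate_b: float = 0.0,
) -> List[float]:
    """Blend a geometric-prior feature with a raw-depth feature.

    Assumption (not from the article): one scalar gate g is computed from
    the concatenated inputs with a linear layer followed by a sigmoid, and
    the output is g * depth_feat + (1 - g) * prior.
    """
    if len(prior) != len(depth_feat):
        raise ValueError("prior and depth_feat must have the same length")
    if len(gate_w) != 2 * len(prior):
        raise ValueError("gate_w needs one weight per concatenated input")
    concat = list(prior) + list(depth_feat)
    logit = sum(w * x for w, x in zip(gate_w, concat)) + gate_b
    g = sigmoid(logit)
    return [g * d + (1.0 - g) * p for p, d in zip(prior, depth_feat)]


def backproject(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
) -> Tuple[float, float, float]:
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Standard pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


if __name__ == "__main__":
    # With all-zero gate weights the gate is exactly 0.5, so the two
    # features are averaged element by element.
    fused = gated_fusion([1.0, 0.0], [0.0, 1.0], gate_w=[0.0] * 4)
    print("fused feature:", fused)

    # A pixel 100 px right of the principal point, 2 m away, with a
    # hypothetical focal length of 500 px.
    point = backproject(420, 240, 2.0, fx=500, fy=500, cx=320, cy=240)
    print("3D point:", point)
```

In this sketch a noisy depth reading would push the blend toward the prior only if the gate weights had been trained to do so. The gate here is fixed by hand purely to show the data flow, and a learned model would compute it per token or per channel rather than as one scalar.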