ACMMM 2025 | Peking University Team Proposes InteractMove: A New Framework for Generating Human Interactions with Movable Objects in 3D Scenes
机器之心 · 2025-10-19 03:48
Core Insights
- The article introduces the research paper "InteractMove: Text-Controlled Human-Object Interaction Generation in 3D Scenes with Movable Objects," which proposes the novel task of generating human-object interactions in 3D scenes from text descriptions, specifically targeting movable objects [3][7][35]
- The research team has built a large-scale dataset and a new framework that outperforms existing methods across evaluation metrics, addressing the limitation that current human-scene interaction datasets focus mainly on static objects [3][4][35]

Dataset Highlights
- The InteractMove dataset places multiple interactive objects and distractor items in each scene, so the model must combine language understanding with spatial reasoning to select the correct object [11]
- It covers 71 types of movable objects and 21 interaction methods, spanning a diverse range of interactions from simple to complex [11]
- Physical realism is ensured by rigorously filtering actions and trajectories to remove unrealistic artifacts such as "penetration" [11][12]

Methodology Overview
- The proposed framework consists of three core modules: 3D visual localization, hand-object reachability graph learning, and collision-aware action generation [20][21][22]
- The first step accurately localizes the target object in the complex scene from the text input [20]
- The second step models fine-grained contact relationships between hand joints and object surfaces, enabling diverse interaction strategies [21]
- The final step constrains generated actions to obey physical laws, preventing collisions and ensuring natural interactions [22][23]

Experimental Results
- The method leads on all key metrics, including interaction accuracy, physical realism, diversity, and collision avoidance, with an 18% improvement in diversity and a 14% improvement in physical realism over the best existing results [24][25]
- Ablation studies confirm the effectiveness and necessity of each module in the proposed framework [28][29]

Qualitative Analysis
- Visual results show that InteractMove generates semantically coherent, natural, and physically realistic human-object interactions, with smooth action transitions and appropriate hand-object contact [31][32][33]
- The generated actions align closely with human-like behavior, avoiding unrealistic poses and keeping object movements coordinated with human actions [32][33]

Conclusion
- InteractMove establishes a new framework for text-driven human-object interaction generation, overcoming the limitations of static-object interaction and laying a solid foundation for applications in virtual reality, augmented reality, digital humans, and robotics [35]
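To make the second module's idea concrete, here is a minimal toy sketch of modeling hand-object contact: every name, shape, and formula below is an assumption for illustration, not the paper's actual reachability-graph method. It computes, for each hand joint, a soft contact score from its distance to a sampled object surface.

```python
import numpy as np

def contact_map(joints, surface_pts, tau=0.02):
    """Hypothetical per-joint contact score (NOT the paper's formulation).

    joints:      (J, 3) hand joint positions
    surface_pts: (S, 3) points sampled on the object surface
    tau:         distance scale in meters; score ~1 when touching, ->0 when far
    """
    # Pairwise distances between every joint and every surface point: (J, S)
    d = np.linalg.norm(joints[:, None, :] - surface_pts[None, :, :], axis=-1)
    nearest = d.min(axis=1)        # distance from each joint to the surface
    return np.exp(-nearest / tau)  # soft indicator of hand-object contact
```

A dense score like this can supervise which joints should touch the object for a given interaction type, which is one plausible way fine-grained hand-object contact could be represented.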
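The collision-aware step can likewise be sketched in toy form. The snippet below is an illustrative stand-in, not the paper's algorithm: it uses a sphere as a stand-in object, and any joint whose signed distance is negative (i.e., penetrating) is pushed back to the surface along the outward normal.

```python
import numpy as np

def sphere_sdf(p, center, radius):
    """Signed distance to a sphere: negative values mean penetration."""
    return np.linalg.norm(p - center, axis=-1) - radius

def resolve_collisions(joints, center, radius, iters=20):
    """Toy collision resolution (an assumption, not the paper's method):
    project penetrating joints back onto the object surface."""
    joints = joints.astype(float).copy()
    for _ in range(iters):
        d = sphere_sdf(joints, center, radius)
        inside = d < -1e-6
        if not inside.any():
            break  # no joint penetrates the object anymore
        n = joints[inside] - center
        n /= np.linalg.norm(n, axis=-1, keepdims=True)  # outward normals
        # d[inside] is negative, so this moves each joint outward
        # by exactly its penetration depth.
        joints[inside] -= d[inside][:, None] * n
    return joints
```

In a real system the sphere SDF would be replaced by the scene and object geometry, and the correction would typically act as a guidance or loss term during generation rather than a hard post-hoc projection.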