Unlocking the 3D World with 2D Data: The First Multi-View Video Diffusion Framework for Kinematic Part Decomposition
机器之心·2025-09-22 10:27

Core Viewpoint
- The article introduces Stable Part Diffusion 4D (SP4D), a framework that generates multi-view RGB video and corresponding kinematic part maps from monocular video, addressing the limitations of existing methods in 3D content creation and animation [4][16].

Research Background and Motivation
- The work is motivated by the need for effective rigging and part decomposition in character animation and 3D content production; current methods rely heavily on scarce 3D data [4][3].

Research Method and Innovations
- SP4D is a novel multi-view video diffusion framework for kinematic part decomposition, with innovations such as automatic rigging and part decomposition that leverage large-scale 2D data and pre-trained diffusion models [7][8].

Experimental Results
- On the KinematicParts20K validation set, SP4D achieves a mean Intersection over Union (mIoU) of 0.68, compared with 0.15 for SAM2 and 0.17 for DeepViT [11].
- It also reaches an Adjusted Rand Index (ARI) of 0.60, far above SAM2's 0.05, indicating stronger structural consistency [11].
- In user studies, SP4D averaged 4.26/5 on metrics such as part clarity and animation adaptability, outperforming SAM2 (1.96) and DeepViT (1.85) [11].

Automatic Rigging Performance
- In automatic rigging tasks, SP4D attains a Rigging Precision of 72.7, surpassing Magic Articulate (63.7) and UniRig (64.3) [14].
- User evaluations gave SP4D an average of 4.1/5 for animation naturalness, well above Magic Articulate (2.7) and UniRig (2.3), showing better generalization to unseen categories and complex shapes [14].
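The mIoU and ARI figures above measure how well predicted part maps agree with ground-truth parts. As an illustration only, not the authors' evaluation code, the sketch below computes a Hungarian-matched mIoU and the ARI for two part-label arrays; the function names and the matching convention (averaging over ground-truth parts, unmatched parts scoring zero) are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def _contingency(labels_true, labels_pred):
    """Contingency table C[i, j] = #pixels with GT part i and predicted part j."""
    t_vals, t_idx = np.unique(labels_true, return_inverse=True)
    p_vals, p_idx = np.unique(labels_pred, return_inverse=True)
    C = np.zeros((t_vals.size, p_vals.size), dtype=np.int64)
    np.add.at(C, (t_idx, p_idx), 1)
    return C

def matched_miou(labels_true, labels_pred):
    """Mean IoU after optimally matching predicted parts to GT parts.
    Hypothetical convention: average over GT parts; unmatched parts score 0."""
    C = _contingency(np.ravel(labels_true), np.ravel(labels_pred))
    union = C.sum(axis=1, keepdims=True) + C.sum(axis=0, keepdims=True) - C
    iou = C / np.maximum(union, 1)
    rows, cols = linear_sum_assignment(-iou)  # maximize total matched IoU
    return iou[rows, cols].sum() / C.shape[0]

def adjusted_rand_index(labels_true, labels_pred):
    """ARI from the contingency table: 1.0 for a perfect match up to
    relabeling, ~0 for a random assignment of parts."""
    C = _contingency(np.ravel(labels_true), np.ravel(labels_pred))
    comb2 = lambda x: x * (x - 1) / 2.0
    sum_ij = comb2(C).sum()
    sum_a = comb2(C.sum(axis=1)).sum()
    sum_b = comb2(C.sum(axis=0)).sum()
    expected = sum_a * sum_b / comb2(C.sum())
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)
```

Both metrics are invariant to which integer id a part receives, which is why ARI complements mIoU: it rewards consistent grouping of pixels into parts across views even when the predicted label ids differ from the ground truth.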
Conclusion
- SP4D, accepted as a Spotlight at NeurIPS 2025, represents both a technological breakthrough and a result of interdisciplinary collaboration, paving the way for greater automation and intelligence in animation, gaming, AR/VR, and robotic simulation [16].