Segment Anything Playground
Segmenting Everything Isn't Enough, Now Reconstruct Everything in 3D: SAM 3D Arrives
具身智能之心· 2025-11-21 00:04
Core Viewpoint: Meta has launched major updates with SAM 3D and SAM 3, which extend image understanding into 3D and provide advanced object detection, segmentation, and tracking in images and videos [2][6][40].

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series. It comprises two models, SAM 3D Objects and SAM 3D Body, both of which achieve state-of-the-art performance in converting 2D images into detailed 3D reconstructions [2][4].
- SAM 3D Objects generates 3D models from a single image, overcoming a limitation of traditional 3D modeling, which often relies on isolated or synthetic data [11][15].
- Meta annotated nearly 1 million real-world images, producing approximately 3.14 million 3D meshes, and used a scalable data engine to raise both the quality and the quantity of its 3D data [20][26].

Group 2: SAM 3D Body
- SAM 3D Body targets accurate 3D human pose and shape reconstruction from a single image and maintains high-quality results even in complex scenes with occlusions and unusual poses [28][30].
- The model is interactive: users can guide and control its predictions, which improves accuracy and usability [29].
- Meta built a high-quality training dataset of around 8 million images to improve the model's performance across a range of 3D benchmarks [33].

Group 3: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation: given a text prompt or an example image, the model detects and segments instances of the specified concept, substantially improving concept recognition [40][42].
- SAM 3's architecture builds on earlier advances, using components such as the Meta Perception Encoder and DETR for image recognition and object detection [42][44].
- SAM 3 roughly doubles the cgF1 score for concept recognition compared with existing models, and it stays near real time on images with more than 100 detection targets, completing inference in about 30 milliseconds on H200 GPUs [44].
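The summary cites cgF1 without defining it. As we understand the SAM 3 paper, cgF1 is a "classification-gated" F1: a positive micro-F1 (pmF1) over instance matches on image-concept pairs where the concept is actually present, multiplied by an image-level Matthews correlation coefficient (IL_MCC) that scores whether the model correctly decides if the concept appears at all, then scaled by 100. The Python sketch below computes that product from raw counts. Treat the definition as our reading, not Meta's reference implementation; instance matching (for example by mask IoU) is assumed to have been done already, and the names `ConceptCounts`, `micro_f1`, `mcc`, and `cg_f1` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ConceptCounts:
    """Hypothetical instance-level match counts for one image-concept pair
    on which the concept is present (a 'positive' pair)."""
    tp: int  # predicted masks matched to a ground-truth instance
    fp: int  # predicted masks with no matching instance
    fn: int  # ground-truth instances left unmatched


def micro_f1(pairs: list[ConceptCounts]) -> float:
    """Micro-averaged F1: pool counts over all positive pairs, then compute F1."""
    tp = sum(p.tp for p in pairs)
    fp = sum(p.fp for p in pairs)
    fn = sum(p.fn for p in pairs)
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for binary 'is the concept present?' decisions."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def cg_f1(positive_pairs: list[ConceptCounts],
          image_tp: int, image_tn: int, image_fp: int, image_fn: int) -> float:
    # Assumption: cgF1 = 100 * pmF1 * IL_MCC, per our reading of the SAM 3 paper.
    return 100 * micro_f1(positive_pairs) * mcc(image_tp, image_tn, image_fp, image_fn)


if __name__ == "__main__":
    pairs = [ConceptCounts(tp=8, fp=2, fn=0), ConceptCounts(tp=5, fp=1, fn=4)]
    print(f"cgF1 = {cg_f1(pairs, image_tp=40, image_tn=45, image_fp=5, image_fn=10):.2f}")
```

Because the two factors multiply, a model that localizes instances well but also fires on images where the concept is absent is penalized through the MCC term; a presence classifier no better than chance (MCC = 0) drives cgF1 to 0 regardless of localization quality.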
Segmenting Everything Isn't Enough, Now Reconstruct Everything in 3D: SAM 3D Arrives
机器之心· 2025-11-20 02:07
Core Insights: Meta has launched major updates with SAM 3D and SAM 3, extending image understanding into 3D [1][2].

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series, comprising two models that convert static 2D images into detailed 3D reconstructions [2][5].
- SAM 3D Objects focuses on object and scene reconstruction, while SAM 3D Body specializes in human shape and pose estimation [5][28].
- Meta has publicly released the model weights and inference code for SAM 3D and SAM 3 [7].

Group 2: SAM 3D Objects
- SAM 3D Objects introduces a new technical approach to robust, realistic 3D reconstruction and object pose estimation from a single natural image [11].
- The model can generate detailed 3D shapes, textures, and scene layouts from everyday photos, handling difficulties such as small objects and occlusion [12][13].
- Meta annotated nearly 1 million images, producing approximately 3.14 million 3D meshes, using a scalable data engine for efficient data collection [17][22].

Group 3: SAM 3D Body
- SAM 3D Body addresses accurate 3D human pose and shape reconstruction from a single image, even in complex scenes [28].
- The model supports interactive input, letting users guide and control predictions to improve accuracy [29].
- A high-quality training dataset of around 8 million images was built to improve the model's performance across a range of 3D benchmarks [31].

Group 4: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation, enabling the model to identify and segment instances of a specified concept given a text prompt or example images [35].
- SAM 3's architecture builds on earlier AI advances, using the Meta Perception Encoder for improved image recognition and object detection [37].
- SAM 3 roughly doubles concept segmentation performance compared with existing models, while keeping inference fast even on images with many detection targets [39].
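Concept prompts change what the model returns: instead of one mask per click, SAM 3 is meant to return every instance of the prompted concept, or nothing when the concept is absent. A minimal Python sketch of that final selection step follows, assuming a detector has already produced candidate regions with prompt-match scores. It multiplies each candidate's score by an image-level "presence" probability before thresholding, which reflects our reading of the presence head described in the SAM 3 paper. The `Candidate` type, `select_instances`, the example scores, and the 0.5 threshold are all hypothetical, and this is not Meta's released inference code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical detector output for one query slot."""
    box: tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
    match_score: float  # how well this region matches the prompt, in [0, 1]


def select_instances(candidates: list[Candidate], presence_prob: float,
                     threshold: float = 0.5) -> list[tuple[Candidate, float]]:
    """Keep every candidate whose presence-gated score clears the threshold.

    Assumption: final score = P(concept present in image) * per-candidate match
    score, so a low presence probability suppresses all instances at once.
    """
    kept = []
    for c in candidates:
        score = presence_prob * c.match_score
        if score >= threshold:
            kept.append((c, score))
    # Return the most confident instances first.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Hypothetical candidates for a prompt such as "yellow school bus".
    cands = [
        Candidate((0, 0, 10, 10), 0.9),
        Candidate((20, 0, 30, 10), 0.8),
        Candidate((40, 0, 50, 10), 0.3),
    ]
    for cand, score in select_instances(cands, presence_prob=0.95):
        print(cand.box, round(score, 3))
```

Gating by a single presence probability separates "is the concept in this image?" from "where are its instances?", which is the same distinction the cgF1 metric rewards; the per-candidate threshold alone would instead let weak matches through on images that contain no instance at all.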