Segmenting everything is not enough; you also have to reconstruct everything in 3D. SAM 3D is here
具身智能之心· 2025-11-21 00:04
Core Viewpoint
- Meta has launched significant updates with the introduction of SAM 3D and SAM 3, bringing 3D understanding to images and providing advanced capabilities for object detection, segmentation, and tracking in images and videos [2][6][40].

Group 1: SAM 3D Overview
- SAM 3D is the latest addition to the SAM series, featuring two models, SAM 3D Objects and SAM 3D Body, both demonstrating state-of-the-art performance in converting 2D images into detailed 3D reconstructions [2][4].
- SAM 3D Objects lets users generate 3D models from a single image, overcoming the limitations of traditional 3D modeling, which often relies on isolated or synthetic data [11][15].
- Meta annotated nearly 1 million real-world images and generated approximately 3.14 million 3D meshes, using a scalable data engine to raise both the quality and quantity of 3D training data [20][26].

Group 2: SAM 3D Body
- SAM 3D Body focuses on accurate 3D human pose and shape reconstruction from single images, maintaining high quality even in complex scenarios with occlusions and unusual poses (a toy sketch of this kind of parameterization follows this summary) [28][30].
- The model is interactive, allowing users to guide and control predictions, which improves both accuracy and usability [29].
- A high-quality training dataset of around 8 million images was created to improve the model's performance across various 3D benchmarks [33].

Group 3: SAM 3 Capabilities
- SAM 3 introduces promptable concept segmentation, enabling the model to detect and segment specific concepts from text or example-image prompts and significantly improving concept recognition [40][42].
- The architecture builds on previous advancements, using components such as the Meta Perception Encoder and DETR for enhanced image recognition and object detection [42][44].
- SAM 3 achieves a twofold increase in cgF1 score for concept recognition and maintains near-real-time performance on images with over 100 detection targets, completing inference in approximately 30 milliseconds on an H200 GPU [44].
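To make the notion of regressing "3D human pose and shape" parameters concrete, here is a minimal Python sketch of a generic SMPL-style parametric body model, in which a low-dimensional shape vector linearly deforms a template mesh. This is an illustrative assumption about how such models work in general; the array sizes, variable names, and random stand-in assets are hypothetical, and SAM 3D Body's actual parameterization may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 6890   # vertex count typical of SMPL-style meshes (assumed)
N_SHAPE = 10     # number of shape ("beta") coefficients (assumed)

# Learned model assets; random stand-ins here so the sketch runs.
template = rng.normal(size=(N_VERTS, 3))             # mean body mesh
shape_dirs = rng.normal(size=(N_VERTS, 3, N_SHAPE))  # shape blend shapes

def body_mesh(betas: np.ndarray) -> np.ndarray:
    """Linear shape blend: V = T + S @ betas, one 3D offset per vertex."""
    return template + shape_dirs @ betas

# A single-image reconstructor would regress betas (plus joint rotations
# for pose) from pixels; here we just evaluate the mesh for an arbitrary
# shape vector.
betas = rng.normal(scale=0.5, size=N_SHAPE)
verts = body_mesh(betas)
print(verts.shape)  # (6890, 3)
```

A full reconstructor would additionally regress per-joint rotations and apply skinning for pose; the blend-shape step above is what makes "shape" a compact, controllable parameter vector rather than raw vertex positions.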
SAM 3 makes a surprise appearance at ICLR 2026. The next step for Segment Anything: letting the model understand "concepts"
具身智能之心· 2025-10-14 00:02
Core Viewpoint
- The article discusses the release of Meta's paper "SAM 3: Segment Anything with Concepts," which advances computer vision, particularly promptable concept segmentation [3][5][9].

Summary by Sections

Introduction
- The paper "SAM 3" has attracted significant attention and appears to continue Meta's "Segment Anything" series, following SAM 1 and SAM 2 [3][5][6].

Key Developments
- SAM 3 introduces a new task, Promptable Concept Segmentation (PCS), which lets users supply text or image examples to predict instance and semantic masks for all matching objects while maintaining identity consistency across video frames (a toy identity-tracking sketch follows this summary) [9][17].
- The focus is on atomic visual concepts: the model understands simple noun phrases such as "red apple" or "striped cat" and segments accordingly [9][12].

Performance Improvements
- SAM 3 shows significant gains over SAM 2, with at least a 2x improvement on the new SA-Co benchmark and a zero-shot mask average precision of 47.0 on the LVIS dataset, surpassing the previous best of 38.5 [13][14].
- The model processes images containing over 100 objects in roughly 30 milliseconds on a single H200 GPU [14].

Methodology
- SAM 3 is built on a dual encoder-decoder transformer architecture, pairing a detector with a tracker and memory module for video applications [19].
- A scalable human-machine collaborative data engine annotated a high-quality training dataset of 4 million unique phrases and 520 million masks [20].

Benchmarking and Results
- On the open-vocabulary SA-Co/Gold dataset, SAM 3 achieves a cgF1 score double that of the strongest baseline, OWLv2 [28].
- Across multiple public benchmarks, SAM 3 consistently exceeds strong expert baselines, demonstrating its effectiveness in instance segmentation and object detection [27][30].

Conclusion
- These advances position SAM 3 as a leading model for promptable segmentation and underscore Meta's continued push at the frontier of computer vision [9][12][19].
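To illustrate what "maintaining identity consistency across video frames" means in practice, here is a toy Python sketch that carries object IDs from frame to frame by greedy mask-IoU matching. It is a didactic stand-in for the detector-plus-tracker idea only; SAM 3's actual tracker and memory module are not reflected here, and all names are illustrative.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def assign_ids(prev, curr, next_id, thresh=0.5):
    """prev: dict id -> mask from the last frame;
    curr: list of masks detected in the new frame.
    Returns (dict id -> mask, next unused id) with identities carried over."""
    out, used = {}, set()
    for mask in curr:
        # Match each new mask to the best unclaimed previous identity.
        best_id, best_iou = None, thresh
        for oid, pmask in prev.items():
            if oid in used:
                continue
            iou = mask_iou(mask, pmask)
            if iou > best_iou:
                best_id, best_iou = oid, iou
        if best_id is None:          # unmatched detection -> new identity
            best_id, next_id = next_id, next_id + 1
        used.add(best_id)
        out[best_id] = mask
    return out, next_id

# Usage: the same object shifted slightly keeps its ID across frames.
frame0 = [np.zeros((4, 4), bool)]
frame0[0][0:2, 0:2] = True
tracks, next_id = assign_ids({}, frame0, next_id=0)
frame1 = [np.zeros((4, 4), bool)]
frame1[0][0:2, 0:3] = True           # overlaps the old mask (IoU ~0.67)
tracks, next_id = assign_ids(tracks, frame1, next_id)
print(sorted(tracks))                # [0] -> identity preserved
```

Roughly speaking, a SAM 2-style memory module generalizes this single-frame matching by conditioning on features stored from multiple past frames rather than only the previous one.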
SAM 3 makes a surprise appearance at ICLR 2026. The next step for Segment Anything: letting the model understand "concepts"
机器之心· 2025-10-13 04:21
Core Insights
- The article discusses the release of a new paper, "SAM 3: Segment Anything with Concepts," believed to continue Meta's "Segment Anything" series following SAM 1 and SAM 2 [1][3][4].

Group 1: Overview of SAM 3
- SAM 3 introduces a new task, Promptable Concept Segmentation (PCS), which lets users supply text or image examples to predict instance and semantic masks for matching objects while maintaining identity consistency across video frames [8][12].
- The model focuses on atomic visual concepts, understanding simple noun phrases such as "red apple" or "striped cat" for segmentation tasks [8][12].
- SAM 3 improves on its predecessors in promptable visual segmentation and establishes new standards for PCS [18].

Group 2: Performance Metrics
- SAM 3 shows significant gains, with at least a 2x improvement over previous systems on the newly proposed SA-Co benchmark [13].
- On the LVIS dataset, SAM 3 achieved a zero-shot mask average precision of 47.0, surpassing the previous best of 38.5 [13].
- The model processes images with over 100 objects in roughly 30 milliseconds on a single H200 GPU [14].

Group 3: Methodology and Data
- SAM 3 employs a dual encoder-decoder transformer architecture, integrating a detector with a tracker and memory module for video applications (a toy DETR-style matching sketch follows this summary) [20].
- The researchers developed a scalable human-machine collaborative data engine that annotated a high-quality training dataset of 4 million unique phrases and 520 million masks [21].
- The PCS benchmark covers 124K images and 1.7K videos with 214K unique concepts, far exceeding the concept counts of existing benchmarks [25].

Group 4: Comparative Analysis
- SAM 3 outperforms previous models in instance segmentation, box detection, and semantic segmentation across multiple datasets [27][28].
- In open-vocabulary semantic segmentation experiments, SAM 3 exceeded strong baseline models [29].
- The model also demonstrated superior object-counting accuracy and segmentation quality compared with other models [33].
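As a small illustration of how DETR-style detectors (the family that transformer detection heads like SAM 3's build on) assign predictions to ground truth, here is a runnable sketch of bipartite set matching using SciPy's Hungarian solver. The L1-only box cost is a deliberate simplification of the usual classification-plus-box cost, and none of this is SAM 3's actual code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match(pred_boxes: np.ndarray, gt_boxes: np.ndarray):
    """Boxes are (N, 4) arrays of [x0, y0, x1, y1].
    Returns (pred_i, gt_j) pairs minimizing the total L1 box cost,
    so each ground-truth box is claimed by at most one prediction."""
    cost = np.abs(pred_boxes[:, None, :] - gt_boxes[None, :, :]).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows.tolist(), cols.tolist()))

preds = np.array([[0, 0, 10, 10], [50, 50, 60, 60], [5, 5, 15, 15]], float)
gts = np.array([[49, 49, 61, 61], [1, 1, 11, 11]], float)
print(match(preds, gts))  # [(0, 1), (1, 0)]: each GT gets its nearest prediction
```

The set-matching formulation is what lets such detectors output an unordered set of objects without non-maximum suppression: unmatched predictions (like the third box above) are simply trained toward "no object".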