SAM 3 Surfaces at ICLR 2026, the Next Step for Segment Anything: Teaching the Model to Understand "Concepts"
具身智能之心· 2025-10-14 00:02
Core Viewpoint
- The article discusses Meta's release of the paper "SAM 3: Segment Anything with Concepts," which advances computer vision, particularly promptable concept segmentation [3][5][9].

Summary by Sections

Introduction
- The paper has drawn significant attention as the apparent continuation of Meta's "Segment Anything" series, following SAM 1 and SAM 2 [3][5][6].

Key Developments
- SAM 3 introduces a new task, Promptable Concept Segmentation (PCS): given text or image exemplars as a prompt, the model predicts instance and semantic masks for every matching object while keeping identities consistent across video frames (see the input/output sketch after this summary) [9][17].
- The task targets atomic visual concepts, so the model can segment from simple noun phrases such as "red apple" or "striped cat" [9][12].

Performance Improvements
- SAM 3 delivers significant gains over SAM 2, with at least a 2x improvement on the new SA-Co benchmark, and reaches a zero-shot mask average precision of 47.0 on the LVIS dataset, surpassing the previous best of 38.5 [13][14].
- The model processes an image containing over 100 objects in just 30 milliseconds on a single H200 GPU [14].

Methodology
- SAM 3 is built on a dual encoder-decoder transformer architecture, pairing a detector with a tracker and memory module for video applications [19].
- A scalable human-machine collaborative data engine annotated a high-quality training dataset with 4 million unique phrases and 520 million masks [20].

Benchmarking and Results
- On the open-vocabulary SA-Co/Gold benchmark, SAM 3 achieves a CGF score double that of the strongest baseline, OWLv2 [28].
- Across multiple public benchmarks, SAM 3 consistently exceeds strong expert baselines on instance segmentation and object detection [27][30].

Conclusion
- These advances position SAM 3 as a leading model for promptable segmentation, underscoring Meta's push at the frontier of AI [9][12][19].
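To make the PCS input/output contract concrete, here is a minimal, self-contained Python sketch. Every name in it (`PCSPrompt`, `InstanceMask`, `segment_concepts`) is a hypothetical stand-in invented for illustration, not Meta's published API; a real model would replace the placeholder body with actual detection.

```python
# Illustrative sketch of the Promptable Concept Segmentation (PCS) contract.
# All names here are hypothetical: SAM 3's actual interface is unpublished.
from dataclasses import dataclass, field
import numpy as np

@dataclass
class PCSPrompt:
    # PCS accepts a short noun phrase, image exemplars (boxes), or both.
    noun_phrase: str | None = None          # e.g. "red apple", "striped cat"
    exemplar_boxes: list[tuple[int, int, int, int]] = field(default_factory=list)

@dataclass
class InstanceMask:
    instance_id: int        # stable across video frames (identity consistency)
    frame_index: int
    mask: np.ndarray        # boolean HxW mask for one matched object
    score: float            # confidence that the object matches the concept

def segment_concepts(frames: list[np.ndarray], prompt: PCSPrompt) -> list[InstanceMask]:
    """Hypothetical PCS entry point: return one mask per matching object per
    frame, reusing instance_id for the same object over time."""
    results: list[InstanceMask] = []
    for t, frame in enumerate(frames):
        h, w = frame.shape[:2]
        # Placeholder: a real model would detect every instance of the
        # prompted concept; we emit one dummy mask per frame only to show
        # the shape of the output, not real predictions.
        results.append(InstanceMask(instance_id=0, frame_index=t,
                                    mask=np.zeros((h, w), dtype=bool),
                                    score=0.0))
    return results

# Usage: segment every "red apple" in a short clip.
clip = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(3)]
masks = segment_concepts(clip, PCSPrompt(noun_phrase="red apple"))
print(len(masks), masks[0].mask.shape)  # 3 (480, 640)
```

The part worth noting is the stable `instance_id`: the same physical object keeps one id across frames, which is what "identity consistency across video frames" means above.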
SAM 3 Surfaces at ICLR 2026, the Next Step for Segment Anything: Teaching the Model to Understand "Concepts"
机器之心· 2025-10-13 04:21
Core Insights
- The article discusses the release of a new paper, "SAM 3: Segment Anything with Concepts," believed to be the continuation of Meta's "Segment Anything" series, following SAM 1 and SAM 2 [1][3][4].

Group 1: Overview of SAM 3
- SAM 3 introduces a new task, Promptable Concept Segmentation (PCS): users provide text or image exemplars, and the model predicts instance and semantic masks for all matching objects while maintaining identity consistency across video frames [8][12].
- The model focuses on atomic visual concepts, understanding simple noun phrases such as "red apple" or "striped cat" for segmentation tasks [8][12].
- SAM 3 improves on its predecessors in promptable visual segmentation and sets a new standard for PCS [18].

Group 2: Performance Metrics
- SAM 3 achieves at least a 2x improvement over previous systems on the newly proposed SA-Co benchmark [13].
- On the LVIS dataset, it reaches a zero-shot mask average precision of 47.0, surpassing the previous best of 38.5 [13].
- It processes images with over 100 objects in just 30 milliseconds on a single H200 GPU [14].

Group 3: Methodology and Data
- SAM 3 employs a dual encoder-decoder transformer architecture, integrating a detector with a tracker and memory module for video applications (a structural sketch follows this summary) [20].
- A scalable human-machine collaborative data engine annotated a high-quality training dataset with 4 million unique phrases and 520 million masks [21].
- The PCS benchmark spans 124K images and 1.7K videos with 214K unique concepts, a large expansion over existing benchmarks [25].

Group 4: Comparative Analysis
- SAM 3 outperforms previous models on instance segmentation, box detection, and semantic segmentation across multiple datasets [27][28].
- In open-vocabulary semantic segmentation experiments, it exceeds strong baseline models [29].
- It also demonstrates superior object-counting accuracy and segmentation capabilities compared with other models [33].
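As a rough structural illustration of the Group 3 description (dual encoders feeding a detection decoder), the PyTorch sketch below conditions DETR-style object queries on a prompt embedding. All module choices, names, and dimensions are assumptions made for this sketch, and the tracker/memory component for video is omitted for brevity; this is not a reproduction of the paper's architecture.

```python
# Structural sketch (not the paper's implementation) of the reported design:
# an image encoder and a prompt encoder feed a transformer decoder whose
# object queries each claim one instance of the prompted concept.
import torch
import torch.nn as nn

class PCSModelSketch(nn.Module):
    def __init__(self, d_model: int = 256, num_queries: int = 100):
        super().__init__()
        # Encoder 1: image features (a single conv stands in for the backbone).
        self.image_encoder = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        # Encoder 2: concept prompt (embedding table stands in for a text encoder).
        self.prompt_encoder = nn.Embedding(10000, d_model)
        # DETR-style decoder: learned queries attend to prompt-conditioned features.
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.mask_head = nn.Linear(d_model, d_model)   # dotted with features -> masks
        self.score_head = nn.Linear(d_model, 1)        # does this query match the concept?

    def forward(self, frame: torch.Tensor, prompt_tokens: torch.Tensor):
        b = frame.shape[0]
        feats = self.image_encoder(frame)                 # (B, D, H/16, W/16)
        mem = feats.flatten(2).transpose(1, 2)            # (B, HW, D)
        # Condition image features on the pooled prompt embedding.
        mem = mem + self.prompt_encoder(prompt_tokens).mean(1, keepdim=True)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (B, Q, D)
        hs = self.decoder(q, mem)                         # (B, Q, D)
        # Per-query mask logits over spatial tokens, plus a match score.
        mask_logits = torch.einsum("bqd,bnd->bqn", self.mask_head(hs), mem)
        return mask_logits, self.score_head(hs).squeeze(-1)

model = PCSModelSketch()
frame = torch.randn(1, 3, 256, 256)
prompt = torch.randint(0, 10000, (1, 4))   # token ids for e.g. "striped cat"
masks, scores = model(frame, prompt)
print(masks.shape, scores.shape)           # (1, 100, 256) and (1, 100)
```

The per-query score head is what lets a fixed set of queries cover "all matching objects": each query either latches onto one instance of the prompted concept or reports a low match score and is discarded.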