SAM 3 Surfaces at ICLR 2026: The Next Step for Segment Anything Is Teaching Models to Understand "Concepts"

Core Viewpoint
- The article discusses the paper "SAM 3: Segment Anything with Concepts" from Meta, which advances computer vision through promptable concept segmentation [3][5][9].

Summary by Sections

Introduction
- The "SAM 3" paper has drawn significant attention and appears to continue Meta's "Segment Anything" series, following SAM 1 and SAM 2 [3][5][6].

Key Developments
- SAM 3 introduces a new task called Promptable Concept Segmentation (PCS): given text or image exemplars as prompts, the model predicts instance and semantic masks for every matching object while keeping object identities consistent across video frames [9][17].
- The focus is on atomic visual concepts, so the model is prompted with simple noun phrases such as "red apple" or "striped cat" [9][12].

Performance Improvements
- SAM 3 improves substantially on SAM 2 and reports at least a 2x gain on the new SA-Co benchmark. On the LVIS dataset it reaches a zero-shot mask average precision of 47.0, surpassing the previous best of 38.5 [13][14].
- The model processes an image containing more than 100 objects in 30 milliseconds on a single H200 GPU [14].

Methodology
- SAM 3 is built on a dual encoder-decoder transformer architecture: a detector handles image-level capabilities and is combined with a tracker and memory module for video [19].
- A scalable human-machine collaborative data engine was developed to annotate a high-quality training dataset with 4 million unique phrases and 52 million masks [20].

Benchmarking and Results
- On the open-vocabulary SA-Co/Gold dataset, SAM 3 achieves a CGF score double that of the strongest baseline, OWLv2 [28].
- Across multiple public benchmarks, SAM 3 consistently outperforms strong expert baselines in instance segmentation and object detection [27][30].

Conclusion
- SAM 3 is positioned as a leading model for promptable segmentation in computer vision, and it reflects Meta's continued investment in this line of work [9][12][19].
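
To put the reported figures in perspective, the short Python sketch below converts them into an absolute gain, a relative gain, and an approximate throughput. Only the input numbers (47.0, 38.5, and 30 milliseconds) come from the summary; the function and constant names are illustrative assumptions and are not taken from the paper or any SAM 3 code.

```python
# Figures quoted in the summary (LVIS zero-shot mask AP and per-image latency).
SAM3_LVIS_MASK_AP = 47.0      # reported SAM 3 result
PREVIOUS_BEST_MASK_AP = 38.5  # reported previous best
LATENCY_MS_PER_IMAGE = 30.0   # reported for images with 100+ objects on one H200


def absolute_gain(new: float, old: float) -> float:
    """Difference in metric points between two scores."""
    return new - old


def relative_gain(new: float, old: float) -> float:
    """Improvement expressed as a fraction of the old score."""
    return (new - old) / old


def images_per_second(latency_ms: float) -> float:
    """Rough sequential throughput implied by a per-image latency.

    Assumes one image at a time with no batching or pipelining.
    """
    return 1000.0 / latency_ms


abs_gain = absolute_gain(SAM3_LVIS_MASK_AP, PREVIOUS_BEST_MASK_AP)
rel_gain = relative_gain(SAM3_LVIS_MASK_AP, PREVIOUS_BEST_MASK_AP)
throughput = images_per_second(LATENCY_MS_PER_IMAGE)

print(f"Absolute gain: {abs_gain:.1f} AP points")
print(f"Relative gain: {rel_gain * 100:.1f}%")
print(f"Approximate throughput: {throughput:.1f} images/s")
```

Under these assumptions the LVIS improvement works out to 8.5 AP points, or roughly 22% relative to the previous best, and the quoted latency corresponds to about 33 images per second when images are processed one at a time.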