Core Insights
- SAM 3, the upcoming upgrade of the Segment Anything Model (SAM), focuses on "concept-based segmentation": it segments according to semantic concepts rather than only individual pixels or instances [6][8][15]
- SAM 3 introduces a new task, Promptable Concept Segmentation (PCS), in which the model finds and segments every object matching a given concept across images and videos [8][12][16]
- The model is trained on a large dataset containing approximately 4 million unique concept labels, which strengthens its ability to interpret user prompts and segment accordingly [6][11][27]

Group 1: SAM 3 Features
- SAM 3 supports interactive refinement: users can add prompts to resolve ambiguous cases [8][11]
- The model can track multiple instances of the same concept across video frames, making it more useful in dynamic scenes [8][12]
- SAM 3 reaches a zero-shot segmentation score of 47.0 on the LVIS dataset, surpassing the previous best of 38.5 [11][28]

Group 2: Data Engine and Training
- A human-AI collaborative data engine supports training by letting the model learn from its mistakes and improve its accuracy [19][22]
- The data engine runs in four phases, progressing from human validation to AI-assisted validation and then to video annotation [21][25]
- The final dataset, SA-Co, contains 126,000 samples and 214,000 unique phrases, making it one of the largest open-vocabulary segmentation datasets available [28]

Group 3: Concept Segmentation Challenges
- Because the range of possible concepts is vast, PCS is prone to ambiguities that the model must handle [14]
- To address these ambiguities, SAM 3 uses annotations from multiple experts and refined evaluation protocols to keep assessment objective and accurate [14][19]
- The model includes a dedicated "ambiguity module" that helps it understand and tolerate vague boundaries in concept definitions [14][19]
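The idea of judging a concept-level prediction against several expert annotations can be illustrated with a small sketch. It assumes masks are sets of pixel coordinates, treats a PCS prediction as the list of all instance masks found for one concept, scores it with an instance-level F1 using greedy IoU matching at an assumed 0.5 threshold, and keeps the best score over the experts' annotations. The names `iou`, `instance_f1`, `multi_expert_score` and `box` are hypothetical; this is not SAM 3's published evaluation protocol, and greedy matching stands in for optimal one-to-one (e.g. Hungarian) matching.

```python
from typing import FrozenSet, List, Tuple

# Illustrative assumption: a mask is a set of (row, col) pixel coordinates.
Mask = FrozenSet[Tuple[int, int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two pixel-set masks."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def instance_f1(pred: List[Mask], gt: List[Mask], thr: float = 0.5) -> float:
    """Instance-level F1 using greedy one-to-one IoU matching (a simplification)."""
    if not pred and not gt:
        return 1.0  # concept absent and nothing predicted: counted as correct
    if not pred or not gt:
        return 0.0
    unmatched = list(range(len(gt)))
    tp = 0
    for p in pred:
        # Pick the still-unmatched ground-truth mask with the highest IoU >= thr.
        best_j, best_iou = None, thr
        for j in unmatched:
            score = iou(p, gt[j])
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            unmatched.remove(best_j)
            tp += 1
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gt)
    return 2 * precision * recall / (precision + recall)


def multi_expert_score(pred: List[Mask], experts: List[List[Mask]]) -> float:
    """Score against each expert's annotation and keep the best match."""
    return max(instance_f1(pred, gt) for gt in experts)


def box(r0: int, c0: int, r1: int, c1: int) -> Mask:
    """Hypothetical helper: rectangular mask covering rows r0..r1-1, cols c0..c1-1."""
    return frozenset((r, c) for r in range(r0, r1) for c in range(c0, c1))


# Usage: the model finds two objects for a concept; experts disagree on the second.
cat_a, cat_b = box(0, 0, 4, 4), box(10, 10, 14, 14)
pred = [cat_a, cat_b]
expert_1 = [cat_a]          # this annotator counted only one instance
expert_2 = [cat_a, cat_b]   # this annotator counted both
print(instance_f1(pred, expert_1))                    # ≈ 0.667
print(multi_expert_score(pred, [expert_1, expert_2]))  # 1.0
```

Taking the maximum over experts means a prediction is not penalized for following one reasonable reading of an ambiguous concept (here, whether the second object counts) when at least one annotator shared that reading.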
Mysterious ICLR Paper Surfaces: SAM 3 Sees the World Through "Concepts", Reshaping the Paradigm of Visual AI
