Segmenting everything and 3D-reconstructing everything is not enough: Meta open-sources SAM Audio to segment any sound
机器之心 · 2025-12-17 09:42
Core Viewpoint
- Meta has launched SAM Audio, an audio segmentation model that uses multimodal prompts to separate individual sounds from complex audio mixtures, a significant step for audio processing [1][4].

Group 1: Technology and Functionality
- SAM Audio is powered by the Perception Encoder Audiovisual (PE-AV), which underpins its performance on audio segmentation tasks [2][18].
- PE-AV builds on the Perception Encoder model released earlier this year, extending its advanced computer vision capabilities to audio processing [3][20].
- The model supports several interaction methods, including text prompts, visual prompts, and a novel time-span prompting technique, enabling precise audio separation [9][16].
- SAM Audio operates effectively in diverse real-world scenarios and gives users intuitive control over the separation process [9][12].

Group 2: Applications and Use Cases
- Meta envisions numerous applications for SAM Audio, including audio cleanup, background noise removal, and tools that enhance user creativity [5][42].
- Users can explore SAM Audio's capabilities in the Segment Anything Playground by selecting or uploading audio and video content [7][31].

Group 3: Evaluation and Benchmarking
- SAM Audio-Bench is introduced as a comprehensive benchmark for audio separation, covering a range of audio domains and interaction types [29][30].
- SAM Audio Judge is a new evaluation framework that scores segmentation quality according to human perception rather than traditional comparison against reference audio [26][27].

Group 4: Performance and Future Outlook
- SAM Audio achieves state-of-the-art performance across multiple benchmarks and tasks, outperforming previous audio separation models [35][36].
- The model runs efficiently, with a real-time factor of approximately 0.7, and can handle large-scale audio processing [40].
- Meta aims to promote accessibility and creativity through SAM Audio, collaborating with partners to explore its potential in assistive technologies [42].
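The real-time factor of roughly 0.7 quoted above means processing time is about 70% of the audio's duration (an RTF below 1.0 is faster than real time). A minimal sketch of that arithmetic; the function name and interface here are illustrative, not part of SAM Audio:

```python
def processing_time(audio_duration_s: float, rtf: float = 0.7) -> float:
    """Estimate wall-clock processing time from the real-time factor.

    Real-time factor (RTF) = processing time / audio duration,
    so RTF < 1.0 means the model runs faster than real time.
    """
    return audio_duration_s * rtf

# A 60-second clip at RTF ~0.7 takes roughly 42 seconds to process.
print(f"{processing_time(60.0):.1f} s")
```

At this rate, an hour of audio would take about 42 minutes to process, which is what makes large-scale batch processing practical.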