AudioLBM
Tsinghua University x Shengshu Technology: From Waveform to Latent Space, AudioLBM Leads a New Paradigm for Audio Super-Resolution
量子位 (QbitAI) · 2025-10-12 04:07
Core Insights
- The article discusses advances in Audio Super-Resolution (Audio SR), a technology for restoring clarity and detail in audio, particularly in voice communication and music production. Recent developments include OpenAI's Sora 2 model, which generates audio at a sampling rate of up to 96 kHz, setting a new benchmark for high-fidelity audio generation [1][2].

Group 1: Bridge-SR Model
- The Bridge-SR model, introduced in a paper published at ICASSP 2025, applies the Schrödinger Bridge model to audio super-resolution. It builds a bridge between low-resolution and high-resolution waveforms through a "data-to-data" generation paradigm, and achieves high-quality audio super-resolution with a lightweight network of only 1.7 million parameters [3][4][7].
- Bridge-SR outperforms several mainstream methods on the VCTK speech test set, providing a new approach to prior-driven audio super-resolution [4][8].

Group 2: AudioLBM Model
- Building on Bridge-SR, the AudioLBM model, presented at NeurIPS 2025, moves from waveform-domain generation to latent-space modeling, building the bridge from low-resolution to high-resolution audio in latent space. It uses a variational autoencoder (VAE) to compress waveforms into a continuous latent representation, which improves the model's generalization [10][13].
- AudioLBM introduces a frequency-aware mechanism to improve training efficiency and supports an "any-to-any" super-resolution process. It achieves breakthroughs in generating audio at 96 kHz and 192 kHz, making high-quality master audio more accessible [9][13][17].

Group 3: Performance Metrics
- In comparative evaluations, AudioLBM significantly outperforms baseline models such as AudioSR and FlowHigh in terms of log spectral distance (LSD), and its performance remains stable across the 96 kHz and 192 kHz tasks. The model achieves high-fidelity reconstruction across various audio types, which broadens its applicability [17][19].
- The article includes detailed performance metrics showing AudioLBM's state-of-the-art results on audio super-resolution tasks, indicating that it generates high-quality audio across different domains [15][16].
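
The log spectral distance (LSD) used in the comparisons above can be illustrated with a minimal sketch. It uses one common formulation: the per-frame root-mean-square difference between the log power spectra of a reference and an estimate, averaged over frames, where lower is better. The article does not state the evaluation configuration, so the STFT settings (`n_fft`, `hop`), the `eps` floor, the function names, and the synthetic test tones are all assumptions for illustration, not the authors' setup.

```python
import numpy as np


def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram |STFT|^2 with a Hann window.

    Returns an array of shape (frames, n_fft // 2 + 1).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, n_fft=2048, hop=512, eps=1e-8):
    """LSD between two equal-length waveforms (lower means closer spectra)."""
    ref = np.log10(stft_power(reference, n_fft, hop) + eps)
    est = np.log10(stft_power(estimate, n_fft, hop) + eps)
    # RMS over frequency bins for each frame, then mean over frames.
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


# Illustrative signals (assumed): one second at 48 kHz.
sr = 48_000
t = np.arange(sr) / sr
full_band = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 15_000 * t)
band_limited = np.sin(2 * np.pi * 1_000 * t)  # high-frequency content missing

lsd_identical = log_spectral_distance(full_band, full_band)
lsd_missing_highs = log_spectral_distance(full_band, band_limited)
print("LSD, identical signals:", lsd_identical)
print("LSD, missing high band:", lsd_missing_highs)
```

In this toy setup the band-limited signal stands in for a low-resolution input: the missing 15 kHz component raises the LSD, and a super-resolution model is judged by how far it brings that value back toward zero.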