Omnilingual ASR System
A ChatGPT Moment for Translation: Meta Releases a New Model That Learns Obscure New Languages from a Few Examples
36Ke · 2025-11-11 12:12
Core Insights
- Meta has launched Omnilingual ASR, a speech recognition system that can transcribe more than 1,600 languages and is intended to narrow the digital divide in language technology [1][2][5]
- The system supports 500 languages that no AI system had transcribed before, a much broader coverage than existing models such as OpenAI's Whisper [5][7]
- Omnilingual ASR introduces a flexible learning mechanism that lets the model pick up a new language from a handful of examples, potentially extending its coverage to more than 5,400 languages [10][11]

Language Coverage and Performance
- The system achieves a character error rate (CER) below 10% for 78% of the tested languages; among languages with more than 10 hours of training data, that share rises to 95% [7][8]
- Languages are grouped into high-, medium-, and low-resource tiers; 95% of high- and medium-resource languages reach a CER below 10% [8]
- Low-resource languages also show promise: 36% of them reach a CER below 10% [8]

Open Source and Community Engagement
- Omnilingual ASR is fully open-sourced on GitHub, allowing researchers, developers, and organizations to use and modify the model without licensing restrictions [11][13]
- Meta has also released a large multilingual speech dataset, including transcriptions for 350 underrepresented languages, to support community-driven speech recognition work [13][14]
- Development involved collaboration with local language organizations to gather diverse voice samples, with attention to cultural sensitivity and community involvement [15][16]

Technical Specifications
- The model family spans a range of sizes, from lightweight models suitable for mobile devices to larger models with up to 7 billion parameters for higher accuracy [16][18]
- Training used more than 4.3 million hours of audio across 1,239 languages, making it one of the largest and most diverse speech training datasets to date [18]
- The system is designed for continuous growth and adaptation, so that it can evolve alongside the diversity of human languages [18]
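
The performance figures above are stated in terms of character error rate (CER) and the share of languages that fall under a 10% threshold. As a rough illustration of what those two numbers mean, the sketch below computes CER as the character-level edit distance between a reference transcript and a model output, divided by the reference length, and then counts how many languages fall below a threshold. The function names (`edit_distance`, `cer`, `share_below`) and the per-language values are hypothetical and chosen for illustration only; the summary does not describe the text normalization or scoring code Meta actually used, so this should not be read as the Omnilingual ASR evaluation procedure.

```python
from __future__ import annotations


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def share_below(cers: dict[str, float], threshold: float = 0.10) -> float:
    """Fraction of languages whose CER is strictly below the threshold."""
    return sum(1 for value in cers.values() if value < threshold) / len(cers)


if __name__ == "__main__":
    # Three character edits against a six-character reference.
    print(cer("kitten", "sitting"))  # 0.5

    # Hypothetical per-language CER values, not taken from the article.
    example = {"lang_a": 0.05, "lang_b": 0.20, "lang_c": 0.09, "lang_d": 0.10}
    print(share_below(example))  # 0.5
```

Under this definition, a claim such as "78% of languages achieve a CER below 10%" corresponds to `share_below` returning 0.78 over the per-language scores. Note that CER can exceed 1.0 when the model output is much longer than the reference.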