Silicon Valley Is Buzzing About the Fastest Speech-to-Text Model
QbitAI (量子位) · 2025-11-12 08:01
Core Insights
- The article covers the launch of Scribe v2 Realtime, a speech-to-text model from ElevenLabs that has drawn significant attention in Silicon Valley for its performance figures [3][4][16].
- The model reports a latency of just 150 milliseconds and an accuracy rate of 93.5% across more than 90 languages, a significant advance in real-time speech transcription [4][10][15].

Company Overview
- ElevenLabs, founded in 2022, focuses on AI voice technology and established itself quickly, reaching over $200 million in revenue within 20 months of operation [18][21].
- The founding team includes former Google machine learning engineers and Palantir strategists, giving the company a strong technical background [19][23].
- ElevenLabs is organized into small, agile teams without formal titles, which allows for efficient decision-making and operations [23].

Product Features
- Scribe v2 Realtime handles a variety of audio formats and includes voice activity detection and customizable audio stream processing, making it usable across diverse applications [10][12].
- The model adapts well, transcribing accurately even in noisy environments and with complex terminology, a significant improvement over previous models [9][13].

Industry Context
- Real-time speech-to-text has gone through several technological iterations; earlier models struggled with both accuracy and latency, with error rates often exceeding 30% in noisy conditions [13][14].
- The Transformer architecture eased the long-standing trade-off between speed and accuracy, enabling models such as Scribe v2 Realtime to achieve high accuracy and low latency at the same time [14][15].
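The Product Features notes mention voice activity detection (VAD) and customizable audio stream processing, but the source gives no detail on how Scribe v2 Realtime implements either. Purely as a generic illustration, here is a minimal sketch of a classic energy-threshold VAD over a mono stream split into fixed-length frames. Every name and parameter in it (`energy_vad`, `speech_segments`, the 16 kHz sample rate, 30 ms frames, the 0.02 RMS threshold, the 3-frame "hangover") is a hypothetical choice for illustration, not an ElevenLabs parameter or API.

```python
import math
from typing import List, Sequence, Tuple


def frame_signal(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a mono PCM stream into non-overlapping frames; a trailing partial frame is dropped."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def energy_vad(samples: Sequence[float], sample_rate: int = 16000, frame_ms: int = 30,
               threshold: float = 0.02, hangover: int = 3) -> List[bool]:
    """Hypothetical energy-based VAD: one speech/non-speech flag per frame.

    A frame counts as speech if its RMS energy reaches `threshold`. After speech
    ends, up to `hangover` quiet frames are still flagged as speech so that short
    pauses between words do not split an utterance.
    """
    frame_len = sample_rate * frame_ms // 1000
    flags: List[bool] = []
    quiet_run = hangover  # start in the silent state
    for frame in frame_signal(samples, frame_len):
        if rms(frame) >= threshold:
            quiet_run = 0
            flags.append(True)
        else:
            quiet_run += 1
            flags.append(quiet_run <= hangover)
    return flags


def speech_segments(flags: Sequence[bool], frame_ms: int = 30) -> List[Tuple[int, int]]:
    """Merge consecutive speech frames into (start_ms, end_ms) segments."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        segments.append((start * frame_ms, len(flags) * frame_ms))
    return segments
```

Each 30 ms frame at 16 kHz is 480 samples, so a streaming transcriber built this way would only forward flagged segments to the recognizer. A fixed energy threshold is the simplest possible detector and degrades in exactly the noisy conditions the article highlights, which is why many production systems use statistical or neural speech classifiers instead.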
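The accuracy and error-rate figures quoted in Core Insights and Industry Context are not defined in the source. Speech recognition is most commonly scored by word error rate (WER): the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and the model output, divided by the number of reference words. If the reported 93.5% accuracy is read as 1 − WER, it would correspond to a WER of about 6.5%, but that reading is an assumption; the article does not say which metric or test set was used. The function name `word_error_rate` and the whitespace tokenization are illustrative choices.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance (whitespace tokenization)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Dropping one of six reference words gives a WER of 1/6, roughly 16.7%. Under the same assumption that the 30% figure refers to WER, it would mean about three of every ten reference words were substituted, dropped or inserted. Because insertions are counted, WER can exceed 100% on very noisy output, so it is an error rate rather than a bounded percentage.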