News Flash | ElevenLabs Releases Standalone Speech Detection Model for Fine-Grained Speech Understanding and Transcription
Z Potentials·2025-02-27 04:09

Core Viewpoint
- ElevenLabs, best known for its audio generation capabilities, has raised $180 million in funding and is now entering the speech detection (speech-to-text) market, where it competes with Gladia, Speechmatics, AssemblyAI, Deepgram, and OpenAI's Whisper model [1][2].

Group 1: Company Overview
- ElevenLabs is valued at $3.3 billion and maintains a large voice library that supports enterprises providing speech-to-text services [1].
- The company has launched Scribe, its first standalone speech-to-text model. Scribe supports over 99 languages, more than 25 of which are rated as having "excellent accuracy" (word error rate below 5%) [1].

Group 2: Model Performance
- In benchmark tests, Scribe outperformed Google Gemini 2.0 Flash and Whisper Large V3 across multiple languages [2].
- The model offers intelligent speaker separation (diarization), word-level timestamps, and automatic tagging of sound events such as audience laughter [3].

Group 3: Pricing and Availability
- Scribe is priced at $0.40 per hour of transcribed audio. This is competitive, although some competitors charge less for offerings with different feature sets [3].
- The model currently handles only pre-recorded audio; a low-latency real-time version is planned for release soon [3].
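
The "excellent accuracy" tier above is defined by word error rate (WER), and the pricing is quoted per hour of audio. As a rough illustration of both figures, here is a minimal Python sketch. It is not ElevenLabs' evaluation code: the function names, the whitespace tokenization, and the absence of text normalization are all assumptions made for the example. WER is computed in the standard way, as the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are split on whitespace with no normalization
    (casing, punctuation), which real benchmarks usually apply first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def is_excellent_accuracy(wer: float) -> bool:
    """Hypothetical helper: the article's threshold of WER below 5%."""
    return wer < 0.05


def transcription_cost_usd(audio_hours: float, rate_per_hour: float = 0.40) -> float:
    """Hypothetical helper: cost at the quoted $0.40 per hour of audio."""
    return audio_hours * rate_per_hour


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")  # one dropped word out of six: 16.7%
    print(is_excellent_accuracy(wer))  # False
    print(f"${transcription_cost_usd(10):.2f}")  # 10 hours of audio: $4.00
```

Under these assumptions, a transcript qualifies for the "excellent" tier only if it contains fewer than one word error per 20 reference words on average.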