Core Viewpoint
- The article covers advances in AI text-to-speech technology, focusing on ElevenLabs' latest TTS model, Eleven v3, which is claimed to be the most expressive text-to-speech model to date [3][5].

Group 1: Model Features
- Eleven v3 supports over 70 languages, including Chinese, and can generate multi-speaker dialogue with vivid emotion and tone [2][5].
- The model introduces audio tags to control emotional expression, making the generated speech more realistic [17][20].
- Users can choose from 22 voice profiles, mostly with American and British accents, suited to a range of narration contexts [11][12].

Group 2: Emotional Control and Dialogue
- Emotional expression is controlled through audio tags, which fall into three kinds: emotional tags, sound effect tags, and special tags [20][18].
- Correct punctuation is emphasized because it significantly affects emotional delivery in the generated speech [22][21].
- Multi-speaker dialogue is produced by assigning a different voice from the voice library to each speaker [24][25].

Group 3: User Feedback and Performance
- Early user feedback indicates that Eleven v3 improves on the emotional expression of its predecessor, v2 [28][27].
- Users have noted impressive emotional recognition capabilities, along with some minor issues such as certain sound effects being too brief [30][29].
- Overall, the product is seen as maturing in emotional control, although its performance in Chinese still lags behind its performance in English [31].
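As a rough illustration of the audio tags and per-speaker voice assignment summarized under Group 2, the following minimal Python sketch assembles a tagged multi-speaker script. It only builds text and does not call any ElevenLabs API. The bracketed `[tag]` syntax, the tag names (`excited`, `whispers`), the voice names (`voice_a`, `voice_b`), and the helpers `DialogueLine`, `tag_text`, and `build_script` are all assumptions made for illustration, not details taken from the article.

```python
from dataclasses import dataclass


@dataclass
class DialogueLine:
    speaker: str      # label for the speaker (hypothetical)
    voice: str        # voice picked from a voice library (hypothetical name)
    text: str         # what the speaker says, punctuation included
    tags: tuple = ()  # audio tags to prepend (hypothetical names)


def tag_text(line: DialogueLine) -> str:
    """Prefix the text with its audio tags, written as [tag] (assumed syntax)."""
    prefix = "".join(f"[{tag}] " for tag in line.tags)
    return prefix + line.text


def build_script(lines):
    """Return one entry per dialogue line, pairing a voice with its tagged text."""
    return [
        {"speaker": l.speaker, "voice": l.voice, "text": tag_text(l)}
        for l in lines
    ]


dialogue = [
    DialogueLine("Host", "voice_a", "Welcome back, everyone!", ("excited",)),
    DialogueLine("Guest", "voice_b", "Can I tell you a secret...?", ("whispers",)),
    DialogueLine("Host", "voice_a", "Go on.", ()),
]

for entry in build_script(dialogue):
    print(f'{entry["speaker"]} ({entry["voice"]}): {entry["text"]}')
```

Keeping the tags separate from the text until the last step makes it easy to try a different emotion on a line without retyping it, and the punctuation in each line is preserved as written since it also shapes delivery.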
AI Text-to-Speech Reaches the "Next Level"! Unicorn ElevenLabs Releases Eleven v3, Nailing Emotional Control
量子位 (QbitAI) · 2025-06-06 13:45