Voice models

Alibaba launches Tongyi Bailing voice models; model downloads exceed 560 million
Zheng Quan Ri Bao · 2025-09-24 07:38
Core Insights
- Alibaba Group has launched a new voice model family called Tongyi Bailing, which includes the Fun-ASR speech recognition model and the Fun-CosyVoice speech synthesis model [2]
Group 1: Product Features
- Fun-ASR is trained on tens of millions of hours of real speech data, showing strong contextual understanding and adaptability to industry-specific domains, and can process more than 10 languages in real time [2]
- Fun-CosyVoice offers hundreds of preset voices, suited to applications including customer service, sales, live-stream e-commerce, consumer electronics, audiobooks, and children's entertainment [2]
- Downloads of the open-source Tongyi Bailing models have exceeded 560 million [2]
Alibaba launches Tongyi Bailing voice models, with model downloads exceeding 560 million
Xin Lang Ke Ji · 2025-09-24 05:07
Core Insights
- Alibaba launched a new voice model family called Tongyi Bailing at the 2025 Hangzhou Yunqi Conference, which includes the Fun-ASR speech recognition model and the Fun-CosyVoice speech synthesis model [1]
Group 1: Product Features
- Fun-ASR is trained on tens of millions of hours of real speech data, showing strong contextual understanding and adaptability to industry-specific domains, and can process more than 10 languages in real time [1]
- Fun-CosyVoice offers hundreds of preset voices, suited to applications including customer service, sales, live-stream e-commerce, consumer electronics, audiobooks, and children's entertainment [1]
Group 2: Market Impact
- Downloads of the open-source Tongyi Bailing models have exceeded 560 million [1]
OpenAI joins the voice model battle with its strongest model yet, GPT-RealTime: more capable and cheaper
36Kr · 2025-08-29 06:08
Core Insights
- OpenAI has launched GPT-RealTime, its most advanced speech-to-speech model, which improves on following complex instructions, invoking tools, and generating natural-sounding speech [1][8]
- The model can capture non-verbal cues such as laughter and switch languages seamlessly, making it more expressive and human-like [1][8]
Pricing and Accessibility
- GPT-RealTime is available to developers at $32 per million audio input tokens, $0.40 per million cached input tokens, and $64 per million audio output tokens, a 20% price reduction compared with the previous model [1]
Developer Features
- OpenAI has introduced fine-grained control over conversation context: developers can set intelligent token limits and truncate multiple turns at once, significantly reducing costs for long conversations [2]
- The Realtime API now supports remote MCP (Model Context Protocol) servers, image input, and SIP (Session Initiation Protocol) for connecting directly to the public phone network, broadening where the model can be deployed [16][17]
Model Performance
- GPT-RealTime scored 82.8% accuracy on the Big Bench Audio reasoning evaluation, surpassing previous models [8][10]
- Its function-calling performance also improved, with a score of 66.5% on the audio version of ComplexFuncBench, indicating more reliable execution of commands [13]
Application Scenarios
- The new model has been integrated into applications including real-estate assistance with Zillow, mobile assistance with T-Mobile, ticket purchasing with StubHub, and appointment scheduling with Oscar Health [4][5][6][7]
- Its ability to keep conversations flowing naturally and handle complex requests positions it well for AI agent applications across industries [18]
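To make the GPT-RealTime pricing concrete, here is a minimal Python sketch that estimates the cost of one request from its token counts. It assumes billing is linear in tokens at the per-million rates quoted in the report, and that the three token categories are billed separately without overlap. `request_cost` and `implied_previous_rate` are hypothetical helpers, not part of any OpenAI SDK; the second one simply inverts the reported 20% cut, so the "previous" rates it produces are inferred from that figure rather than taken from OpenAI's price list.

```python
# Rates reported for GPT-RealTime, in USD per 1 million tokens.
AUDIO_INPUT_PER_M = 32.0
CACHED_INPUT_PER_M = 0.4
AUDIO_OUTPUT_PER_M = 64.0


def request_cost(audio_in: int, cached_in: int, audio_out: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes audio_in counts only uncached audio input tokens and cached_in
    counts input tokens billed at the cached rate, with no overlap.
    """
    return (
        audio_in * AUDIO_INPUT_PER_M
        + cached_in * CACHED_INPUT_PER_M
        + audio_out * AUDIO_OUTPUT_PER_M
    ) / 1_000_000


def implied_previous_rate(new_rate: float, reduction: float = 0.20) -> float:
    """Back out the pre-cut rate implied by a fractional price reduction."""
    return new_rate / (1 - reduction)


if __name__ == "__main__":
    # Hypothetical request: 10,000 fresh audio input tokens,
    # 50,000 cached input tokens, 5,000 audio output tokens.
    print(f"${request_cost(10_000, 50_000, 5_000):.4f}")
    print(implied_previous_rate(AUDIO_INPUT_PER_M), implied_previous_rate(AUDIO_OUTPUT_PER_M))
```

In this hypothetical request the 50,000 cached tokens cost only $0.02, while 10,000 fresh audio input tokens cost $0.32, which shows how strongly the cached-input rate affects the bill for long sessions.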
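The context controls reported under Developer Features can be illustrated with a client-side sketch. This is not OpenAI's implementation or API: the `Turn` type, the `truncate_history` function, and the `target_ratio` parameter are hypothetical, and the token count of each turn is assumed to be known. The idea shown is a token cap that, once exceeded, drops several of the oldest turns in a single pass instead of trimming one turn per request.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Turn:
    role: str    # "user" or "assistant"
    tokens: int  # token count of this turn (assumed known in advance)


def truncate_history(turns: list[Turn], max_tokens: int, target_ratio: float = 0.5) -> list[Turn]:
    """Hypothetical truncation policy: drop the oldest turns in one batch.

    If the history fits within max_tokens it is returned unchanged.
    Otherwise the oldest turns are removed until the remainder fits
    within target_ratio * max_tokens, leaving headroom for later turns.
    """
    total = sum(t.tokens for t in turns)
    if total <= max_tokens:
        return list(turns)
    target = int(max_tokens * target_ratio)
    start = 0
    # Advance past the oldest turns until the remaining total fits the target.
    while start < len(turns) and total > target:
        total -= turns[start].tokens
        start += 1
    return turns[start:]


if __name__ == "__main__":
    history = [Turn("user" if i % 2 == 0 else "assistant", 1000) for i in range(10)]
    kept = truncate_history(history, max_tokens=8000)
    print(len(kept), sum(t.tokens for t in kept))
```

With ten 1,000-token turns, an 8,000-token cap, and the default 0.5 ratio, the function drops the six oldest turns and keeps the last four (4,000 tokens). Cutting well below the cap means the history does not hit the limit again on the very next turn.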