Workflow
OpenAI推出gpt-realtime:语音智能体进入“秒回”时代,开发者直呼交互更自然
3 6 Ke·2025-09-16 10:42

Core Insights - OpenAI has officially launched gpt-realtime, a voice-to-voice model that represents the latest advancements in their research, aimed at reducing latency and enhancing voice quality while providing developers with robust tools for production-ready AI voice agents [1][2] Group 1: Model Enhancements - gpt-realtime has shown significant improvements in understanding capabilities, achieving an accuracy of 82.8% on the Big Bench Audio benchmark, up from 65.6% in the previous model [2] - The model can now recognize non-verbal signals, switch between multiple languages in a single sentence, and accurately process alphanumeric sequences across languages [2] - Instruction-following performance has improved, with MultiChallenge audio benchmark scores rising from 20.6% to 30.5% [2] - Function calling capabilities have been enhanced, with accuracy on ComplexFuncBench increasing from 49.7% to 66.5%, and the introduction of asynchronous function calling [2] Group 2: API Upgrades - The Realtime API has been fully upgraded to meet production-level requirements, allowing developers to connect remote MCP servers directly to conversations [3] - The API now supports image input, enabling applications to engage in dialogue based on visual content [3] - SIP support allows seamless integration of voice agents with existing telephone systems, including PBX and desktop phones [3] - Early enterprise partners like Zillow and T-Mobile are testing these features in near-production environments, emphasizing a shift from scripted automation to more flexible, domain-specific interactions [3] Group 3: Security and Accessibility - OpenAI has strengthened deployment security measures, incorporating classifiers to halt harmful conversations and allowing developers to add specific domain safety constraints through the Agents SDK [3] - The Realtime API's preset voices help reduce impersonation risks [3] - gpt-realtime and the Realtime API are now fully accessible to all developers, with documentation and a demo version available for quick onboarding [4]