Realtime API
Bandwidth Expands 'Bring Your Own AI' Approach With Support for OpenAI's Realtime API To Power Advanced AI Voice Agents Using GPT Language Models
Prnewswire· 2025-09-24 12:12
Core Insights
- Bandwidth Inc. has announced support for OpenAI's Realtime API, integrating voice calling with session initiation protocol (SIP), allowing enterprises to deploy conversational AI voice agents powered by OpenAI's GPT language models [1][2][3]

Company Strategy
- The integration of the Realtime API is part of Bandwidth's open and flexible strategy, enabling enterprise IT teams to build AI applications into their communications stack through various pathways [3]
- Bandwidth offers pre-built integrations with leading conversational AI platforms such as Google Dialogflow and Cognigy, as well as support for third-party applications and standards-based APIs [3][4]

Competitive Advantage
- Bandwidth's owned-and-operated cloud communications network provides global reach, allowing customers to run conversational AI applications with high fidelity, low latency, and intelligent routing essential for mission-critical voice calls [3][4]
- The company positions itself as a leader in AI orchestration for global enterprises by providing multiple integration options for conversational AI [4]

Market Presence
- Bandwidth operates in over 65 countries and covers more than 90 percent of global GDP, serving major players in unified communications and cloud contact centers, including AWS, Cisco, Google, and Microsoft [6]
OpenAI Launches gpt-realtime: Voice Agents Enter the "Instant Reply" Era, and Developers Say Interactions Feel More Natural
36Kr· 2025-09-16 10:42
Core Insights
- OpenAI has officially launched gpt-realtime, a voice-to-voice model that represents the latest advancements in their research, aimed at reducing latency and enhancing voice quality while providing developers with robust tools for production-ready AI voice agents [1][2]

Group 1: Model Enhancements
- gpt-realtime has shown significant improvements in understanding capabilities, achieving an accuracy of 82.8% on the Big Bench Audio benchmark, up from 65.6% in the previous model [2]
- The model can now recognize non-verbal signals, switch between multiple languages in a single sentence, and accurately process alphanumeric sequences across languages [2]
- Instruction-following performance has improved, with MultiChallenge audio benchmark scores rising from 20.6% to 30.5% [2]
- Function calling capabilities have been enhanced, with accuracy on ComplexFuncBench increasing from 49.7% to 66.5%, and the introduction of asynchronous function calling [2]

Group 2: API Upgrades
- The Realtime API has been fully upgraded to meet production-level requirements, allowing developers to connect remote MCP servers directly to conversations [3]
- The API now supports image input, enabling applications to engage in dialogue based on visual content [3]
- SIP support allows seamless integration of voice agents with existing telephone systems, including PBX and desktop phones [3]
- Early enterprise partners like Zillow and T-Mobile are testing these features in near-production environments, emphasizing a shift from scripted automation to more flexible, domain-specific interactions [3]

Group 3: Security and Accessibility
- OpenAI has strengthened deployment security measures, incorporating classifiers to halt harmful conversations and allowing developers to add specific domain safety constraints through the Agents SDK [3]
- The Realtime API's preset voices help reduce impersonation risks [3]
- gpt-realtime and the Realtime API are now fully accessible to all developers, with documentation and a demo version available for quick onboarding [4]
Agora and OpenAI's Realtime API Power Seamless Interaction with Multimodal AI Agents
Prnewswire· 2025-09-04 20:01
Core Insights
- Agora has enhanced its Conversational AI Engine by integrating OpenAI's Realtime API, enabling more natural communication and interaction [1][2][3]
- The integration supports features like automated greetings, mixed-modality interaction, and selective attention locking, aimed at improving user experience with AI agents [1][7]

Company Developments
- Agora's partnership with OpenAI marks a significant milestone, as the Realtime API is the first multimodal large language model (MLLM) incorporated into Agora's platform [2]
- The integration allows developers to create more responsive and human-like AI agents, reducing development complexity while enhancing real-time interaction capabilities [2][3]

Technological Advancements
- Agora's Conversational AI Engine now includes advanced features that facilitate natural interaction with AI agents, streamlining the adoption of the Realtime API [3]
- The combination of OpenAI's real-time language model and Agora's global real-time network infrastructure (SDRTN®) accelerates time to market and simplifies application development [3]

Industry Applications
- Robotics startup Carbon Origins is utilizing Agora's technology with OpenAI's Realtime API for hands-free operation of heavy equipment, enhancing operator efficiency [4][5]
- The integration of these technologies supports various applications, including customer support, education, gaming, and fan engagement [5]

Key Features
- Automated Greetings: Provides instant session awareness and a welcoming onboarding experience [7]
- Mixed-Modality Interaction: Allows seamless switching between voice and text inputs during interactive sessions [7]
- Selective Attention Locking: Filters out ambient noise for uninterrupted engagement [7]
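AVIC Securities: With Policy Support and Faster Application Rollout, AI Is Expected to Accelerate the Release of New Quality Productive Forces
Zhitong Finance· 2025-09-01 01:59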
Core Insights
- The artificial intelligence industry is experiencing a convergence of policy, technology, and application, leading to accelerated development and industrialization of large model capabilities [1][3]
- The Chinese government has issued a strategic framework for AI development, emphasizing six areas of deep integration and setting clear mid- to long-term goals for 2027, 2030, and 2035 [1][3]

Policy and Strategic Framework
- The State Council has released the "Implementation of the 'Artificial Intelligence+' Action Plan," which outlines strategic directions for the industry [1]
- The National Development and Reform Commission highlights the next one to two years as a critical window for AI application acceleration, advocating for a structured approach to development [1][3]
- A standardized evaluation system for AI has been established, covering over 60 companies and facilitating more than 100 assessments, promoting a systematic approach to industry evaluation [1]

Technological Advancements
- Significant upgrades in large model technologies are being observed, with improved handling of complex tasks and visual reasoning [2]
- New AI models such as Claude Opus 4.1 and GLM-4.5V have demonstrated enhanced performance in various applications, including coding and visual understanding [2]

Investment Recommendations
- Companies to focus on include Kunlun Wanwei and iFlytek, which are key players in large model development and AI agent capabilities [3]
- Other recommended companies for investment in AI application scenarios include Focus Technology, Aofei Entertainment, and several others involved in diverse sectors such as education and entertainment [3]
OpenAI Releases End-to-End Speech Model GPT-Realtime to Help Developers Build Voice Agents
36Kr· 2025-08-30 16:34
Core Insights
- OpenAI has launched its most advanced end-to-end speech model, GPT-Realtime, which aims to provide developers with a more efficient and cost-effective way to build voice agents [1][3][11]
- The pricing for GPT-Realtime has been significantly optimized, reducing costs by 20% compared to the previous model, GPT-4o-Realtime-Preview [1][11]
- The new model demonstrates substantial improvements in performance, including better audio quality, expressiveness, and the ability to follow complex instructions [3][5][7][10]

Pricing and Cost Efficiency
- GPT-Realtime's pricing is set at $32 per million audio input tokens and $64 per million audio output tokens, compared to the previous model's $40 and $80 respectively [1]
- The new pricing structure allows developers to create efficient voice agents at a lower cost while enjoying superior performance [1]

Model Performance Enhancements
- GPT-Realtime shows a significant leap in performance metrics, achieving an accuracy of 82.8% in the Big Bench Audio reasoning test, up from 65.6% for the previous model [5]
- The model's instruction-following accuracy reached 30.5% in the MultiChallenge Audio test, surpassing the previous model's performance [7]
- In the ComplexFuncBench Audio test, GPT-Realtime achieved a function call accuracy of 66.5%, indicating improved capabilities in using external tools [10]

Developer Empowerment and API Upgrades
- The Realtime API has reached production-level standards, allowing for direct audio processing and reducing latency [11]
- New features include support for remote Model Context Protocol (MCP) servers, enabling easier integration with external data sources [12]
- The API now supports image input, allowing for multimodal conversations and expanding use cases for voice agents [12]

Competitive Landscape
- The release of GPT-Realtime occurs amid intense competition in the voice AI market, with companies like Anthropic and Meta making significant advancements [13][14]
- OpenAI's enhancements aim to provide a more user-friendly and cost-effective solution, positioning the company favorably in the competitive landscape [14]
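The pricing figures quoted in this summary can be checked with a few lines of arithmetic. The sketch below is illustrative only: `session_cost`, `price_reduction`, and the sample token counts are hypothetical, the dictionary keys are plain labels rather than confirmed API model identifiers, and the rates are simply the per-million-token audio prices reported above.

```python
# Per-million-token audio prices in USD, as reported in the summary above.
# The keys are labels for this sketch, not confirmed API model identifiers.
PRICES = {
    "gpt-realtime": {"input": 32.0, "output": 64.0},
    "gpt-4o-realtime-preview": {"input": 40.0, "output": 80.0},
}


def session_cost(model, input_tokens, output_tokens):
    """Estimate the audio cost in USD for one session (hypothetical helper)."""
    rates = PRICES[model]
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


def price_reduction(old_price, new_price):
    """Return the fractional reduction when moving from old_price to new_price."""
    return (old_price - new_price) / old_price


if __name__ == "__main__":
    # Hypothetical workload: 500k audio input tokens, 250k audio output tokens.
    new_cost = session_cost("gpt-realtime", 500_000, 250_000)
    old_cost = session_cost("gpt-4o-realtime-preview", 500_000, 250_000)
    print(f"new: ${new_cost:.2f}, old: ${old_cost:.2f}")
    print(f"input price reduction: {price_reduction(40.0, 32.0):.0%}")
    print(f"output price reduction: {price_reduction(80.0, 64.0):.0%}")
```

Both the input and output rates drop by the same fraction, which is why the summary can quote a single 20% figure regardless of the mix of input and output tokens.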
OpenAI Enters the Voice Model Battle with Its Strongest GPT-RealTime: More Capability at a Lower Price
36Kr· 2025-08-29 06:08
Core Insights
- OpenAI has launched its most advanced voice synthesis model, GPT-RealTime, which improves on complex instruction adherence, tool invocation, and natural speech generation [1][8]
- The new model features enhanced capabilities such as capturing non-verbal cues like laughter and seamless language switching, making it more expressive and human-like [1][8]

Pricing and Accessibility
- The GPT-RealTime model is available to developers at a cost of $32 per million tokens for audio input, $0.40 per million tokens for cached input, and $64 per million tokens for audio output, representing a 20% price reduction compared to the previous model [1]

Developer Features
- OpenAI has introduced fine-grained control over conversation context, allowing developers to set smart token limits and truncate multiple turns, significantly reducing costs for long conversations [2]
- The Realtime API now supports remote MCP server integration, image input, and SIP for direct connections to public phone networks, enhancing the model's usability [16][17]

Model Performance
- GPT-RealTime has shown improved accuracy in understanding user instructions, achieving an 82.8% accuracy rate in the Big Bench Audio evaluation, surpassing previous models [8][10]
- The model's function calling performance has also improved, scoring 66.5% in the ComplexFuncBench audio evaluation, indicating better execution of commands [13]

Application Scenarios
- OpenAI's new model has been integrated into various applications, including real estate assistance with Zillow, mobile assistance with T-Mobile, ticket purchasing with StubHub, and appointment scheduling with Oscar Health [4][5][6][7]
- The model's ability to maintain natural conversation flow and handle complex requests positions it well for diverse AI agent applications across industries [18]
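The context-control idea mentioned under Developer Features (a token limit combined with truncating older turns) can be illustrated generically. The sketch below is not the Realtime API's mechanism or configuration: `truncate_turns`, the word-count token estimate, and the sample conversation are all assumptions made purely for illustration.

```python
def estimate_tokens(text):
    """Crude token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def truncate_turns(turns, max_tokens):
    """Keep the most recent turns whose combined estimated size fits max_tokens.

    Walks backwards from the newest turn and stops at the first turn that
    would exceed the budget, so all older turns are dropped together.
    """
    kept = []
    used = 0
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > max_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Hypothetical conversation used only to exercise the helper.
conversation = [
    "user: I need to reschedule my appointment",
    "agent: Sure, which day works for you",
    "user: Thursday afternoon please",
    "agent: Booked for Thursday at 2 pm",
]

if __name__ == "__main__":
    for turn in truncate_turns(conversation, max_tokens=12):
        print(turn)
```

Dropping several old turns at once, rather than one turn per request, keeps the retained prefix stable between requests, which is one plausible reason truncation of multiple turns would help with cost in long conversations.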