Gemini 2.5 Flash Native Audio
X @Elon Musk
Elon Musk· 2025-12-18 05:31
Grok
Artificial Analysis (@ArtificialAnlys): xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark. The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5 https://t.co/OH6oXxwhCu ...
X @Tesla Owners Silicon Valley
BREAKING: xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark https://t.co/AxRFV6yAJR ...
X @Tesla Owners Silicon Valley
BREAKING: xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark https://t.co/Mfn25WnvgI ...
X @xAI
xAI· 2025-12-17 20:40
RT Artificial Analysis (@ArtificialAnlys): xAI’s new Grok Voice Agent is the new leading Speech to Speech model, surpassing Gemini 2.5 Flash Native Audio and GPT Realtime in our Big Bench Audio benchmark. The new model achieves a score of 92.3% on Big Bench Audio, just ahead of the previous leader, Google’s Gemini 2.5 Flash Native Audio Thinking. This model is @xAI’s first public Speech to Speech API, bringing increased competition to the space. The model has tool calling support and xAI has said it’s ready to ...
Crushing ChatGPT, Google's Ruthless Move: It Can Faithfully Reproduce Even Your Sarcasm
36Kr· 2025-12-15 02:04
Core Insights
- Google has launched the Gemini 2.5 Flash Native Audio model, which enables real-time voice translation while preserving the speaker's tone and delivers a more natural conversational experience, marking a significant advance in AI interaction [1][3][10].

Group 1: Technological Advancements
- The new model processes audio directly, without first converting speech to text, which improves both the speed and the emotional nuance of interactions [6][8].
- Gemini 2.5 Flash Native Audio supports real-time speech translation with continuous listening and automatic language switching during a conversation, effectively acting as an invisible translator [11][19].
- The model captures emotional nuance in speech, translating not just the words but also the speaker's tone and attitude, which is crucial in contexts such as business negotiations [12][14][15].

Group 2: Developer and Business Implications
- The update improves the accuracy of function calls and instruction adherence, raising the compliance rate from 84% to 90%, which is vital for enterprise-level applications [18][23].
- Gemini 2.5 improves multi-turn dialogue, allowing more coherent and logical conversations and making AI interactions feel more human-like [24].
- In 2026, these capabilities will be expanded to more products through the Gemini API, lowering the barrier for businesses to build advanced AI customer service solutions [28][29].

Group 3: Future Outlook
- The advances signal a shift towards voice as a primary interface for technology, moving AI beyond screens and into everyday life [25][27].
- The ability to communicate easily across language barriers suggests a transformative impact on global communication [28].