Voice Interaction
Designer Mengye Zhu Wins Multiple International Awards for "Human-Centered" AI Interaction Design
Nan Fang Du Shi Bao· 2026-01-07 05:35
2025 has been hailed as a year in which design innovation and human-centered technology deeply converged. Over the year, interaction designer Mengye Zhu stood out with her "human-centered" approach to AI interaction design, winning the German iF Design Award, the European Product Design Award, and other international honors. Her signature work, "Quackiverse," applies generative AI and voice interaction to children's language learning, delivering a personalized learning experience with emotional warmth, while she also explores new possibilities for AI interaction in fields such as health tech and creative education.

Zhu holds a master's degree in design from Cornell University and is committed to using design to make society more inclusive and equitable. Her practice spans UI/UX, interactive art, and product innovation, but always revolves around one core principle: putting cutting-edge technology in the service of real human needs. As she emphasizes, great design fuses creativity, technology, and empathy.

"Quackiverse," the flagship project she led, embodies this philosophy in full. Built around generative AI and speech recognition, the platform creates an immersive language-learning world for children aged 6 to 15, turning study from a chore into a journey of exploration and interaction. Targeting the pain points of traditional language education (lack of fun, difficulty sustaining motivation, insufficient parental companionship), "Quackiverse" builds an AI-driven, dynamic learning system. Through intelligent voice feedback, story-based tasks, and gamified challenges, children can …
Doushen Education: The Company's Study-Companion Robot Deeply Integrates Volcano Engine RTC and the Doubao Large Model, but with No Material Impact on Operating Fundamentals
Mei Ri Jing Ji Xin Wen· 2026-01-05 08:14
NBD AI Bulletin: an investor asked on an investor-interaction platform which aspects of the company's products mainly use Volcano Engine technology.

Doushen Education (300010.SZ) replied on January 5: the company's study-companion robot deeply integrates Volcano Engine RTC technology with the Doubao large model, aiming to deliver real-time dialogue and precise guidance and to build natural, fluent, intelligent voice-interaction scenarios. Integrating these technologies and models helps improve the robot's voice-interaction performance and user experience, but has no material impact on the company's operating fundamentals. Thank you for your interest.

(Source: Mei Ri Jing Ji Xin Wen) …
Report: OpenAI Consolidates Teams to Release a New Voice Model in Q1, Paving the Way for a Screenless AI Personal Device
Hua Er Jie Jian Wen· 2026-01-01 22:27
OpenAI is refining its audio AI models in preparation for a planned voice-driven personal device.

According to the report, the new voice model will offer more natural emotional expression and real-time conversation, including the ability to handle interruptions, a key capability current models lack. It is planned for release in the first quarter of 2026.

Citing people familiar with the matter, the report says OpenAI also plans a line of screenless devices, including smart glasses and smart speakers, positioning them as "collaborative companions" for users rather than mere gateways to apps. Before launching consumer AI hardware controlled by voice commands, however, OpenAI first needs to change users' habits.

On January 1, The Information reported that over the past two months OpenAI has consolidated its engineering, product, and research efforts to attack the technical bottlenecks of audio interaction, with the goal of building a consumer device operated through natural voice commands.

Team consolidation focuses on screenless interaction

Researchers inside the company believe ChatGPT's current voice model trails its text model in both accuracy and response speed, and the two do not share the same underlying architecture. Because the voice and text models are built on different architectures, users who talk to ChatGPT by voice get answers that are both lower in quality and slower than the text model's. To solve this, OpenAI completed a key team consolidation over the past two months. At the organizational level, the voice … who joined from Character.AI this summer …
OpenAI Consolidates Teams to Develop Audio AI Models, Paving the Way for an AI Personal Device
Xin Lang Cai Jing· 2026-01-01 15:32
According to people familiar with the matter, OpenAI is taking steps to refine its audio AI models in preparation for a future AI-driven personal device. Three sources said the device is expected to rely primarily on audio interaction.

Today, when users talk to ChatGPT, the chatbot can answer by voice, but its voice version and text version run on different underlying models. A former employee and a current one said OpenAI's internal researchers believe the current voice model lags the text model in both answer accuracy and response speed.

To address this, OpenAI has over the past two months merged its engineering, product, and research teams to jointly improve the audio models. Raising the voice model's accuracy is critical for OpenAI because it plans a consumer device controlled by voice commands; earlier reports said the first device is at least a year away.

People familiar with the matter said the new audio model architecture can generate more natural, more emotional, and more precise responses, supports real-time conversation with users (which existing models cannot), and handles interruptions better. The model is targeted for release in the first quarter of 2026; an OpenAI spokesperson declined to comment.

Like Google, Amazon, Meta, and Apple, Ope…
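Both reports single out barge-in (handling a user who starts talking while the assistant is still speaking) as the key capability current models lack. As a rough illustration of the concept only (not OpenAI's method), here is a minimal energy-based barge-in detector; every name, threshold, and frame size below is a hypothetical choice:

```python
# Hypothetical barge-in detector: while the assistant's TTS output is playing,
# monitor microphone frame energy; if the user speaks for several consecutive
# frames, report where the interruption began so playback can be cut off.
# Thresholds and frame sizes are illustrative assumptions, not any vendor's values.

def frame_energy(samples):
    """Mean squared amplitude of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def detect_barge_in(frames, threshold=0.01, min_consecutive=3):
    """Return the index of the first frame of an interruption, i.e. where
    energy stays above `threshold` for `min_consecutive` frames in a row,
    or None if the user never interrupts."""
    run = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) > threshold:
            run += 1
            if run >= min_consecutive:
                return i - min_consecutive + 1
        else:
            run = 0
    return None

# Example: four silent frames, then the user starts speaking at frame 4.
silence = [0.0] * 160
speech = [0.5] * 160
frames = [silence] * 4 + [speech] * 5
print(detect_barge_in(frames))  # 4
```

A production system would use a trained voice-activity or echo-aware detector rather than raw energy, but the control flow (debounce, then interrupt playback) is the same idea.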
Tongyi Releases Fun-Audio-Chat, an End-to-End Voice Interaction Model
Feng Huang Wang· 2025-12-23 11:50
Phoenix Tech News, December 23. The Tongyi large-model team has released Fun-Audio-Chat, a new-generation end-to-end voice interaction model. It is the first model in the Tongyi Bailing voice model series built around speech-to-speech interaction, letting users hold multi-turn conversations with the model directly by voice.

On training and architecture, Alibaba Cloud disclosed two key techniques. The first is a "Core-Cocktail" two-stage training strategy: speech and multimodal capabilities are introduced in stages and then merged and fine-tuned with the parameters of the original text LLM, limiting the impact of the new capabilities on existing language understanding and easing "catastrophic forgetting." The second is multi-stage, multi-task preference alignment, which helps the model capture semantic and emotional cues more accurately in real voice conversations, making dialogue more natural.

Compute efficiency is another highlight. Fun-Audio-Chat-8B adopts a dual-resolution, end-to-end compress-autoregress-decompress architecture that lowers the audio frame rate to about 5 Hz. While preserving speech quality, the design cuts GPU compute by nearly 50%, which is of real engineering value at a time when voice LLMs are generally compute-hungry.

Overall, the open-sourcing of Fun-Audio-Chat-8B marks a further step for the Tongyi models toward practical "low-compute, strong-dialogue" voice interaction and offers a new technical reference for deploying open-source voice LLMs in real scenarios.

On technical metrics, the model …
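The compute argument can be made concrete: in an end-to-end speech model, the audio frame rate fixes how many tokens each second of speech becomes, so lowering it shrinks both the number of autoregressive decode steps and the attention cost. A sketch under the assumption that one frame corresponds to one token (the comparison rates are common reference points, not figures from the article):

```python
# How audio frame rate translates into sequence length for an end-to-end
# speech model. The 5 Hz figure comes from the article; the other rates are
# generic reference points used only for comparison.

def audio_tokens(duration_s, frame_rate_hz):
    """Number of audio frames (assumed 1 token each) for an utterance."""
    return int(duration_s * frame_rate_hz)

for rate in (50, 25, 12.5, 5):
    print(f"{rate} Hz -> {audio_tokens(60, rate)} tokens for a 60 s utterance")

# At 5 Hz a one-minute utterance is only 300 tokens, so decode steps drop
# linearly with the rate and attention cost drops even faster (quadratically
# in sequence length). The exact ~50% GPU saving also depends on the
# compress/decompress stages, which this sketch does not model.
```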
Crushing ChatGPT: Google's Ruthless Move Can Even Faithfully Reproduce Your Sarcastic Tone
36Kr · 2025-12-15 02:04
Core Insights
- Google has launched the Gemini 2.5 Flash Native Audio model, which enables real-time voice translation while preserving tone and delivering a more natural conversational experience, marking a significant advancement in AI interaction [1][3][10].

Group 1: Technological Advancements
- The new model allows for direct audio processing without converting speech to text, enhancing the speed and emotional nuance of interactions [6][8].
- Gemini 2.5 Flash supports real-time speech translation, allowing for continuous listening and automatic language switching during conversations, effectively acting as an invisible translator [11][19].
- The model captures emotional nuances in speech, translating not just words but also the speaker's tone and attitude, which is crucial in contexts like business negotiations [12][14][15].

Group 2: Developer and Business Implications
- The update improves the accuracy of function calls and command adherence, increasing the compliance rate from 84% to 90%, which is vital for enterprise-level applications [18][23].
- Gemini 2.5 enhances multi-turn dialogue capabilities, allowing for more coherent and logical conversations, making AI interactions feel more human-like [24].
- The introduction of the Gemini API in 2026 will expand these capabilities to more products, lowering the barrier for businesses to create advanced AI customer service solutions [28][29].

Group 3: Future Outlook
- The advancements signal a shift towards voice interaction as a primary interface for technology, moving AI beyond screens and into everyday life [25][27].
- The potential for users to communicate across language barriers with ease suggests a transformative impact on global communication [28].
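The jump from 84% to 90% compliance is easier to appreciate in error terms: non-compliance falls from 16% to 10%, a 37.5% relative reduction. A quick check of that arithmetic:

```python
# Reading the reported instruction-compliance jump (84% -> 90%) in error terms.
before, after = 0.84, 0.90
err_before, err_after = 1 - before, 1 - after   # 16% vs 10% non-compliance
relative_reduction = (err_before - err_after) / err_before
print(f"error rate: {err_before:.0%} -> {err_after:.0%}, "
      f"a {relative_reduction:.1%} relative reduction")
```

Framing accuracy gains as error-rate reductions is standard practice, since it reflects how much less often an enterprise integration will see a failed function call.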
喝点VC | a16z Interviews the CEO of ElevenLabs, the $10B AI Voice Unicorn: "The First Priority Is to Go Deep Inside an Industry and Spend Time Understanding Its Core Needs and Incentive Structures"
Z Potentials· 2025-12-13 11:09
Z Highlights

Mati Staniszewski is chief executive and co-founder of ElevenLabs. This conversation between a16z partner Jennifer Li and Mati took place on November 4, 2025, and explores how the team ships research-grade AI at lightning speed, from text-to-speech and fully licensed AI music to real-time voice agents, and why voice is the next human-computer interface.

Speed versus depth: how we leverage "big research" with a "small team"

Jennifer Li: I'm honored to welcome our first speaker, ElevenLabs co-founder and CEO Mati. Mati, great to have you here.

Mati Staniszewski: Thank you so much for having me. Great to see everyone, and good morning.

Jennifer Li: That welcome music just now was generated by ElevenLabs, right?

Mati Staniszewski: It was indeed. We keep expanding in audio. We started with voice technology, then built an orchestration layer for voice agents, and now we have developed fully licensed music models that can create wonderful pieces to go along with it all.

Jennifer Li: …
AAC Technologies Powers the Quark AI Glasses S1, Pioneering a New Paradigm in Voice Interaction
Zhong Guo Jing Ji Wang· 2025-12-03 04:53
Core Insights
- The Quark AI Glasses S1, developed by Alibaba, features a dual-display AI assistant experience, addressing the challenges of natural, private, and reliable interaction in various environments [2][5].
- The collaboration between Quark and AAC Technologies has led to the creation of an innovative high-precision audio pickup system, combining a 5-microphone array with a bone conduction microphone, significantly reducing false wake-up rates [2][4].

Group 1: Product Features
- The Quark AI Glasses S1 utilizes a 5-microphone array and a bone conduction microphone, allowing users to issue commands even in low-volume whispers, enhancing user interaction [2][4].
- The audio pickup system has a signal-to-noise ratio (SNR) of 77 dB, optimized for capturing voice frequencies while minimizing environmental noise interference [4].
- The design of the microphone system has reduced the size by 25% compared to industry standards, allowing for a more compact and lightweight device [4].

Group 2: User Experience
- The advanced audio recognition technology enables accurate command reception in noisy environments, such as subways, and allows for discreet communication in quiet places like libraries [4][5].
- The Quark AI Glasses S1 provides a seamless user experience across various scenarios, including outdoor activities and real-time translation in multilingual settings [4][5].
- The integration of these technologies transforms smart glasses from niche products into essential daily companions that respect user privacy and enhance interaction [5].
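For context on the microphone-array claim: the classical upper bound on SNR gain from averaging N microphones (coherent speech signal, uncorrelated noise) is 10·log10(N), about 7 dB for five mics. Real beamformers, and the 77 dB figure, which is a sensor specification, involve far more than this textbook bound; the sketch below computes only the idealized number:

```python
import math

# Idealized SNR gain of an N-microphone array: a coherent signal adds in
# amplitude (power grows as N^2) while uncorrelated noise adds in power
# (grows as N), so the SNR improves by a factor of N, i.e. 10*log10(N) dB.
# This is the classical white-noise gain bound, not a claim about the S1.

def array_gain_db(n_mics):
    return 10 * math.log10(n_mics)

print(f"5-mic array, ideal white-noise gain: {array_gain_db(5):.1f} dB")
```

The bone-conduction channel helps in a different way: it picks up the wearer's own speech through the skull, which is why whispered commands can survive environments where airborne SNR collapses.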
Able to Read Lips: Apple's New Patent Could Rescue Head-Mounted Devices
36Kr · 2025-12-01 02:18
Core Viewpoint
- Apple is developing a new patent for its Vision Pro headset that will enable lip-reading capabilities, allowing users to issue commands without speaking, which could significantly enhance user interaction with wearable devices [1][3].

Group 1: Patent and Technology Development
- The patent titled "Electronic Device with Voice Input Structure" describes a scenario where the device can read lip movements through built-in visual sensors when the wearer is unable to speak [3][12].
- If successfully implemented, this technology could have a substantial positive impact on all current head-mounted devices [3].

Group 2: Market Context and Challenges
- The AI glasses market is experiencing a surge, with companies like Xiaomi and Alibaba entering the space, but these products face significant challenges, including a high return rate of 40%-50% on platforms like Douyin [3][8].
- Users have expressed concerns about the practicality of voice interaction in public settings, where speaking aloud can lead to discomfort and privacy issues [6][8].

Group 3: Interaction Methods and User Experience
- Traditional interaction methods for smart glasses, such as touch controls on the frame, are ergonomically challenging, leading to user fatigue [10].
- Voice interaction, while advanced in semantic recognition, still requires users to speak out loud, which can be socially uncomfortable in public environments [6][10].

Group 4: Future Implications
- The ability to use lip-reading for command input could alleviate the social pressures associated with voice interaction, potentially transforming AI glasses and XR headsets from niche products to mainstream consumer electronics [10][14].
- By training AI models to recognize lip movements across different languages, the barriers to widespread adoption of these devices in public settings could be significantly reduced [12][14].
Mobvoi (02438) Falls 16.67% to HK$0.70 per Share
Jin Rong Jie· 2025-08-22 07:26
Group 1
- The core point of the article is the sharp drop in the stock price of Mobvoi (02438), down 16.67% on August 22 to HK$0.70 per share on turnover of HK$27.5584 million [1].
- Mobvoi Limited focuses on generative AI and voice interaction, serving content creators, enterprises, and consumers; its main business lines include AIGC products, AI services for government and enterprise, and AIoT smart hardware [1].
- The company was listed on the Hong Kong Stock Exchange in April 2024 and aims to become a global AI CoPilot leader built on advanced large-model technology and a broad product matrix [1].

Group 2
- In its 2025 interim report, Mobvoi reported total revenue of RMB 179 million and a net profit of RMB -2.898 million [2].
- The interim report for fiscal 2025 records a net loss attributable to shareholders of RMB 2.898 million, representing a year-on-year increase of 99.5%, with basic earnings per share of RMB 0 [3].