Unisound: ShanHai.ZhiYin 2.0 Officially Launched, Reshaping the Human-Machine Interaction Paradigm

Core Insights
- The company is accelerating its "One Foundation, Two Wings" technology strategy amid the rise of intelligent agents, recently launching the "ShanHai.ZhiYin" model 2.0 after upgrading the "ShanHai.ZhiYi" 5.0 medical model [1]

Group 1: Model Capabilities
- The "ShanHai.ZhiYin" model 2.0 focuses on three major capability evolutions: understanding professional and local dialects, expressing warmth and emotional connection, and achieving extreme responsiveness [1]
- In terms of "understanding," the model's ASR (Automatic Speech Recognition) capabilities have demonstrated leading performance in both public test sets and proprietary full-scenario test sets, surpassing mainstream domestic open-source and closed-source speech models, reaching the highest industry standards [1]
- For the "expression" aspect, the ShanHai.ZhiYin-TTS (Text-to-Speech) features a "highly human-like and creatively diverse" core, currently supporting 12 dialects (including Cantonese, Sichuanese, and Shanghainese) and 10 foreign languages, with the ability to switch between 12 styles of Mandarin [1]
- The model also overcomes challenges in smooth full-duplex interaction, enabling real-time interruptions, immediate responses, and coherent follow-up questions, making human-machine dialogue as fluid as conversations between close friends [1]

Group 2: Technological Foundation
- The capabilities of the ShanHai.ZhiYin 2.0 model are underpinned by the company's proprietary "ShanHai.Atlas" intelligent computing foundation, which deeply integrates the general multimodal model base with the Atlas architecture, serving as the foundation for professional intelligent agents and the core of perceptual AI [1]