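Jieyue Xingchen (StepFun) Releases Its Most Powerful Open-Source End-to-End Voice Model, Opening a New Paradigm for Voice-Based Human-Machine Interaction on Devices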
IPO Zaozhidao (IPO早知道) · 2025-09-01 04:06

Core Viewpoint - The article discusses the launch of Step-Audio 2 mini, an open-source end-to-end voice model from Jieyue Xingchen (StepFun) that achieves state-of-the-art results on several international benchmarks and aims to make human-machine interaction more efficient and intelligent [2][4].

Group 1: Model Performance
- Step-Audio 2 mini performs strongly in audio understanding, speech recognition, cross-lingual translation, and emotion and paralinguistic analysis. It outperforms all other open-source end-to-end voice models, including Qwen-Omni and Kimi-Audio, and surpasses GPT-4o-audio on most tasks [3][4].
- Its benchmark scores include 73.2 on MMAU, 61.3 on URO Bench, and 80.0 on StepEval-Audio-Paralinguistic, reflecting broad capability across task types [3].

Group 2: Technological Innovations
- Step-Audio 2 mini introduces audio reasoning, allowing it to understand and respond to emotion, tone, and non-verbal signals and so capture the nuances of human communication [4].
- The model supports native tool calling, which lets it run web searches and mitigate hallucination, giving it knowledge access and reasoning abilities comparable to those of text models [4].

Group 3: Industry Impact and Collaborations
- The model has been deployed in the Geely Galaxy M9, marking the industry's first mass-production deployment of an end-to-end voice model and demonstrating its practical application [4].
- Jieyue Xingchen has formed deep partnerships with leading device makers such as Geely, Whale Robotics, TCL, and Cyan, aiming to give consumers smarter and more convenient interactive experiences [4].