Core Viewpoint
- The article highlights the launch of Doubao-Seed-ASR-2.0 by Huoshan Engine, which significantly enhances voice recognition capabilities through advanced contextual understanding and multi-modal visual recognition [1]

Group 1: Model Enhancements
- The 2.0 version of the model features improved inference capabilities, allowing for precise recognition through deep contextual understanding [1]
- Overall keyword recall rate has increased by 20%, indicating a substantial improvement in recognition accuracy [1]

Group 2: Multi-modal and Language Support
- The model supports multi-modal visual recognition, enabling it to understand both audio and visual inputs, which enhances text recognition accuracy [1]
- It recognizes 13 foreign languages, including Japanese, Korean, German, and French, broadening its applicability in global markets [1]

Group 3: Specialized Recognition
- The model has been upgraded to better handle complex scenarios involving proper nouns, personal names, geographical names, brand names, and easily confused homophones [1]
Doubao Releases Speech Recognition Model 2.0, Supporting Multi-modal Visual Recognition and 13 Foreign Languages
Feng Huang Wang·2025-12-05 08:55