Doubao Releases Speech Recognition Model 2.0 with Support for Multimodal Visual Recognition and 13 Overseas Languages

Core Viewpoint
- The article reports the official launch of Doubao-Seed-ASR-2.0, a speech recognition model from Huoshan Engine (Volcano Engine), which improves contextual understanding and recognition accuracy [1]

Group 1: Model Features
- Version 2.0 has stronger inference capabilities and achieves a 20% increase in overall keyword recall rate [1]
- It supports multimodal visual recognition: the model takes visual input alongside audio, which improves text recognition accuracy [1]
- It can recognize 13 overseas languages, including Japanese, Korean, German, and French [1]

Group 2: Targeted Upgrades
- The model has been specifically upgraded for complex scenarios involving proper nouns, personal names, place names, brand names, and easily confused homophones [1]