Volcano Engine Releases Doubao Speech Recognition Model 2.0
Zhitong Finance (智通财经网) · 2025-12-05 08:24
Core Insights
- Volcano Engine (火山引擎) has launched Doubao-Seed-ASR-2.0, which significantly strengthens speech recognition through improved contextual understanding and multi-modal visual recognition [1]

Group 1: Model Enhancements
- Enhanced contextual understanding raises the overall keyword recall rate by 20% [1]
- The model supports multi-modal visual recognition: it can not only "hear words" but also "see images," using single or multiple image inputs to improve the accuracy of the recognized text [1]
- The model accurately recognizes 13 foreign languages, including Japanese, Korean, German, and French [1]

Group 2: Technical Specifications
- The Doubao speech recognition model is built on the Seed Mixture-of-Experts (MoE) large language model architecture and retains the high-performance 2-billion-parameter audio encoder of version 1.0 [1]
- The upgrade focuses on recognition in complex scenarios involving proper nouns, personal names, place names, brand names, and easily confused homophones [1]
- Stronger contextual reasoning supports multi-modal information understanding and more accurate recognition of mixed-language speech [1]