A boost for going global | Li Auto's new translation framework improves translation quality while cutting response latency
LI AUTO (US:LI) · 理想TOP2 · 2025-09-13 11:50

Core Viewpoint
- The article covers the launch of StreamUni, a streaming speech translation framework developed by Li Auto. It targets the key challenges of long-duration streaming speech translation and reports superior results on several benchmarks [3][4].

Group 1: Research Background
- Simultaneous interpreting is highly demanding: the interpreter must translate in real time without interrupting the flow of communication. AI streaming speech translation aims to let machines translate continuously and in real time, as human interpreters do [6].
- The field faces three core challenges: handling continuous input, keeping latency low, and maintaining high output quality [6][9].

Group 2: StreamUni Architecture
- StreamUni is built on a single large speech-language model that handles three tasks: speech segmentation, policy decision, and translation generation. This simplifies the system structure and improves overall performance [10].
- The framework uses a "Speech CoT" (speech chain-of-thought) mechanism in which the model generates intermediate results at each processing stage, which supports real-time decisions and output [11][12].

Group 3: Innovative Highlights
- The unified model performs streaming speech translation end to end. It needs no external modules, which the article credits with a significant gain in system performance [10].
- The model's generation and truncation strategies let it decide in real time when to output translations and when to wait for more context [11][12].

Group 4: Experimental Results
- StreamUni showed clear advantages in both sentence-level and streaming experiments, maintaining high translation quality (measured by BLEU) across latency levels [21][23].
- In comparative tests, StreamUni improved BLEU by 2 points on average while cutting response latency by 500 milliseconds relative to previous methods [23].

Group 5: Collaboration and Open Source
- Li Auto's role in developing StreamUni illustrates the potential of close collaboration between academia and industry, and opens the way to new applications in cross-cultural communication [24].
- To support further research, the StreamUni paper, code, and datasets have been released on GitHub and Hugging Face [25][26].
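
The wait-or-output decision described under StreamUni Architecture can be illustrated with a toy loop. This is only a sketch under stated assumptions, not StreamUni's code: the fixed hold-back rule (`lag`) stands in for the model's learned policy, text tokens stand in for audio chunks, and every name here (`stream_translate`, `toy_translate`, `ends_segment`) is hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def stream_translate(
    chunks: Iterable[str],
    translate: Callable[[List[str]], List[str]],
    ends_segment: Callable[[str], bool],
    lag: int = 1,
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (chunk_index, newly_emitted_words) whenever the policy emits.

    Assumed policy (hypothetical, not from the article): hold back the last
    `lag` words of the current hypothesis until the segment ends, then flush.
    """
    source: List[str] = []   # chunks received in the current segment
    emitted: List[str] = []  # words already output for the current segment
    for index, chunk in enumerate(chunks):
        source.append(chunk)
        hypothesis = translate(source)  # re-translate the segment so far
        done = ends_segment(chunk)      # truncation decision
        stable = len(hypothesis) if done else max(len(hypothesis) - lag, 0)
        new_words = hypothesis[len(emitted):stable]
        if new_words:                   # output decision: emit, else wait
            emitted.extend(new_words)
            yield index, new_words
        if done:                        # start a fresh segment
            source, emitted = [], []


def toy_translate(tokens: List[str]) -> List[str]:
    # Stand-in for a real translation model: upper-case each token.
    return [token.upper() for token in tokens]


def ends_segment(chunk: str) -> bool:
    # Stand-in for learned segmentation: a trailing period closes a segment.
    return chunk.endswith(".")


if __name__ == "__main__":
    stream = ["hello", "world", "again.", "next", "one."]
    for index, words in stream_translate(stream, toy_translate, ends_segment):
        print(index, words)
```

A larger `lag` waits for more context before committing words, which trades latency for stability; `lag=0` emits after every chunk. This mirrors the quality-versus-latency trade-off the article describes, although a real system would learn the decision rather than use a fixed rule.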