Core Viewpoint
- The article discusses the launch of TransBench, the first application-oriented evaluation and ranking system for AI translation, aimed at standardizing how translation quality is measured across AI models [1][5][32].

Group 1: TransBench Overview
- TransBench is a collaborative effort by Alibaba International AI Business, Shanghai Artificial Intelligence Laboratory, and Beijing Language and Culture University [2].
- It introduces new evaluation metrics such as hallucination rate, cultural taboo words, and politeness norms, addressing common issues in large model translations [3][34].
- The evaluation system is open-source and has released its first set of results, inviting AI translation institutions to participate [5][6][44].

Group 2: Evaluation Metrics
- The evaluation framework groups its data sets into three main types: general standards, e-commerce culture, and cultural characteristics [8][35].
- The ranking assesses translation capability along four dimensions: overall score, general standards, e-commerce culture, and cultural characteristics [9][11].

Group 3: Model Performance
- In the English-to-other-languages category, the top three models by overall score and general standards are GPT-4o, DeepL Translate, and GPT-4-Turbo [16][14].
- For the e-commerce sector, DeepSeek-R1 ranks among the top performers, while Qwen2.5 models excel in cultural characteristics [17][19].
- In the Chinese-to-other-languages category, DeepSeek-V3 leads, followed by Gemini-2.5-Pro and Claude-3.5-Sonnet [23][25].

Group 4: Industry Context
- Demand for high-quality AI translation models has increased, and translations are now expected to respect cultural nuances and industry-specific language features [28][29].
- Traditional evaluation metrics are deemed insufficient for today's requirements, prompting the development of TransBench [31][32].
- Alibaba's Marco MT model has achieved significant usage, with an average daily call volume of 600 million, highlighting the importance of translation in global e-commerce [40][41].
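
To make the four ranking dimensions described under Group 2 concrete, here is a minimal Python sketch of how a leaderboard might be sorted per dimension. It is an illustration only, not TransBench's actual code: the model names and scores are invented placeholders, and computing the overall score as an unweighted mean of the three data-set scores is an assumption, since the article does not state the formula.

```python
from dataclasses import dataclass


@dataclass
class ModelScores:
    """Per-model scores on the three data-set types named in the article."""
    name: str
    general: float     # general standards
    ecommerce: float   # e-commerce culture
    cultural: float    # cultural characteristics

    @property
    def overall(self) -> float:
        # Assumption: unweighted mean of the three scores.
        # The article does not say how the overall score is derived.
        return (self.general + self.ecommerce + self.cultural) / 3


def rank(models, dimension):
    """Return model names ordered from best to worst on one dimension."""
    ordered = sorted(models, key=lambda m: getattr(m, dimension), reverse=True)
    return [m.name for m in ordered]


# Placeholder data, NOT real TransBench results.
models = [
    ModelScores("model_a", general=80.0, ecommerce=70.0, cultural=60.0),
    ModelScores("model_b", general=75.0, ecommerce=85.0, cultural=65.0),
    ModelScores("model_c", general=70.0, ecommerce=60.0, cultural=90.0),
]

for dim in ("overall", "general", "ecommerce", "cultural"):
    print(dim, rank(models, dim))
```

With these placeholder values, a model can lead one dimension and trail on another, which is the pattern the article reports when different models top the general, e-commerce, and cultural rankings.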
First real-world AI translation leaderboard released! GPT-4o holds the top spot, while the Qwen series leads on culture | Open source