Building Multilingual Large Models with "Chinese Experience" to Help Small-Language Countries Integrate into the World
Core Viewpoint
- The international academic seminar on high-level multilingual models highlighted the urgent need to address the digital divide affecting low-resource languages: global mainstream models support these languages inadequately, putting small-language countries at risk of marginalization [1][5].

Group 1: Expert Opinions
- Professor Vlado Delić from Serbia noted that Serbian accounts for less than 0.1% of token usage in general-purpose models, significantly lower than Slovenian. He advocated national-level models that reflect cultural identity, to avoid translation errors in critical fields such as healthcare and law [3].
- Avner Algom, founder of the Israeli Human Language Technology Association, argued that language services should not be designed solely for major languages, calling for technological equity for small languages [5].
- Nipat Jongsawat from Thailand stressed that language sovereignty is a strategic necessity for nations, not merely a choice [5].

Group 2: Technological Developments
- Liu Cong, director of iFlytek Research Institute, reported that their multilingual model, Spark X1, expanded from supporting 81 languages in October 2022 to over 130 languages by July 2023, with the goal of providing a comprehensive multilingual model and applications [5].
- The Spark X1 model reportedly outperforms GPT-4.1 in key languages such as Arabic, German, French, Korean, and Japanese, while its speech synthesis model supports 55 languages with industry-leading performance [5].
- Zhang Xiao, deputy general manager of iFlytek's intelligent computing division, identified the rapid iteration of computing hardware and the low efficiency of existing resources as key challenges, proposing a five-element solution combining computing power, algorithms, data, applications, and ecosystem [7].
Group 3: Data Quality and Collaboration
- Gábor Prószéky, director of the Hungarian Linguistics Research Center, stressed that data quality matters more than quantity when building reliable large language models, noting the unique challenges posed by Hungarian's complex morphology [7].
- His team has developed the PULI model family and is collaborating with Chinese AI counterparts to create a complete closed loop from training to application [7].