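Will Minor Languages Be Marginalized in the AI Era? Experts From Several Countries Urge: Language Models Must Not Serve Only Major Languages!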
第一财经 (Yicai) · 2025-07-29 06:33

Core Viewpoint
- The article discusses the challenges that low-resource languages face in the AI era and stresses the need for equitable AI technology that supports all languages, particularly those with fewer speakers [2][3].

Group 1: AI Language Models and Low-Resource Languages
- Machine translation has lowered language barriers, yet many low-resource languages still risk being marginalized in the AI era because general-purpose models support them poorly [2].
- Experts argue that when technology forgets a language, the community that speaks it is forgotten too; they stress the importance of letting children in countries with less widely spoken languages interact with AI in their native language [3].

Group 2: Specific Language Challenges
- Serbian, spoken in Southeastern Europe, accounts for less than 0.1% of the tokens in general-purpose models, indicating a significant gap in AI support [3].
- Hungarian poses particular challenges for language models because of its complex affix combinations and free word order; experts emphasize data quality over quantity when building reliable models [4].
- Hebrew, despite its successful revival as a modern spoken language, is still considered a low-resource language in natural language processing [5].

Group 3: Collaborative Efforts and Solutions
- The Israeli Association for Human Language Technology has developed a bilingual (Hebrew and English) model to address the needs of low-resource languages, and it is working with industry to solve data acquisition and training cost problems [6].
- iFLYTEK's latest model, X1, supports more than 130 languages; the company aims to collaborate globally on comprehensive multilingual models and applications, offering a "Chinese solution" to the global challenge of building multilingual models [6].