Core Viewpoint
- The article highlights the risk that countries with low-resource languages will be marginalized in the AI era, because general-purpose large models offer insufficient support for these languages [1][2].

Group 1: Language Models and Their Importance
- Serbian language usage is significantly lower than Slovenian, and Serbian accounts for less than 0.1% of the tokens in general models [2].
- Language models need to reflect cultural identity, particularly in critical fields such as healthcare and law [2].
- The Hungarian language poses unique challenges for language models because of its complex morphology, which underscores the importance of data quality over quantity [2].

Group 2: Initiatives and Collaborations
- The Israel Association for Human Language Technology (IAHLT) has developed a bilingual (Hebrew + English) model based on open-source models, addressing the need for technology equity for small languages [4].
- iFLYTEK's latest model, X1, supports over 130 languages, and the company aims to collaborate globally on comprehensive multilingual models and applications [4].
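Will Minority Languages Be Marginalized in the AI Era? Experts from Multiple Countries Urge: Language Models Cannot Serve Only Major Languages!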
Di Yi Cai Jing·2025-07-29 02:35