Top AI Leaderboard Scandal Exposed: Is Meta's Score Manipulation Confirmed?

Core Viewpoint - The article discusses allegations that the LMArena ranking system for AI models is being manipulated, suggesting that major companies are gaming the system to inflate their scores and undermine competition [2][11][19].

Group 1: Allegations of Cheating
- Researchers from various institutions have published a paper accusing AI companies of exploiting LMArena to boost their rankings by selectively testing models and withdrawing low-scoring ones [11][12][15].
- The paper analyzed 2.8 million battles across 238 models from 43 providers, concluding that policies favoring a handful of companies encouraged overfitting to leaderboard-specific metrics rather than genuine AI advancement [12][19].
- Meta reportedly tested 27 variants of its Llama 4 model privately before its public release, raising concerns about unfair advantages [19][20].

Group 2: Data Access Inequality
- The study found that closed-source commercial models (such as those from Google and OpenAI) participated in LMArena battles more frequently than open-source models, leading to a long-term inequality in data access [23][30].
- Approximately 61.3% of all LMArena data goes to specific model providers, with Google and OpenAI models accounting for about 19.2% and 20.4% of all user battle data, respectively [26][30].
- The study estimates that greater access to LMArena data could yield a relative performance improvement of up to 112%, a gain that open-source models with limited data access largely miss out on [31][32].

Group 3: Official Response
- LMArena quickly responded to the allegations, claiming that the research contained numerous factual inaccuracies and misleading statements [36][40].
- It emphasized that it has always aimed to treat all model providers fairly and that the number of models submitted for testing is at the discretion of each provider [40][41].
- LMArena's policies on model testing and ranking have been publicly available for over a year, which it cites to counter claims of secrecy [40][41].
Group 4: Future of Rankings
- Andrej Karpathy, a prominent figure in AI, expressed concerns that the focus on LMArena scores has led to models that excel at ranking rather than at overall quality [42][43].
- He suggested OpenRouterAI as a potential new ranking platform that could be less susceptible to manipulation [44][49].
- The original intent of LMArena, created by students from various universities, has been overshadowed by corporate interests and the influx of major tech companies [51][56].
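
To make the selective-testing concern from Group 1 concrete, here is a minimal simulation sketch of why publishing only the best of many privately tested variants can inflate a score. It is not the paper's methodology. The baseline rating of 1200, the noise level of 15 points, and all function names are hypothetical assumptions chosen for illustration; only the count of 27 variants comes from the article.

```python
import random
import statistics


def published_score(n_variants: int, true_score: float,
                    noise_sd: float, rng: random.Random) -> float:
    """Privately measure n_variants equally capable variants and
    publish only the highest noisy measurement."""
    return max(rng.gauss(true_score, noise_sd) for _ in range(n_variants))


def average_published(n_variants: int, trials: int = 2000,
                      true_score: float = 1200.0,  # hypothetical baseline rating
                      noise_sd: float = 15.0,      # hypothetical measurement noise
                      seed: int = 0) -> float:
    """Average published score over many repeated leaderboard submissions."""
    rng = random.Random(seed)
    scores = [published_score(n_variants, true_score, noise_sd, rng)
              for _ in range(trials)]
    return statistics.fmean(scores)


# Every variant has the same true quality, so any gap is pure selection bias.
single = average_published(1)
best_of_27 = average_published(27)  # 27 variants, as reported for Llama 4
inflation = best_of_27 - single

print(f"one submission:       {single:.1f}")
print(f"best of 27 variants:  {best_of_27:.1f}")
print(f"inflation from selection alone: {inflation:.1f} points")
```

Under these assumptions, the best-of-27 score sits well above the single-submission score even though no variant is actually better, which is the kind of advantage the paper attributes to private testing. Real leaderboard ratings are computed from pairwise battles, so this Gaussian-noise model is only a simplified stand-in.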