LMArena
The world's biggest AI leaderboard falls from grace: 52% of its high-scoring answers are nonsense. Are Silicon Valley giants faking it en masse?
36Kr · 2026-01-08 09:54
Who would have guessed that the most authoritative large-model leaderboard in AI could be a sham from top to bottom? A late-2025 article titled "LMArena is a cancer on AI" recently resurfaced, hit the Hacker News front page, and set off an uproar. The bombshell: the piece nails LMArena, the evaluation platform countless researchers treat as gospel, to the pillory, branding it a "cancer" on AI development. From gold standard to tumor: so what exactly is LMArena? Some background first. LMArena (also known as LMSYS Chatbot Arena) is a large-model evaluation platform created in 2023 by researchers from UC Berkeley, Carnegie Mellon University, and other top universities. [Figure: leaderboard screenshot with "Text" and "WebDev" tabs (updated 18 and 19 days ago); columns Rank, Model, Score, Votes; top Text entry: gemini-3-pro, score 1490, 21,938 votes] ...
Give AI a score, wind up a $1.7 billion unicorn?
36Kr · 2026-01-07 11:04
Large-model battleground LMArena has officially announced a $150 million Series A. Its valuation climbs to $1.7 billion, a solid start to the new year. The round was led by Felicis and UC Investments, the University of California's investment arm, with Andreessen Horowitz, The House Fund, and other institutions following on. Capital voting with real money shows just how hot the model-evaluation track has become in the AI era. And the rise of this team, 99% Chinese and born in the 1990s, traces back to ChatGPT's debut in 2023. From academic exploration to commercial ascent: LMArena grew out of Chatbot Arena, once the talk of the AI world, first created by LMSYS, a grassroots open-source organization. Its core members were all standout students from top schools including UC Berkeley, Stanford, UCSD, and CMU. Their open-source inference engine SGLang was the first open-source stack to run on 96 H100s with throughput nearly matching DeepSeek's official report. SGLang is now deployed at scale, adopted by xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, Alibaba Cloud, Meituan, Tencent Cloud, and other companies and institutions. Still, more than the hardcore engineering, their best-known and most visible work is evaluating large models. Just as ChatGPT, Claude, and the first wave of models were appearing, they were first to launch Chatbot Arena, a ...
Give AI a score, wind up a $1.7 billion unicorn???
QbitAI · 2026-01-07 09:11
Wen Le, from Aofeisi. QbitAI | WeChat official account QbitAI. Large-model battleground LMArena has officially announced a $150 million Series A. Its valuation climbs to $1.7 billion, a solid start to the new year. The round was led by Felicis and UC Investments, the University of California's investment arm, with Andreessen Horowitz, The House Fund, and other institutions following on. Capital voting with real money shows just how hot the model-evaluation track has become in the AI era. And the rise of this team, 99% Chinese and born in the 1990s, traces back to ChatGPT's debut in 2023. From academic exploration to commercial ascent: LMArena grew out of Chatbot Arena, once the talk of the AI world, first created by LMSYS, a grassroots open-source organization. Its core members were all standout students from top schools including UC Berkeley, Stanford, UCSD, and CMU. Their open-source inference engine SGLang was the first open-source stack to run on 96 H100s with throughput nearly matching DeepSeek's official report. SGLang is now deployed at scale, adopted by xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, Alibaba Cloud, Meituan, Tencent Cloud, and other companies and institutions. Still, more than the hardcore engineering, their best-known and most visible work is evaluating large models. Just as ChatGPT, Claude ...
LMArena: Who is the king of AI, and why does this one benchmark get the final say?
Silicon Valley 101 · 2025-10-30 22:35
AI Model Evaluation Landscape
- Traditional benchmark tests are losing credibility due to "data leakage" and "score manipulation" [1]
- The LMArena platform uses "anonymous battles + human voting" to redefine the evaluation criteria for large models [1]
- Top models from GPT to Claude, Gemini to DeepSeek are competing on LMArena [1]

LMArena's Challenges
- LMArena's fairness is challenged by Meta's "ranking manipulation" incident, data-asymmetry issues, and the platform's commercialization [1]
- "Human judgment" in LMArena may contain biases and loopholes [1]

Future of AI Evaluation
- The industry is moving toward "real combat" (Alpha Arena) and a combination of static and dynamic evaluations [1]
- The ultimate question is not "who is stronger" but "what is intelligence" [1]
"Nano Banana" draws 5 million LMArena votes in two weeks and ignites 10x traffic, as Google and OpenAI crowd into the ring
36Kr · 2025-09-04 10:10
Core Insights
- The article highlights the rapid rise of the AI image editor "nano-banana," which topped the LMArena Image Edit Arena, driving a tenfold increase in platform traffic and over 3 million monthly active users [1][9][12]
- Since its launch in 2023, LMArena has become a competitive arena for major AI companies like Google and OpenAI, allowing users to vote and provide feedback on various AI models [1][9][12]

Group 1: Performance Metrics
- "Nano-banana" attracted over 5 million total votes within two weeks of its blind testing, including more than 2.5 million direct votes, the highest engagement in LMArena's history [3][9]
- LMArena's CTO confirmed that the platform's monthly active users have surpassed 3 million on the traffic surge driven by "nano-banana" [9][12]

Group 2: Community Engagement
- LMArena operates as a user-centric evaluation platform, letting community members assess AI models through anonymous, crowdsourced pairwise comparisons [12][16]
- The platform encourages user participation focused on real-world use cases, giving AI model providers actionable feedback for model improvement [20][29]

Group 3: Competitive Landscape
- Major AI companies, including Google and OpenAI, are keen to feature their models on LMArena for brand exposure and user feedback, which can significantly enhance their market presence [20][22]
- The Elo scoring system used in LMArena helps minimize biases and more accurately reflects user preferences regarding model performance [20][21]

Group 4: Future Directions
- LMArena aims to expand its benchmarking to more real-world use cases, bridging the gap between technology and practical applications [26][28]
- The platform's goal is to maintain transparency in its data-research processes and publish findings that aid the community's continuous development [29][30]
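The Elo scoring the entry above mentions works by treating each anonymous battle as a match: the two models' ratings move toward the observed result in proportion to how surprising it was. A minimal sketch of the standard Elo update (the K-factor of 32 and the sample ratings are illustrative assumptions, not LMArena's actual parameters):

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, outcome: float, k: float = 32.0):
    """Return updated (rating_a, rating_b).

    outcome: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    # A gains what B loses; the total rating mass is conserved.
    return r_a + k * (outcome - e_a), r_b + k * ((1.0 - outcome) - (1.0 - e_a))

# A higher-rated model beating a lower-rated one gains only a little,
# since the win was expected.
new_a, new_b = update_elo(1490, 1450, 1.0)
```

Because an upset moves ratings more than an expected win, ratings converge toward a stable ordering as vote volume grows, which is why platforms of this kind care so much about total vote counts.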
Nano Banana is crowned the new king of character consistency: an epic upgrade for AI image editing.
数字生命卡兹克 · 2025-08-19 01:05
Core Viewpoint
- The article discusses the capabilities of a new AI image generation model called Nano Banana, believed to be developed by Google, highlighting its exceptional consistency in generating images that closely match the input reference and its edge over other models on the market [1][24][81]

Summary by Sections

Introduction to Nano Banana
- Nano Banana is described as a powerful AI drawing model that has shown impressive results in practical applications [1]
- The model is currently only available for blind testing on LMArena, a platform for evaluating AI models [9][11]

Performance Comparison
- The author presents a case study comparing Nano Banana with models like GPT-4o, Flux Kontext, and Seedream, showcasing Nano Banana's superior ability to preserve facial features and expressions [3][4][6]
- Across various tests, Nano Banana consistently outperformed competitors in subject consistency and background replacement [39][51][68]

User Experience
- Users access Nano Banana by logging into LMArena and entering a battle mode, picking the better of two randomly generated images [26][30]
- The article emphasizes the ease of use and the high-quality results achieved with minimal attempts [7][80]

Conclusion
- Nano Banana currently leads in image consistency and quality and could change how users create personalized images and videos [82]
- The author expresses admiration for Google's comprehensive advances in AI technology [81]
Dark dealings exposed on AI's top leaderboard: is Meta's score-gaming now confirmed?
Huxiu APP · 2025-05-01 13:51
Core Viewpoint
- The article discusses allegations of manipulation in the LMArena ranking system, suggesting that major companies are gaming the system to inflate their scores and undermine competition [2][11][19]

Group 1: Allegations of Cheating
- Researchers from various institutions published a paper accusing AI companies of exploiting LMArena to boost their rankings by selectively testing models and withdrawing low-scoring ones [11][12][15]
- The paper analyzed 2.8 million battles across 238 models from 43 providers, finding that a few companies' policies led to overfitting to specific metrics rather than genuine AI advances [12][19]
- Meta reportedly tested 27 variants of its Llama 4 model privately before its public release, raising concerns about unfair advantage [19][20]

Group 2: Data Access Inequality
- The study found that closed-source commercial models (such as Google's and OpenAI's) participated in LMArena far more often than open-source models, producing a long-term data-access inequality [23][30]
- Approximately 61.3% of all LMArena data goes to specific model providers, with Google and OpenAI models accounting for about 19.2% and 20.4% of all user battle data, respectively [26][30]
- With comparable data access, open-source models could see relative performance gains of up to 112% [31][32]

Group 3: Official Response
- LMArena quickly responded to the allegations, saying the research contained numerous factual inaccuracies and misleading statements [36][40]
- They emphasized that they have always aimed to treat all model providers fairly and that how many tests a provider submits is the provider's choice [40][41]
- LMArena's policies on model testing and ranking have been publicly available for over a year, countering claims of secrecy [40][41]
Group 4: Future of Rankings
- Andrej Karpathy, a prominent figure in AI, voiced concern that the focus on LMArena scores has produced models that excel at ranking rather than overall quality [42][43]
- He suggested OpenRouterAI as a potential alternative ranking platform less susceptible to manipulation [44][49]
- The original intent of LMArena, created by students from various universities, has been overshadowed by corporate interests and the influx of major tech companies [51][56]
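Battle-data shares like those quoted in the study above (roughly 19.2% of user battle data going to Google models and 20.4% to OpenAI models) are the kind of statistic one would get by tallying how often each provider's models appear across the battle log. A minimal sketch over a made-up log (the field names and the tiny three-record log are purely illustrative, not LMArena's actual schema):

```python
from collections import Counter

# Hypothetical battle log: each record names the two providers whose
# models were paired in one anonymous battle (illustrative schema).
battles = [
    {"provider_a": "google", "provider_b": "openai"},
    {"provider_a": "google", "provider_b": "meta"},
    {"provider_a": "openai", "provider_b": "mistral"},
]

# Each battle sends one prompt/response pair to both providers involved,
# so both sides of every record are counted.
counts = Counter()
for b in battles:
    counts[b["provider_a"]] += 1
    counts[b["provider_b"]] += 1

total = sum(counts.values())
shares = {provider: n / total for provider, n in counts.items()}
```

Counting both participants of every battle is what makes the shares sum to 1, and it is also why frequent participation compounds: a provider that fields more models appears in more pairings and accumulates a larger slice of the feedback data.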