Large Model Evaluation
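Scoring AIs turned into a $1.7 billion unicorn?
36Kr · 2026-01-07 11:04
LMArena, the large-model arena, has announced a $150 million Series A round that lifts its valuation to $1.7 billion, a strong start to the new year. The round was led by Felicis and UC Investments (the University of California's investment office), with Andreessen Horowitz, The House Fund, and other firms participating. Investors are voting with real money, a sign of how attractive the large-model evaluation track has become in the AI era. The rise of this team, described as "99%" Chinese and born in the 1990s, goes back to 2023, after ChatGPT burst onto the scene. From academic exploration to commercial rise: LMArena's predecessor is Chatbot Arena, once a sensation in AI circles, which was originally created by LMSYS, a self-organized open-source group. Its core members all come from top universities such as UC Berkeley, Stanford, UCSD, and CMU. Their open-source inference engine SGLang was the first open-source implementation in the industry to reach, on 96 H100 GPUs, throughput nearly matching the figures in DeepSeek's official report. SGLang is now deployed at scale and has been adopted by companies and organizations including xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, Alibaba Cloud, Meituan, and Tencent Cloud. More than this engineering work, however, their main and best-known contribution is evaluating large models. When ChatGPT, Claude, and other models had just been released, they were the first to set up Chatbot Arena, a ...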
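Scoring AIs turned into a $1.7 billion unicorn???
QbitAI (量子位) · 2026-01-07 09:11
By Wen Le, reporting from Aofeisi. QbitAI | WeChat official account QbitAI. LMArena, the large-model arena, has announced a $150 million Series A round that lifts its valuation to $1.7 billion, a strong start to the new year. The round was led by Felicis and UC Investments (the University of California's investment office), with Andreessen Horowitz, The House Fund, and other firms participating. Investors are voting with real money, a sign of how attractive the large-model evaluation track has become in the AI era. The rise of this team, described as "99%" Chinese and born in the 1990s, goes back to 2023, after ChatGPT burst onto the scene. From academic exploration to commercial rise: LMArena's predecessor is Chatbot Arena, once a sensation in AI circles, which was originally created by LMSYS, a self-organized open-source group. Its core members all come from top universities such as UC Berkeley, Stanford, UCSD, and CMU. Their open-source inference engine SGLang was the first open-source implementation in the industry to reach, on 96 H100 GPUs, throughput nearly matching the figures in DeepSeek's official report. SGLang is now deployed at scale and has been adopted by companies and organizations including xAI, NVIDIA, AMD, Google Cloud, Oracle Cloud, Alibaba Cloud, Meituan, and Tencent Cloud. More than this engineering work, however, their main and best-known contribution is evaluating large models. When ChatGPT, Cl ...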
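The "preference" dilemma of large models as evaluators: UDA achieves unsupervised debiasing alignment
Jiqizhixin (机器之心) · 2025-11-28 00:51
As LLM evaluation increasingly relies on large models acting as evaluators (LLM-as-a-Judge), a hidden but serious problem is distorting the evaluation ecosystem: preference bias. Even strong models such as GPT-4o and DeepSeek-V3 systematically favor particular outputs in pairwise answer comparisons, especially content they generated themselves. As a result, the scores and rankings given by different judge models diverge widely. Experimental data in the paper show that on the ArenaHard dataset, the magnitude of self-preference bias ranges from -38% to +90%. When a model is both "athlete" and "referee", fairness is out of the question. Existing solutions rely on prompt engineering, model ensembling, or game-theoretic re-ranking, but these methods lack theoretical grounding, incur prohibitive cost, or are hard to scale. More importantly, they all depend on hand-designed rules and cannot make large models produce consistent results. UDA offers a new way out of this impasse. A research team from Zhipu AI brings unsupervised learning into pairwise LLM judging, letting the model adjust its scoring rules autonomously and dynamically to achieve debiased alignment. The paper has been accepted to AAAI 2026. Paper title: UDA: Unsupervised Debiasing Alignment for Pair-wise LLM-as-a-J ...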
Three founders born in the 2000s, a 70-billion-yuan valuation
36Kr · 2025-10-28 12:09
Core Insights
- Mercor, an AI recruitment startup, has raised $250 million in new funding at a valuation of $10 billion, five times its previous valuation of $2 billion earlier this year [1][3]
- Founded in 2023 by three college dropouts, Mercor has built a large professional talent network and has seen its annual recurring revenue grow from $1 million to $500 million in just 17 months [1][3]
Company Overview
- Mercor specializes in AI-driven recruitment, using AI to screen resumes and quickly match candidates to job positions [3][5]
- The company has expanded its services to include data annotation and large model evaluation, drawing on its network of 30,000 experts [3][9]
- The startup's revenue has quadrupled since the turmoil at its competitor Scale AI, which led to an influx of Scale's former employees and clients [13][14]
Business Model and Revenue
- Mercor's annual recurring revenue reached $70 million by February, driven by its new large model evaluation business [3][9]
- The company manages a network of experts who can earn significant daily wages, with total payouts exceeding $1.5 million per day [9][10]
- The new funding will be allocated to expanding the talent network, enhancing the matching system, and improving delivery speed [3][4]
Competitive Landscape
- Mercor's main competitor, Scale AI, faced challenges after being acquired by Meta, which raised concerns about data neutrality and client trust [13][14]
- The controversy surrounding Scale AI has inadvertently benefited Mercor, resulting in a significant increase in its revenue and client base [14][15]
Future Prospects
- Mercor's AI-driven recruitment model has positioned it as a key player in large model evaluation, filling a critical gap in the industry [15][16]
- The company aims to keep using its talent network to meet the growing demand for high-quality data and expert feedback in AI model development [16]