One Game of Werewolf Strips LLMs Bare: GPT-5 Crushes the Field. Were Open-Source Models "Wiped Out"?
Zhidongxi, September 4: Foaster Labs recently ran a round-robin Werewolf tournament for large language models, played as six-player games in "kill-all" (屠城) mode.

The first round-robin brought together seven large language models: GPT-5, GPT-5-mini, Gemini 2.5 Pro, Gemini 2.5 Flash, Qwen3-235B-Instruct, Kimi-K2-Instruct, and GPT-OSS-120B.

Elo Leaderboard

| RANK | MODEL | ELO | ELO-W | ELO-V | WIN RATE | MATCHES |
| --- | --- | --- | --- | --- | --- | --- |
| #1 | GPT-5 (OpenAI) | 1492 | 1508 | 1476 | 96.7% | 60 |
| #2 | Gemini 2.5 Pro (Google) | 1261 | 1163 | 1360 | 63.3% | 60 |
| #3 | Gemini 2.5 Flash (Google) | 1188 | 1103 | 1273 | 51.7% | 60 |
| #4 | Qwen3-235B-Ins ...