Kaggle Game Arena

AI benchmark scores mean less and less; Google says let the AIs play games against each other instead
36Kr · 2025-08-11 23:25
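According to Google, the competition aims to evaluate and advance AI models' complex reasoning and decision-making through head-to-head play in strategy games, addressing the problem that existing benchmarks struggle to keep pace with model development. The event also serves to promote Google's own Kaggle Game Arena, a new, public benchmarking platform.
Unlike conventional AI benchmarks, Kaggle Game Arena uses "strategy games" as its test items. Google launched a platform where AIs play games because traditional AI benchmarks have hit a bottleneck and no longer reflect the real capabilities of flagship models. Put simply, AI vendors chasing fame or profit have gamed the various benchmarks until they broke, so Google, as an industry giant, has stepped forward to set things right.
In this wave of AI, "money isn't worth what it used to be" is a striking phenomenon. A unicorn used to mean a recently founded, unlisted technology company valued at more than US$1 billion. Now, as long as the founders have some technical background, an AI startup can reach a US$1 billion valuation almost as easily as eating and drinking.
Eight years on, with generative AI now on the scene, Google has staged another "AI chess championship", with OpenAI o4-mini, DeepSeek-R1, Google Gemini 2.5 Pro, Anthropic Claud ...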
Google issues the challenge, DeepSeek and Kimi are both in: the first large-model showdown starts tomorrow
机器之心 (Synced) · 2025-08-05 04:09
Core Viewpoint - The upcoming AI chess competition aims to showcase the performance of various advanced AI models in a competitive setting, utilizing a new benchmark testing platform called Kaggle Game Arena [2][12].
Group 1: Competition Overview
- The AI chess competition will take place from August 5 to 7, featuring eight cutting-edge AI models [2][3].
- The participating models include notable names such as OpenAI's o4-mini, Google's Gemini 2.5 Pro, and Anthropic's Claude Opus 4 [7].
- The event is organized by Google and aims to provide a transparent and rigorous testing environment for AI models [6][8].
Group 2: Competition Format
- The competition will follow a single-elimination format, with each match consisting of four games. The first model to score two points advances [14].
- If a match ends in a tie (2-2), a tiebreaker game will be played, where the white side must win to progress [14].
- Models are restricted from using external tools like Stockfish and must generate legal moves independently [17].
Group 3: Evaluation and Transparency
- The competition will ensure transparency by open-sourcing the game execution framework and environment [8].
- The performance of each model will be displayed on the Kaggle Benchmarks leaderboard, allowing real-time tracking of results [12][13].
- The event is designed to address the limitations of current AI benchmark tests, which struggle to keep pace with the rapid development of modern models [12].
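
The match rules summarized under Group 2 can be read as a small decision procedure. The Python sketch below is one possible reading and not the organizers' implementation. The scoring (win 1, draw 0.5, loss 0), the function names `decide_match` and `run_bracket`, and the assumption that Black advances when White fails to win the tiebreak are all illustrative and do not come from the source. For simplicity it tallies all four games rather than stopping early once a model reaches two points.

```python
from typing import Callable, List, Optional, Sequence

# Assumed scoring (not stated in the summary): win = 1, draw = 0.5, loss = 0.
# Results are given from model A's point of view.
POINTS = {"win": 1.0, "draw": 0.5, "loss": 0.0}


def decide_match(
    results_for_a: Sequence[str],
    white_in_tiebreak: str = "A",
    white_won_tiebreak: Optional[bool] = None,
) -> str:
    """Return "A" or "B", the side that advances from a four-game match."""
    if len(results_for_a) != 4:
        raise ValueError("a match consists of four games")
    score_a = sum(POINTS[r] for r in results_for_a)
    score_b = 4.0 - score_a  # each game hands out exactly one point in total
    if score_a != score_b:
        return "A" if score_a > score_b else "B"
    # 2-2 tie: one extra game in which White must win to progress.
    if white_won_tiebreak is None:
        raise ValueError("2-2 tie: a tiebreak result is required")
    black = "B" if white_in_tiebreak == "A" else "A"
    # Assumption: if White does not win, Black advances.
    return white_in_tiebreak if white_won_tiebreak else black


def run_bracket(entrants: List[str], play: Callable[[str, str], str]) -> str:
    """Single elimination: pair neighbours, keep the winners, repeat."""
    n = len(entrants)
    if n < 1 or n & (n - 1):
        raise ValueError("bracket size must be a power of two")
    current = list(entrants)
    while len(current) > 1:
        current = [play(current[i], current[i + 1])
                   for i in range(0, len(current), 2)]
    return current[0]
```

With eight entrants, `run_bracket` calls `play` seven times over three rounds (four quarterfinals, two semifinals, one final). `play` is a caller-supplied function that returns the name of the model advancing from a pairing, for example by wrapping `decide_match`.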