Battle report: Musk's Grok 4 dominates the AI chess tournament; DeepSeek couldn't beat o4-mini, and netizens cry foul for Kimi K2
量子位 (QbitAI) · 2025-08-06 08:14
Buyuan, Yiran, from Aofei Temple. QbitAI | WeChat account QbitAI.

Latest battle report: in the first AI international chess showdown, Musk's Grok 4 is "far ahead of the pack".

Yes, Google has set up a chess tournament for large models: the Kaggle AI Chess Competition. After the first day of play, OpenAI's o3 and o4-mini, DeepSeek R1, Kimi K2 Instruct, Gemini 2.5 Pro and 2.5 Flash, Claude Opus 4, and Grok 4 have all finished their first-round matches. The results: Grok 4 performed best; DeepSeek R1 played strongly but lost to o4-mini; and Kimi K2 fared worst of all, badly enough that netizens cried foul on its behalf.

Seeing his own Grok 4 excel, Musk naturally did not pass up the PR opportunity, though his response was a bit of a humblebrag: "We didn't train for this deliberately; it's just a side effect." To be fair, who would train specifically for such an offbeat contest anyway?

Of course, when AIs play each other at chess, the process matters far more than the result: Google's stated motivation for launching the competition is to test "emergent" abilities.

The first Kaggle AI Chess Competition was launched by Google as part of the rollout of the Kaggle Game Arena, with chess as the opening event. The "contestants" include OpenAI's o3 and o4-mini, DeepSe ...
Is this a joke? DeepSeek and Kimi eliminated in the first round of the first large-model tournament
36Kr · 2025-08-06 08:01
Group 1
- The core focus of the article is the first international chess competition for large models, where Grok 4 is highlighted as a leading contender for the championship [1][24].
- The competition features various AI models, including Gemini 2.5 Pro, o4-mini, and Grok 4, all of which advanced to the semifinals with 4-0 victories in their initial matches [1][9].
- The event is hosted on the Kaggle Game Arena platform, which aims to evaluate how large language models (LLMs) perform in dynamic, competitive environments [1].

Group 2
- Kimi K2 faced o3 and lost 0-4, struggling to find legal moves after the opening phase, which points to technical issues [3][6].
- DeepSeek R1 lost 0-4 to o4-mini, showing a pattern of strong initial moves followed by significant errors [10][13].
- Gemini 2.5 Pro beat Claude 4 Opus 4-0, but its true strength remains uncertain because the win owed much to its opponent's mistakes [14][18].
- Grok 4 was particularly impressive, winning 4-0 against Gemini 2.5 Flash and showing a strong knack for capturing unprotected pieces [21][27].

Group 3
- Current AI chess play exhibits three main weaknesses: insufficient visualization of the whole board, limited understanding of piece interactions, and trouble executing legal moves [27].
- Grok 4's success suggests it may have overcome these limitations, raising the question of whether these models' strengths and weaknesses will persist in future matches [27].
- A pre-tournament poll had 37% of participants picking Gemini 2.5 Pro as the likely winner [27].
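The "cannot find a legal move" failure described above is ultimately a harness-level event: the referee must decide how many illegal proposals to tolerate before forfeiting the game. The competition's actual rules and code are not given here, so the following is a hypothetical sketch of such a referee loop; the `legal_moves` and `propose_move` names are illustrative, not the Kaggle harness's API:

```python
def referee_turn(legal_moves, propose_move, max_retries=3):
    """Query a model for a move up to max_retries times.

    legal_moves: the legal moves in the current position, as strings.
    propose_move: zero-argument callable standing in for one model query.
    Returns the first legal proposal, or None to signal a forfeit.
    """
    for _ in range(max_retries):
        move = propose_move()
        if move in legal_moves:
            return move
    return None

# A model that keeps proposing the same illegal move forfeits the turn.
legal = {"e4", "d4", "Nf3", "c4"}
assert referee_turn(legal, lambda: "e4") == "e4"
assert referee_turn(legal, lambda: "Ke2") is None
```

Under rules like these, a model that reasons well about positions but fails to emit legal moves can lose every game by forfeit, which is one reading of Kimi K2's 0-4 result.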
Google throws down the gauntlet, and DeepSeek and Kimi are both in: the first large-model tournament starts tomorrow
机器之心 (Synced) · 2025-08-05 04:09
Core Viewpoint
- The upcoming AI chess competition aims to showcase the performance of various advanced AI models in a competitive setting, using a new benchmark testing platform called Kaggle Game Arena [2][12].

Group 1: Competition Overview
- The AI chess competition runs from August 5 to 7 and features eight cutting-edge AI models [2][3].
- Participants include notable names such as OpenAI's o4-mini, Google's Gemini 2.5 Pro, and Anthropic's Claude Opus 4 [7].
- The event is organized by Google and aims to provide a transparent, rigorous testing environment for AI models [6][8].

Group 2: Competition Format
- The competition follows a single-elimination format; each match consists of four games, and the first model to score two points advances [14].
- If a match ends in a 2-2 tie, a tiebreaker game is played in which the white side must win to progress [14].
- Models may not use external tools such as Stockfish and must generate legal moves on their own [17].

Group 3: Evaluation and Transparency
- The game execution framework and environment will be open-sourced to ensure transparency [8].
- Each model's performance will be displayed on the Kaggle Benchmarks leaderboard, allowing real-time tracking of results [12][13].
- The event is designed to address the limitations of current AI benchmarks, which struggle to keep pace with the rapid development of modern models [12].
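The match format in Group 2 reduces to a small scoring rule: four games at 1 point per win and 0.5 per draw, with a 2-2 tie settled by an extra game that White must win outright (so a draw counts against White). A minimal sketch under those stated rules; the function name and signature are illustrative, not the competition's code:

```python
def match_winner(game_results, tiebreak=None):
    """Decide a four-game match.

    game_results: model A's points per game (1 = win, 0.5 = draw, 0 = loss);
    model B scores the complement of each game.
    tiebreak: (white, white_won) for the extra game played on a 2-2 tie,
    where `white` is "A" or "B" and a draw counts as a loss for White.
    Returns "A" or "B".
    """
    a = sum(game_results)
    b = len(game_results) - a
    if a != b:
        return "A" if a > b else "B"
    white, white_won = tiebreak
    if white_won:
        return white
    return "B" if white == "A" else "A"

assert match_winner([1, 1, 0.5, 0.5]) == "A"            # 3-1, no tiebreak needed
assert match_winner([1, 0, 1, 0], ("A", False)) == "B"  # 2-2, White failed to win
```

This armageddon-style tiebreak guarantees every match produces a winner, which a single-elimination bracket requires.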