You've got to be kidding! In the first large-model showdown, DeepSeek and Kimi were eliminated in the first round
机器之心·2025-08-06 04:31

Core Viewpoint
- The article discusses the results of the first large-model chess competition organized by Google, highlighting the performance of various AI models, particularly Grok 4, which emerged as a strong contender with a perfect record [2][30].

Group 1: Competition Overview
- The chess competition lasted three days and featured models such as Gemini 2.5 Pro, o4-mini, Grok 4, and o3, all of which achieved 4-0 victories in the first round [4].
- The competition was held on the Kaggle Game Arena platform, with the aim of evaluating the performance of large language models (LLMs) in dynamic, competitive environments [6].

Group 2: Match Results
- Kimi k2 lost to o3 0-4, failing to make legal moves in all four games [7][8].
- o4-mini defeated DeepSeek R1 4-0, in a match whose game quality declined after a few strong opening moves [18][21].
- Gemini 2.5 Pro beat Claude 4 Opus 4-0, although its true strength remains uncertain given Claude's mistakes [23][24].
- Grok 4 swept Gemini 2.5 Flash 4-0, demonstrating superior chess skill and an ability to capitalize on unprotected pieces [30][33].

Group 3: Key Observations
- The competition revealed three main weaknesses in current AI models: insufficient global board visualization, limited understanding of piece interactions, and difficulty executing legal moves [36].
- Grok 4's performance suggests it may have overcome these limitations, though whether these advantages will hold up in future matches remains an open question [36].

Group 4: Audience Engagement
- A poll conducted before the competition showed 37% of participants picking Gemini 2.5 Pro as the likely winner, while Grok 4 received only 7.04% of the votes [37][38].