Are you kidding? In the first large-model showdown, DeepSeek and Kimi were knocked out in the first round
36Kr · 2025-08-06 08:01

Group 1
- The article covers the first international chess tournament for large models, in which Grok 4 stands out as a leading contender for the championship [1][24].
- The field includes Gemini 2.5 Pro, o4-mini, Grok 4, and others; each first-round winner advanced to the semifinals with a 4-0 score [1][9].
- The event is hosted on the Kaggle Game Arena platform and aims to evaluate how large language models (LLMs) perform in dynamic, competitive environments [1].

Group 2
- Kimi k2 lost 0-4 to o3. It struggled to find legal moves once out of the opening phase, which points to possible technical issues [3][6].
- DeepSeek R1 lost 0-4 to o4-mini, showing a pattern of strong initial moves followed by significant errors [10][13].
- Gemini 2.5 Pro beat Claude 4 Opus 4-0, but its true strength remains uncertain because of its opponent's mistakes [14][18].
- Grok 4 was particularly impressive, beating Gemini 2.5 Flash 4-0 and demonstrating a strong ability to capture unprotected pieces [21][27].

Group 3
- The article notes three main weaknesses in current AI models at chess: insufficient visualization of the whole board, limited understanding of how pieces interact, and trouble executing legal moves [27].
- Grok 4's success suggests it may have overcome these limitations, though it remains open whether the models' strengths and weaknesses will stay consistent in later matches [27].
- In a poll taken before the competition began, 37% of participants picked Gemini 2.5 Pro as the likely winner [27].