DeepSeek and Kimi eliminated in the first round, Musk's Grok 4 storms into the final: the first global AI showdown delivers one upset after another
36Kr · 2025-08-07 08:27
Core Insights
- The AI chess championship, hosted by Kaggle, featured eight leading language models competing in a three-day tournament to evaluate their strategic reasoning abilities [8][9]
- The final match will see OpenAI's o3 face off against xAI's Grok 4, reflecting the rivalry between Elon Musk and Sam Altman [19]

Group 1: Tournament Structure and Participants
- The tournament ran for three days, from August 5 to 7, with the first day determining the top four competitors [3]
- Eight AI models participated: Claude Opus 4, DeepSeek-R1, Gemini 2.5 Pro, Gemini 2.5 Flash, Kimi K2, o3, o4-mini, and Grok 4 [3][8]

Group 2: Competition Rules and Format
- The competition used the "Chess-Text Harness" rule system, which tested the models' pure reasoning capabilities: no external tools and no pre-supplied list of legal moves [9]
- Each model had a 60-minute time limit per move and could only read the board through text symbols [9]

Group 3: Match Highlights
- In the semifinals, o3 decisively defeated o4-mini 4:0, showcasing its superior tactical skill [11]
- Grok 4 faced Gemini 2.5 Pro in a closely contested match, making significant early errors before recovering to win the crucial games [13][17]
- The final will use an "Armageddon" format: Grok, playing Black, wins the match with a draw, offsetting White's inherent first-move advantage (a minimal scoring sketch follows below) [19][22]
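To make the Armageddon rule concrete, here is a minimal sketch in Python of how draw odds decide the match. The function name and result strings are illustrative assumptions, not the Kaggle harness's actual API.

```python
# Minimal sketch of Armageddon draw odds (illustrative, not the Kaggle harness API):
# Black is compensated for moving second by winning the match on a draw.
def armageddon_match_winner(game_result: str) -> str:
    """Map a standard game result ('1-0', '0-1', '1/2-1/2') to the match winner."""
    if game_result == "1-0":
        return "White"   # White must win outright
    return "Black"       # Black wins both on '0-1' and on a draw '1/2-1/2'
```

In the final described above, Grok 4 plays Black, so under this rule a drawn game would hand it the title.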
First large-model chess championship: Grok 4 and o3 reach the final, DeepSeek and Kimi defeated
36Kr · 2025-08-07 06:16
Core Insights
- The AI chess tournament hosted on Kaggle featured eight large language models (LLMs) competing in a knockout format, with Grok 4 and o3 advancing to the finals after defeating Gemini 2.5 Pro and o4-mini respectively [1][3][8]

Group 1: Tournament Structure and Results
- The tournament lasted three days and involved eight AI models: Grok 4 (xAI), Gemini 2.5 Pro (Google), o4-mini (OpenAI), o3 (OpenAI), Claude 4 Opus (Anthropic), Gemini 2.5 Flash (Google), DeepSeek R1 (DeepSeek), and Kimi k2 (Moonshot AI) [1]
- The competition used a single-elimination format in which each AI had up to four attempts to make a legal move; failing all four meant an immediate loss (see the sketch after this summary) [1]
- On the first day, Grok 4, o3, Gemini 2.5 Pro, and o4-mini all won 4-0 and advanced to the semifinals [3][11][22]

Group 2: Semifinal Highlights
- In the semifinals, o3 delivered a dominant performance, winning 4-0 against o4-mini and posting a perfect accuracy score of 100 in one of the games [5]
- The match between Grok 4 and Gemini 2.5 Pro was tied after regular play, leading to an Armageddon tiebreaker in which Grok 4 emerged victorious [8]
- The semifinals highlighted the strengths and weaknesses of the AI models, with Grok 4 overcoming early mistakes to secure its place in the final [8][19]

Group 3: Performance Analysis
- The tournament revealed that while some AI models performed exceptionally well, others struggled with basic tactical sequences and context understanding, indicating clear room for improvement in AI chess capabilities [22]
- Grok 4's performance drew attention from industry figures, including Elon Musk, who commented on its impressive gameplay [19]
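The "four attempts per legal move" rule lends itself to a short sketch. The following Python uses the python-chess library; the ask_model callback and the forfeit handling are assumptions for illustration, not the actual Kaggle Game Arena harness.

```python
# Hypothetical sketch of the four-attempt legality rule, using python-chess.
# ask_model(fen) is a placeholder for querying an LLM with the board as text.
import chess

MAX_ATTEMPTS = 4

def request_move(board: chess.Board, ask_model) -> bool:
    """Return True if a legal move was played, False if the model forfeits."""
    for _ in range(MAX_ATTEMPTS):
        reply = ask_model(board.fen())      # model only sees a text encoding
        try:
            board.push_san(reply.strip())   # reject anything that is not legal SAN
            return True
        except ValueError:                  # illegal, ambiguous, or unparseable
            continue
    return False                            # four failures -> immediate loss
```

A wrapper like this explains results such as Kimi k2's 0-4 loss described below, where games ended on failed legal-move attempts rather than over the board.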
Are you kidding? In the first large-model showdown, DeepSeek and Kimi were eliminated in round one
机器之心 · 2025-08-06 04:31
Core Viewpoint
- The article reports the results of the first large-model chess competition organized by Google, highlighting the performance of the participating AI models and in particular Grok 4, which emerged as a strong contender with a perfect record [2][30]

Group 1: Competition Overview
- The chess competition lasted three days; in the first round, Gemini 2.5 Pro, o4-mini, Grok 4, and o3 all won 4-0 [4]
- The competition was held on the Kaggle Game Arena platform, which aims to evaluate the performance of large language models (LLMs) in dynamic, competitive environments [6]

Group 2: Match Results
- Kimi k2 lost to o3 0-4, failing to produce legal moves in all four games [7][8]
- o4-mini defeated DeepSeek R1 4-0, with game quality declining after a few strong opening moves [18][21]
- Gemini 2.5 Pro beat Claude 4 Opus 4-0, although its true strength remains uncertain given Claude's own mistakes [23][24]
- Grok 4 swept Gemini 2.5 Flash 4-0, demonstrating superior chess skill and the ability to capitalize on unprotected pieces [30][33]

Group 3: Key Observations
- The competition revealed three main weaknesses in current AI models: insufficient global visualization of the board, limited understanding of piece interactions, and trouble executing legal moves [36]
- Grok 4's performance suggests it may have overcome these limitations, raising questions about whether these advantages will hold in future matches [36]

Group 4: Audience Engagement
- A poll conducted before the competition showed 37% of respondents picking Gemini 2.5 Pro as the likely winner, with Grok 4 receiving 7.04% of the votes [37][38]