Just In: An LLM Chess Champion Is Crowned. After 40 Rounds of Fierce Battle, OpenAI o3 Takes First Place. Is the Supremacy of Human Masters at Risk?
36Kr·2025-08-22 11:51

Core Insights
- Results of the recent chess rating competition have been released, showcasing the performance of various AI models: OpenAI's o3 leads with an estimated human-equivalent Elo of 1685, followed by Grok 4 and Gemini 2.5 Pro [1][2][3].

Group 1: Competition Overview
- The competition involved 40 rounds of matches in which AI models played using only text input, with no tools or validators, to establish a ranking akin to those of other strategy games such as Go [1][8].
- Results were derived from a round-robin format in which each pairing of models played 40 matches: 20 games with white and 20 with black [11][10].

Group 2: Model Rankings
- The final rankings are: 1. OpenAI o3, estimated human Elo 1685; 2. Grok 4, estimated human Elo 1395; 3. Gemini 2.5 Pro, estimated human Elo 1343 [3][4][5].
- DeepSeek R1, GPT-4.1, Claude Sonnet-4, and Claude Opus-4 are tied for fifth place, with estimated human Elos ranging from 664 to 759 [5][4].

Group 3: Methodology and Evaluation
- Elo scores were calculated using the Bradley-Terry algorithm from the head-to-head results between models [12].
- Estimated human Elo ratings were derived through linear interpolation against various levels of the Stockfish chess engine, whose own rating of 3644 is far above any of the models [13][14].

Group 4: Future Developments
- Kaggle plans to update the chess text leaderboard regularly and to introduce more games, aiming for a more comprehensive evaluation of AI models' strategic reasoning and cognitive abilities [24][22].
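The Bradley-Terry step mentioned in the methodology fits a latent "strength" to each model from pairwise win counts. Kaggle's exact implementation is not public, so the sketch below is a minimal illustration using the standard minorization-maximization (MM) update, with made-up win counts for three of the named models:

```python
import math

# Illustrative pairwise win counts (NOT the actual Kaggle results):
# wins[a][b] = number of games in which model a beat model b.
wins = {
    "o3":             {"Grok 4": 15, "Gemini 2.5 Pro": 17},
    "Grok 4":         {"o3": 5,  "Gemini 2.5 Pro": 12},
    "Gemini 2.5 Pro": {"o3": 3,  "Grok 4": 8},
}

players = list(wins)
strength = {p: 1.0 for p in players}  # Bradley-Terry strengths

# MM iterations: p_i <- W_i / sum_j n_ij / (p_i + p_j),
# where W_i is i's total wins and n_ij is games played between i and j.
for _ in range(100):
    new = {}
    for p in players:
        total_wins = sum(wins[p].values())
        denom = sum(
            (wins[p].get(q, 0) + wins[q].get(p, 0)) / (strength[p] + strength[q])
            for q in players if q != p
        )
        new[p] = total_wins / denom
    # Normalize so the strengths sum to the number of players.
    s = sum(new.values())
    strength = {p: v * len(players) / s for p, v in new.items()}

# Map strengths onto an Elo-like scale (400 * log10), centered near 0.
elo = {p: 400 * math.log10(strength[p]) for p in players}
for p, e in sorted(elo.items(), key=lambda kv: -kv[1]):
    print(f"{p}: {e:+.0f}")
```

With these toy counts the fitted ordering matches the win totals (o3 strongest, then Grok 4, then Gemini 2.5 Pro); the relative Elo gaps come out of the model rather than being read off directly.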
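The human-equivalent ratings were obtained by interpolating each model's performance against calibrated Stockfish levels. A minimal sketch of that idea, assuming hypothetical calibration points (the anchor Elos and score curve below are invented for illustration, not Kaggle's data):

```python
import numpy as np

# Illustrative calibration: Stockfish skill levels with assumed
# human-Elo equivalents, and the model's average score against each
# (win = 1, draw = 0.5, loss = 0). Both arrays are made up.
stockfish_elo = np.array([800.0, 1100.0, 1400.0, 1700.0, 2000.0])
model_score   = np.array([0.95,  0.80,   0.55,   0.30,   0.10])

def estimated_human_elo(target_score=0.5):
    """Interpolate the Elo at which the model would score 50%."""
    # np.interp requires increasing x, and scores fall as Elo rises,
    # so interpolate on the reversed score -> Elo mapping.
    return float(np.interp(target_score, model_score[::-1], stockfish_elo[::-1]))

print(round(estimated_human_elo()))
```

A 50% score against an opponent means equal playing strength, so the Elo at which the interpolated score curve crosses 0.5 serves as the model's human-equivalent rating.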