Kimi k2

Search documents
首届大模型象棋争霸赛:Grok 4与o3挺进决赛,DeepSeek、Kimi落败
3 6 Ke· 2025-08-07 06:16
Core Insights - The AI chess tournament hosted on Kaggle featured eight large language models (LLMs) competing in a knockout format, with Grok 4 and o3 advancing to the finals after defeating Gemini 2.5 Pro and o4-mini respectively [1][3][8] Group 1: Tournament Structure and Results - The tournament lasted three days and involved eight AI models, including Grok 4 (xAI), Gemini 2.5 Pro (Google), o4-mini (OpenAI), o3 (OpenAI), Claude 4 Opus (Anthropic), Gemini 2.5 Flash (Google), DeepSeek R1 (DeepSeek), and Kimi k2 (Moonshot AI) [1] - The competition utilized a single-elimination format where each AI had up to four attempts to make a legal move; failure to do so resulted in an immediate loss [1] - On the first day, Grok 4, o3, Gemini 2.5 Pro, and o4-mini all achieved 4-0 victories, advancing to the semifinals [3][11][22] Group 2: Semifinal Highlights - In the semifinals, o3 demonstrated a dominant performance, winning 4-0 against o4-mini, showcasing a high level of precision with a perfect accuracy score of 100 in one of the games [5] - The match between Grok 4 and Gemini 2.5 Pro ended in a tie after regular play, leading to an Armageddon tiebreaker where Grok 4 emerged victorious [8] - The semifinals highlighted the strengths and weaknesses of the AI models, with Grok 4 overcoming early mistakes to secure its place in the finals [8][19] Group 3: Performance Analysis - The tournament revealed that while some AI models performed exceptionally well, others struggled with basic tactical sequences and context understanding, indicating areas for improvement in AI chess capabilities [22] - The performance of Grok 4 attracted attention from industry figures, including Elon Musk, who commented on its impressive gameplay [19]
闹玩呢!首届大模型对抗赛,DeepSeek、Kimi第一轮被淘汰了
机器之心· 2025-08-06 04:31
机器之心报道 机器之心编辑部 从目前战况来看,Grok 4 是夺冠热门。 在玩游戏方面,到底哪个模型最厉害?为了回答这个问题,谷歌近日发起了首届大模型国际象棋对抗赛。 这场比赛为期三天,参赛选手包括: 刚刚,我们拿到了第一轮比赛的结果:Gemini 2.5 Pro、o4-mini、Grok 4 和 o3 均以 4-0 的战绩分别击败 Claude 4 Opus、DeepSeek R1、Gemini 2.5 Flash 和 Kimi k2,晋级半决赛。 以下是模型对阵图。 这个比赛是在一个名叫「Kaggle Game Arena」的平台上进行的。这是 Kaggle 公司的一个新项目,旨在跳出平时的基准测试框架,探索像 Gemini、DeepSeek 等 LLM 在动态和竞争环境中表现如何。 在昨天的报道中,我们详细描述了这场比赛的规则,比如不允许模型调用 Stockfish 等国际象棋引擎。(详情请参见《 谷歌约战,DeepSeek、Kimi 都要上,首届大 模型对抗赛明天开战 》) 以下是对战的详细信息: Kimi k2 对阵 o3:0-4 Kimi k2 与 o3 的对局较早结束,四局比赛都在八步棋内完成。 ...