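AI benchmark scores are becoming increasingly meaningless; Google says it would be better to let AIs play games against each other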

Group 1
- Google has organized an "AI Chess King Championship" featuring top AI models from the US and China, including OpenAI's o4-mini and Google's Gemini 2.5 Pro, to evaluate and promote advances in AI reasoning and decision-making [1][3]
- The competition uses strategy games as a testing ground to address the limitations of traditional AI benchmarks, which have failed to keep pace with the rapid development of AI models [3][11]
- Kaggle Game Arena, introduced by Google, is a new public benchmarking platform on which AI models compete in a more dynamic and realistic environment than conventional tests provide [3][11]

Group 2
- In the current investment climate, AI startups can easily reach valuations above $1 billion, driven by investors' fear of missing out (FOMO) [4][6]
- "Score manipulation" is a growing trend among AI companies: high benchmark scores are used as a marketing tool to attract investment, raising concerns about the integrity of AI performance evaluations [6][9]
- Many benchmarks exist for evaluating AI models, but their lack of flexibility lets companies artificially inflate their scores, which undermines the reliability of these assessments [9][11]

Group 3
- Google chose games as the test scenario because their structured rules and inherent randomness make them an effective measure of an AI model's intelligence and capabilities [12][13]
- Games and AI have a long shared history: OpenAI's success in defeating human champions in games such as Dota 2 showed AI's potential in complex environments [13][15]
- The shift to reinforcement learning from human feedback (RLHF) was pivotal in improving AI performance, as seen in OpenAI's development of ChatGPT [15]