Why Are the Most Advanced Large AI Models All Taking On Pokémon?

Core Insights
- The article discusses the evolution of AI models using games as a testing ground, highlighting Google's Gemini 2.5 Pro independently completing the original Pokémon game, an achievement that has reignited interest in AI capabilities [4][30].

Group 1: AI Development and Gaming
- AI has been tested through games for nearly a decade, with notable milestones including AlphaGo's victory over top human Go players, OpenAI's success in DOTA2, and DeepMind's in StarCraft II [2][3].
- Games remain a prevalent benchmark for AI intelligence, as demonstrated by Gemini's recent accomplishment, which was celebrated by Google's CEO and the head of DeepMind [4][5].

Group 2: Challenges in AI Learning
- Moravec's paradox holds that tasks perceived as easy for humans can be significantly harder for AI, which Gemini's Pokémon run exemplifies [6][7].
- Learning a game like Pokémon is complex: the AI must build its own understanding and strategies without predefined rules or guidance [16][17].

Group 3: Comparison of AI Models
- Anthropic's Claude 3.7 struggled to progress in Pokémon, earning only three badges after a year of iterations, while Gemini completed the game in approximately 106,000 actions, far fewer than Claude's 215,000 [11][30].
- The performance gap between Claude and Gemini is attributed to their respective frameworks: Gemini's agent harness provides better input processing and decision-making support [34][35].

Group 4: Implications for AI Research
- An AI's ability to navigate and complete a game like Pokémon indicates its potential for independent learning and problem-solving in real-world scenarios [37][38].
- Pokémon was chosen as a training ground because its themes of growth, choice, and adventure parallel AI's own journey toward understanding complex rules and environments [39][40].
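The "agent harness" credited for Gemini's edge can be pictured as an observe–decide–act loop: the harness extracts a compact game state from each frame, hands it to the model along with recent history, and translates the model's choice into a button press. The sketch below is a minimal illustration of that loop under stated assumptions, not Google's or Anthropic's actual framework; `Observation`, `stub_policy`, and `run_harness` are hypothetical names, and the rule-based policy stands in for a real LLM call.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """Hypothetical game state a harness might extract from one frame."""
    screen_text: str          # OCR'd dialogue or menu text
    position: tuple           # player (x, y) on the current map
    badges: int               # badges earned so far

def stub_policy(obs, history):
    """Placeholder for the LLM call: a trivial rule-based policy.
    A real harness would serialize obs + history into a prompt and
    parse the model's reply into a button press."""
    if "YES/NO" in obs.screen_text:
        return "A"            # confirm dialogue prompts
    return "UP"               # otherwise keep exploring

def run_harness(env_step, policy, max_actions=10):
    """Agent loop: observe -> decide -> act, keeping a rolling action
    history so the policy has short-term context."""
    history = []
    obs = env_step(None)      # initial observation, no action yet
    for _ in range(max_actions):
        action = policy(obs, history)
        history.append(action)
        obs = env_step(action)
    return history
```

The design choice the article hints at lives in `Observation` and `stub_policy`: the better the harness summarizes the screen and constrains the action space, the fewer wasted actions the model spends, which is one plausible reading of the 106,000-versus-215,000 gap.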