GPT-5 sets a record clearing Pokémon Crystal! Defeats Red in 9,517 steps, three times as efficient as o3!
QbitAI (量子位) · 2025-08-26 08:11
Core Viewpoint
- GPT-5 completed "Pokémon Crystal" and defeated the final boss, Red, in far fewer steps than its predecessor, o3, showing gains in AI capability and gameplay efficiency [1][3][21].

Summary by Sections

Performance Comparison
- GPT-5 completed "Pokémon Crystal" in just 9,517 steps, while o3 took 27,040 steps, making GPT-5 nearly three times as efficient [3][4].
- The average human player typically takes around 5 days (approximately 40 hours) to complete the game [5].
- In the main storyline, GPT-5 used only 9,205 steps to collect all 16 badges, compared to o3's 22,334 steps [10].

Efficiency in Gameplay
- From collecting the final badge to defeating Red, GPT-5 required only 312 steps, while o3 needed nearly 5,000 steps (4,706, taking the difference of the reported totals), roughly a fifteenfold difference [11].
- During the Elite Four and Champion battles, GPT-5 used 7,329 steps, while o3 used more than 18,115 steps, again highlighting GPT-5's superior efficiency [14].

AI Model Capabilities
- GPT-5's success is attributed to a lower "hallucination" rate, better spatial reasoning, and improved goal planning compared to o3 [21].
- GPT-5's ability to plan longer action sequences with minimal errors saved significant time during gameplay [21].

Benchmarking AI Models
- The article discusses the trend of AI models, including Google's Gemini and Anthropic's Claude, attempting to play Pokémon games, with varying degrees of success [23][24].
- Pokémon games serve as a benchmark for evaluating AI models' contextual understanding, decision-making, and interface control capabilities [29].

Cost of AI Gaming
- The cost of using GPT-5 for gaming is substantial, with estimates suggesting that completing "Pokémon Red" (which is half the length of "Pokémon Crystal") could cost around $3,500 [30].
- The article notes that unless one works at OpenAI, the financial barrier to using Pokémon as a benchmark for AI testing is significant [31].
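
The step counts quoted under Performance Comparison and Efficiency in Gameplay can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary layout and the names `STEPS`, `ratio`, and `post_badge_steps` are assumptions made for this example, and the numbers are simply the figures reported in the summary, treated here as cumulative step totals.

```python
# Step counts as reported in the summary (assumed to be cumulative totals).
STEPS = {
    "all_16_badges": {"gpt5": 9_205, "o3": 22_334},
    "defeat_red": {"gpt5": 9_517, "o3": 27_040},
}


def ratio(o3_steps: int, gpt5_steps: int) -> float:
    """How many times more steps o3 used than GPT-5, rounded to 2 places."""
    return round(o3_steps / gpt5_steps, 2)


def post_badge_steps(model: str) -> int:
    """Steps between collecting the last badge and defeating Red."""
    return STEPS["defeat_red"][model] - STEPS["all_16_badges"][model]


full_run_ratio = ratio(STEPS["defeat_red"]["o3"], STEPS["defeat_red"]["gpt5"])
badge_ratio = ratio(STEPS["all_16_badges"]["o3"], STEPS["all_16_badges"]["gpt5"])
final_leg_ratio = ratio(post_badge_steps("o3"), post_badge_steps("gpt5"))

print("full run:", full_run_ratio)
print("16 badges:", badge_ratio)
print("final leg:", post_badge_steps("gpt5"), "vs", post_badge_steps("o3"),
      "->", final_leg_ratio)
```

Under these assumptions the full-run ratio comes out slightly below three, which matches the "nearly three times" claim, while the final leg from the last badge to Red is where the gap between the two models is widest.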