Pokémon Games as a Benchmark
GPT-5 Beats "Pokémon Crystal" in Record Time: Defeats Red in 9517 Steps, Three Times as Efficient as o3
36Kr · 2025-08-27 06:19
Core Insights
- GPT-5 completed "Pokémon Crystal" far more efficiently than its predecessor o3, defeating the final boss, Red, in 9517 steps versus o3's 27040 [3][5][11]
- The result drew attention and praise, including an acknowledgment from OpenAI president Greg Brockman [11]

Performance Comparison
- The step counts below read as cumulative totals at each milestone rather than per-segment counts, since each figure is smaller than the final total
- GPT-5 cleared the Elite Four and Champion by step 7329, against 18115 for o3, which is roughly 2.5 times fewer steps [8]
- GPT-5 had collected all 16 badges by step 9205, while o3 needed 22334 steps to reach the same point [5]
- Overall, GPT-5 needed about one-third of o3's steps to defeat Red (9517 versus 27040) [3][11]

Gameplay Mechanics
- GPT-5's advantage is attributed to fewer "hallucinations," better spatial reasoning, and stronger goal planning than o3, which let it navigate the game world more effectively [14][15]
- Its ability to plan long action sequences with few errors contributed to its rapid progress through the game [15]

Benchmarking and Cost
- Pokémon games are being used as benchmarks for AI models. GPT-5's completion of "Pokémon Red" cost approximately $3500 in API credits, which shows how expensive this kind of testing is [23]
- Supporting tools and strategies, such as a generated mini-map and self-critique mechanisms, improve the model's decision-making in the game [21][25]
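
The efficiency claims in the performance comparison can be checked with a short calculation. The sketch below is only an illustration: the step counts are the ones reported in the article, while the names `REPORTED_STEPS`, `speedup`, and `summarize` are hypothetical helpers introduced here and are not part of any benchmark harness.

```python
# Step counts as reported in the article for each milestone.
# The dictionary layout and function names are illustrative assumptions.
REPORTED_STEPS = {
    "Elite Four and Champion": {"gpt5": 7329, "o3": 18115},
    "All 16 badges": {"gpt5": 9205, "o3": 22334},
    "Defeat Red": {"gpt5": 9517, "o3": 27040},
}


def speedup(gpt5_steps: int, o3_steps: int) -> float:
    """Return how many steps o3 needed for every step GPT-5 needed."""
    return o3_steps / gpt5_steps


def summarize(steps: dict) -> dict:
    """Map each milestone to the o3/GPT-5 step ratio, rounded to 2 decimals."""
    return {
        milestone: round(speedup(counts["gpt5"], counts["o3"]), 2)
        for milestone, counts in steps.items()
    }


if __name__ == "__main__":
    for milestone, ratio in summarize(REPORTED_STEPS).items():
        print(f"{milestone}: o3 used {ratio}x as many steps as GPT-5")
```

Under these figures the ratios fall between roughly 2.4 and 2.8, so the headline's "three times" is best read as a rounded figure, consistent with the "about one-third" wording in the summary.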