Core Viewpoint
- Google's Gemini 2.5 Pro has been launched as an upgraded version of the company's flagship model and retains the top position in the LMArena rankings, but developer feedback points to a split in real-world application experience [1][6].

Performance Metrics
- Gemini 2.5 Pro posted higher scores across multiple AI benchmarks, with its Elo score rising by 24 points to 1470 [1][2].
- In specific tests such as GPQA and "Humanity's Last Exam," Gemini 2.5 Pro outperformed OpenAI's models, scoring 21.6% on the latter, 1.3 percentage points above OpenAI's o3 [2][3].

Competitive Landscape
- Despite the high scores, there are doubts about the practical utility of Gemini 2.5 Pro, with some developers favoring Anthropic's Claude series for programming tasks [4][5].
- Competition among models is shifting from raw benchmark scores to performance in specific application scenarios, with developers placing growing weight on real-world effectiveness [6][7].

Cost Efficiency
- Gemini 2.5 Pro offers a more cost-effective pricing structure than OpenAI's o3 and Claude 4 Opus, at $1.25 per million input tokens and $10 per million output tokens, while OpenAI's prices are significantly higher [6][7] (a rough cost sketch follows the Developer Feedback section).

Developer Feedback
- Developer experiences vary: some report superior performance from Gemini 2.5 Pro on coding tasks, while others find Claude models more effective in specific programming scenarios [5][6].
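To make the pricing comparison concrete, below is a minimal cost-estimation sketch in Python. It uses only the Gemini 2.5 Pro rates quoted above ($1.25 per million input tokens, $10 per million output tokens); the `estimate_cost` helper and the workload sizes are hypothetical illustrations, and rates for o3 or Claude 4 Opus are deliberately left as placeholders since the article gives no exact figures for them.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Estimate the USD cost of a workload.

    Rates are expressed in USD per million tokens, matching how
    providers typically quote API pricing.
    """
    return (input_tokens / 1_000_000) * input_rate + \
           (output_tokens / 1_000_000) * output_rate


# Gemini 2.5 Pro rates as quoted in the article: $1.25 in / $10 out per million tokens.
GEMINI_25_PRO = {"input_rate": 1.25, "output_rate": 10.0}

if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens per day.
    daily_in, daily_out = 2_000_000, 500_000
    gemini_cost = estimate_cost(daily_in, daily_out, **GEMINI_25_PRO)
    print(f"Gemini 2.5 Pro daily cost: ${gemini_cost:.2f}")

    # To compare with another model, plug in its published per-million-token
    # rates (the article only states that o3 and Claude 4 Opus cost
    # significantly more, without exact figures):
    # other_cost = estimate_cost(daily_in, daily_out, input_rate=..., output_rate=...)
```

At these rates the example workload costs $2.50 for input plus $5.00 for output, or $7.50 per day; swapping in another provider's published rates is all that changes for a side-by-side comparison.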
Google's new model 2.5 Pro tops the AI arena leaderboard, while developer opinions are sharply polarized