GPT-5.4 Released: OpenAI's First Unified Model, Practically "Lobster-Native"

Core Viewpoint
- GPT-5.4 is a significant advance in AI models: it integrates reasoning, coding, computer use, deep web search, and a million-token context into a single model without sacrificing performance in any area [1][2][3].

Group 1: Model Capabilities
- GPT-5.4 maintains leading performance across multiple key benchmarks [2].
- The model scored 83.0% on GDPval knowledge work tasks, indicating that it can perform on par with professional workers [22][23].
- On the OSWorld-Verified benchmark, GPT-5.4 scored 75.0%, surpassing the human average of 72.4% [39].

Group 2: Efficiency Improvements
- Compared with GPT-5.2, GPT-5.4 uses significantly fewer tokens during reasoning, which leads to faster responses and lower overall costs [6][7][8].
- A new tool search mechanism reduces total token usage by 47% while maintaining accuracy, making the model more cost-effective for businesses [81][94].

Group 3: New Features
- GPT-5.4 is the first model to natively support computer operation: it understands software interfaces through screenshots and executes tasks such as sending emails and filling in forms [35][36].
- Performance on browser tasks has improved, with a 67.3% success rate on WebArena versus 65.4% for GPT-5.2 [37].
- On SWE-Bench Pro, GPT-5.4 scored 57.7%, slightly above GPT-5.3-Codex's 56.8%, and with lower latency [46].

Group 4: Visual and Document Processing
- Visual capabilities have improved: GPT-5.4 reaches 81.2% accuracy on the MMMU-Pro visual reasoning test, up from GPT-5.2's 79.5% [73].
- Accuracy in creating and editing spreadsheets rose from 68.4% to 87.3% [70].
- In document parsing, the average error rate fell from 0.140 to 0.109 (about 22% lower), indicating fewer factual errors [78][80].

Group 5: Market Positioning and Pricing
- GPT-5.4's API pricing is higher than GPT-5.2's, at $2.5 per million input tokens and $15 per million output tokens, reflecting its positioning as a premium product for professional use [86][88].
- Despite the higher pricing, the model's efficiency improvements may offset costs for users engaged in complex tasks [90][91].
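
To make the pricing and efficiency figures concrete, here is a minimal Python sketch of the cost arithmetic. The per-million-token prices and the 47% token reduction come from the summary above. The workload size (200,000 input tokens and 20,000 output tokens) is purely hypothetical, and applying the 47% reduction equally to input and output tokens is a simplifying assumption, since the summary only reports the reduction in total token usage.

```python
# Prices and reduction figure as reported in the summary above.
INPUT_USD_PER_MILLION = 2.5
OUTPUT_USD_PER_MILLION = 15.0
TOOL_SEARCH_TOKEN_REDUCTION = 0.47


def request_cost(input_tokens: float, output_tokens: float) -> float:
    """Return the API cost in USD for the given token counts."""
    return (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    ) / 1_000_000


def cost_with_tool_search(
    input_tokens: float,
    output_tokens: float,
    reduction: float = TOOL_SEARCH_TOKEN_REDUCTION,
) -> float:
    """Return the cost after shrinking both token counts by `reduction`.

    Assumption: the reduction applies uniformly to input and output tokens.
    """
    keep = 1.0 - reduction
    return request_cost(input_tokens * keep, output_tokens * keep)


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    baseline = request_cost(200_000, 20_000)
    reduced = cost_with_tool_search(200_000, 20_000)
    print(f"baseline: ${baseline:.3f}")
    print(f"with tool search: ${reduced:.3f}")
```

Under these assumptions the hypothetical request costs $0.80 at list price and roughly $0.42 after the token reduction, which illustrates how efficiency gains could offset a higher per-token price on tool-heavy tasks.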