Just In: OpenAI's General-Purpose Agent, ChatGPT Agent, Officially Launches
机器之心· 2025-07-18 00:38
Core Viewpoint
- The introduction of ChatGPT Agent marks a significant advance in AI capabilities: it can carry out complex tasks autonomously and interactively, rather than only answering questions [4][9][53].

Group 1: Product Features
- ChatGPT Agent can use a range of tools to help users complete complex tasks, such as browsing calendars, generating editable presentations, and running code [6][12][19].
- The model scored 41.6% on the Humanity's Last Exam (HLE) benchmark, nearly double the score of previous models [6][34].
- Users can access ChatGPT Agent through OpenAI's Pro, Plus, and Team subscriptions, with usage limits that depend on the subscription type [7][8].

Group 2: Technical Capabilities
- At the core of the new capability is a unified agentic system that combines the strengths of earlier breakthroughs, including web interaction and deep research [19][25].
- ChatGPT Agent dynamically plans and chooses tools for each task, switching between reasoning and execution as needed [20][28].
- It is equipped with a suite of tools, including a visual browser, a text browser, a terminal interface, and API access, which extend its ability to gather and process information [26][27].

Group 3: Benchmark Performance
- ChatGPT Agent outperformed previous models on several benchmarks, including a 41.6% pass rate on Humanity's Last Exam [34].
- On the FrontierMath benchmark, it reached 27.4% accuracy, significantly outperforming previous models [37].
- On SpreadsheetBench, it scored 45.5% on real-world spreadsheet editing tasks, compared with 20.0% for Copilot in Excel [42][44].

Group 4: User Experience and Feedback
- Early users reported that ChatGPT Agent could create comprehensive plans, such as retirement strategies, in a fraction of the time and at a fraction of the cost of a human advisor [58][60].
- Users have noted that the agent can complete tasks such as online shopping autonomously, although some felt that doing the task by hand might be more efficient [63][67].
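
To make the tool-selection idea in Group 2 more concrete, here is a minimal sketch of an agent loop that picks one tool per step and records what it ran. It is an illustration only and does not reflect how OpenAI implements ChatGPT Agent. The tool stubs, the keyword routing table, and the names `choose_tool` and `run_agent` are all hypothetical; a real system would presumably let the model itself decide which tool to call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool stubs. The names mirror the tool suite listed in Group 2,
# but the bodies only return a description of what a real tool would do.
def visual_browser(task: str) -> str:
    return f"[visual_browser] interacted with page for: {task}"

def text_browser(task: str) -> str:
    return f"[text_browser] fetched text for: {task}"

def terminal(task: str) -> str:
    return f"[terminal] executed command for: {task}"

def api_call(task: str) -> str:
    return f"[api] queried service for: {task}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "visual_browser": visual_browser,
    "text_browser": text_browser,
    "terminal": terminal,
    "api": api_call,
}

# Assumed routing rules: the first keyword found in the step wins.
# This stands in for the model's own planning, which is not public.
ROUTES: List[Tuple[str, str]] = [
    ("click", "visual_browser"),
    ("run", "terminal"),
    ("calendar", "api"),
]
DEFAULT_TOOL = "text_browser"

def choose_tool(step: str) -> str:
    """Pick a tool name for one step using simple keyword matching."""
    lowered = step.lower()
    for keyword, tool_name in ROUTES:
        if keyword in lowered:
            return tool_name
    return DEFAULT_TOOL

def run_agent(steps: List[str]) -> List[Tuple[str, str]]:
    """Execute each step with its chosen tool and return a (tool, result) trace."""
    trace = []
    for step in steps:
        tool_name = choose_tool(step)
        trace.append((tool_name, TOOLS[tool_name](step)))
    return trace

if __name__ == "__main__":
    plan = [
        "Check calendar for Friday",
        "Read reviews of nearby restaurants",
        "Run the budget script",
    ]
    for tool_name, result in run_agent(plan):
        print(tool_name, "->", result)
```

The sketch keeps planning (`choose_tool`) separate from execution (`run_agent`), which loosely mirrors the summary's point about switching between reasoning and acting. The keyword table is only a placeholder for whatever the model actually does.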