OpenAI Launches ChatGPT Agent in Late-Night Release: Taking On Manus, Going Head-to-Head with Grok 4

Core Insights
- OpenAI has launched ChatGPT Agent, which integrates the "Operator" and "Deep Research" capabilities to overcome the limitations of previous models [2][3]
- ChatGPT Agent is equipped with tools such as a graphical browser and a command-line terminal, allowing it to read, understand, and interact with web content [2][3]
- In performance tests, ChatGPT Agent posted competitive scores across several benchmarks, including strong results in data analysis and modeling [5][6]

Group 1: Product Features
- ChatGPT Agent combines Operator's web interaction with Deep Research's in-depth research capabilities, addressing the shortcomings of each earlier product [2]
- Its toolset includes a graphical browser, a text browser, a command-line terminal, and API calls, which strengthen its ability to gather and analyze information [2]
- Users can connect the agent to their email and GitHub accounts, enabling personalized responses and deeper research [2][3]

Group 2: Performance Metrics
- On the HLE benchmark, ChatGPT Agent scored 44.4%, matching Grok 4; on FrontierMath, it outperformed competitors by 8% [5]
- On DSBench, it showed advantages of 25% in data analysis and 20% in data modeling over human experts [6]
- On spreadsheet tasks, however, the agent reached only 45% accuracy, well below the 71% achieved by human experts, which points to limitations on complex logical tasks [6]

Group 3: Market Trends
- The financial sector is becoming a focal point for AI companies: in investment banking modeling tests, ChatGPT Agent successfully completed 71.3% of entry-level tasks [7]
- Competition is intensifying, with both OpenAI and Anthropic targeting financial applications for their AI agents [8]
- The AI agent market is becoming crowded, with various companies exploring the automation of daily tasks and improved human-machine interaction [8]