Core Viewpoint
- OpenAI has launched ChatGPT Agent, marking its entry into the "agentic AI" field: the assistant can carry out multi-step tasks autonomously while the user retains control [1][3].

Group 1: Features and Capabilities
- ChatGPT Agent combines earlier tools and capabilities, so it can browse the web, run code, and create documents. It asks the user for permission before taking actions with real-world consequences [1][2].
- Users can view every operation the Agent performs in a private sandbox environment that includes a virtual operating system and a web browser [2].
- Example tasks include shopping for outfits, building PowerPoint presentations, planning meals, and updating financial spreadsheets, using web browsing, terminal access, and API connections [2].

Group 2: Performance Evaluation
- In benchmark tests, ChatGPT Agent scored 41.6% accuracy on "Humanity's Last Exam" and 27.4% on "FrontierMath", outperforming previous models [7].
- It scored 89.9% on data analysis tasks and 85.5% on data modeling tasks, exceeding human performance [7][8].
- Users reported that the Agent can produce financial analysis reports quickly, although it still trails entry-level investment banking analysts on some calculations [8].

Group 3: Limitations and User Feedback
- Despite these capabilities, ChatGPT Agent's performance varies considerably by task, and some users noted that it scored worse than previous models on certain benchmarks [12][13].
- Users have pointed out inaccuracies in its data analysis output, suggesting the Agent may struggle with complex problems that go beyond its training data [15][18].
- Comparisons with other AI products such as Genspark and Manus suggest these alternatives may outperform ChatGPT Agent on specific tasks, raising questions about its competitive edge [21][22].
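OpenAI's New Agent Crushed by a 24-Person Chinese Startup Team! Hands-On Tests Show It Losing Badly on Both Cost and Quality; Overseas Users: Chinese Agents Lead by a Full Generation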
AI前线 (AI Frontline) · 2025-07-18 06:00