OpenAI Studies Large Models' Contribution to GDP: AI Can Already Replace Humans in Three Industries, and OpenAI Admits It Trails Claude
机器之心 (Synced) · 2025-09-27 06:13

Core Viewpoint
OpenAI has introduced GDPval, a new evaluation that measures AI model performance on economically valuable, real-world tasks. Its results indicate that AI is nearing human-level performance in various industries [1][3][22].

Group 1: Evaluation Methodology
- GDPval takes GDP as its key economic indicator and draws tasks from critical occupations in the nine industries that contribute most to GDP [3][16].
- The evaluation comprises 1,320 professional tasks, including an open-sourced gold subset of 220 tasks. The tasks are designed and reviewed by experienced professionals [18][22].
- Tasks are based on real work products, which makes the evaluation more realistic and more diverse than other benchmarks [18][19].

Group 2: Model Performance
- Leading models such as Claude Opus 4.1 and GPT-5 approach or match the quality of human experts' work on a range of tasks [4][9].
- Claude Opus 4.1 excels at tasks where aesthetics matter, while GPT-5 performs better on accuracy [9][10].
- Performance has improved significantly: models complete tasks approximately 100 times faster and at 100 times lower cost than human experts [13].

Group 3: Industry Impact
- AI has reached or surpassed human-level capability in sectors such as government, retail, and wholesale [7].
- Early GDPval results suggest that AI can complete some repetitive tasks faster and more cheaply than human experts, which could transform the job market [21].
- OpenAI aims to democratize access to these tools so that workers can adapt to change and AI integration can drive economic growth [21].

Group 4: Future Developments
- OpenAI plans to expand GDPval to cover more occupations, industries, and task types, to make tasks more interactive, and to handle more ambiguous tasks [22].
- These ongoing improvements reflect a commitment to better measuring progress across diverse knowledge work [22].
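
The comparison against human experts described under Model Performance can be pictured as a tally of graded head-to-head outcomes. The sketch below is illustrative only: it assumes each task yields a single verdict ("win", "tie", or "loss") for the model's deliverable relative to the expert's, and both the verdict list and the function name `win_or_tie_rate` are hypothetical rather than taken from GDPval.

```python
from collections import Counter

# Hypothetical grader verdicts for one model: each entry records whether the
# model's deliverable was judged better than ("win"), as good as ("tie"), or
# worse than ("loss") the human expert's. Illustrative values, not GDPval data.
verdicts = ["win", "loss", "tie", "win", "loss", "loss", "tie", "win"]


def win_or_tie_rate(verdicts):
    """Return the fraction of tasks judged at least as good as the expert's."""
    if not verdicts:
        raise ValueError("no verdicts to score")
    counts = Counter(verdicts)
    # Counter returns 0 for verdict labels that never occur.
    return (counts["win"] + counts["tie"]) / len(verdicts)


rate = win_or_tie_rate(verdicts)
print(f"win-or-tie rate: {rate:.1%}")
```

A rate near 50% under this assumed scoring would correspond to a model whose output is judged at least as good as the expert's on about half of the tasks.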