Doubling Every 7 Months! AI Agent Capabilities Are Soaring, and METR's Report Reveals an Exponential Growth Pattern
量子位 (QbitAI) · 2025-07-16 01:49
Core Insights
- METR's report finds that the capabilities of AI agents are doubling approximately every seven months, a trend validated across nine benchmarks [1][7][14]
- The report introduces the "time horizon": the duration of the tasks an agent can reliably complete. A longer time horizon indicates a more capable agent [8][9]

Group 1: Agent Performance and Growth
- AI agents have improved markedly across a range of tasks: they complete work that would take humans 50-200 minutes in a fraction of that time, and performance doubles every 2-6 months [3][4]
- Growth on software development and math competition tasks matches the overall trend, while autonomous driving improves more slowly, doubling approximately every 20 months [5][14]
- The report extends the evaluation of AI capabilities beyond software development to a wider variety of tasks and confirms the exponential growth pattern across domains [9][20]

Group 2: Benchmarking and Time Horizon
- METR used nine benchmarks to measure agents' time horizons, covering software development, computer usage, math contests, competitive programming, scientific Q&A, video understanding, autonomous driving, and robotics [9][10]
- Time horizons vary significantly across benchmarks: some involve much longer tasks than others, which indicates that task complexity influences agent performance [15][17]
- The ability to handle longer and more complex tasks is the key measure; if the doubling trend continues, AI may soon tackle tasks that currently take days or weeks [20]
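
To make the doubling arithmetic concrete, here is a minimal Python sketch of what a fixed doubling time implies, assuming purely exponential growth. The 7-month and 20-month doubling times come from the summary above; the 60-minute starting horizon, the 40-hour "work week" target, and the function names `horizon_after` and `months_to_reach` are illustrative assumptions, not figures or methods from the METR report.

```python
import math


def horizon_after(start_minutes: float, months: float, doubling_months: float) -> float:
    """Time horizon after `months`, assuming it doubles every `doubling_months`."""
    return start_minutes * 2 ** (months / doubling_months)


def months_to_reach(start_minutes: float, target_minutes: float, doubling_months: float) -> float:
    """Months needed for the horizon to grow from start to target under the same assumption."""
    return doubling_months * math.log2(target_minutes / start_minutes)


if __name__ == "__main__":
    start = 60.0         # hypothetical starting horizon: 1 hour (not from the report)
    target = 40 * 60.0   # hypothetical target: one 40-hour work week, in minutes

    # Doubling times quoted in the summary: ~7 months overall, ~20 months for driving.
    for label, doubling in [("overall trend", 7.0), ("autonomous driving", 20.0)]:
        months = months_to_reach(start, target, doubling)
        print(f"{label}: about {months:.1f} months to go from 1 hour to 40 hours")
```

Under these assumptions, the gap between a 7-month and a 20-month doubling time compounds quickly: the same 40-fold increase in horizon takes roughly three years on the overall trend and close to nine years at the slower rate. Real trends need not stay exponential, so this is an extrapolation exercise rather than a forecast.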