Can AI Really Do the Work? Silicon Valley Ran a Real-World Freelance Experiment, and the Answer Is Awkward

Core Insights
- Scale AI's "Remote Labor Index" (RLI) experiment set out to evaluate whether AI models can actually complete freelance work. It revealed a stark gap between AI's theoretical capabilities and its practical performance [1][3][20]
- The best-performing model, Manus, reached a success rate of only 2.5%: it completed just 6 of 240 tasks and earned $1,720, significantly below the average earnings of human freelancers [1][8][21]

Group 1: Experiment Overview
- The RLI measures AI's ability to complete real-world work using actual freelance projects from Upwork: 240 tasks in total, equivalent to 6,000 hours of human work, with a total potential payout of $144,000 [6][20]
- Tasks were chosen to be independent and self-contained, covering areas such as writing, 3D modeling, video animation, architectural design, and game development. Tasks requiring ongoing communication or teamwork were excluded [4][6]

Group 2: Performance Metrics
- The overall automation rate of the AI models was below 3%. Manus led with a 2.5% success rate, Grok 4 and Claude Sonnet 4.5 each reached 2.1%, and GPT-5 reached 1.7% [8][10]
- The primary reasons for failure were low quality (45.6%), incompleteness or formatting errors (35.7%), technical issues (17.6%), and severe visual or logical inconsistencies (14.8%) [10][12]

Group 3: Implications for AI in the Workforce
- AI can generate content quickly, but the quality often falls short of professional standards. Human workers took an average of 28.9 hours per project, whereas the AI models' output over comparable compute time was mostly unacceptable [14][21]
- The RLI suggests that work is being "disaggregated" rather than directly replaced by AI: lower-level tasks (L1-L2) showed a higher success rate of 25%-30%, while more complex tasks (L4-L5) had success rates below 5% [15][21]

Group 4: Future Outlook
- The research team plans to continuously update the RLI to include new dimensions such as multimodal capabilities and long-term memory, aiming to convert model capabilities into measurable economic value [16][20]
- The introduction of AI is reshaping job structures, with a noted 7.7% decline in entry-level job postings in sectors like retail and administrative support, indicating a shift in required skills towards those that integrate AI effectively [22][23]
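
The headline figures above can be cross-checked with a little arithmetic. The Python sketch below uses only numbers reported in this summary; the variable names are illustrative, and the derived values (average payout per task, Manus's share of the total payout, the sum of the failure shares) are computed here rather than taken from the RLI report. The reading that failure categories overlap is an assumption based only on the shares summing to more than 100%.

```python
# Figures as reported in the summary above.
TOTAL_TASKS = 240
TOTAL_PAYOUT_USD = 144_000
MANUS_COMPLETED = 6
MANUS_EARNED_USD = 1_720

# Reported failure shares, in percent of failed deliverables.
FAILURE_SHARES_PCT = {
    "low quality": 45.6,
    "incomplete or formatting errors": 35.7,
    "technical issues": 17.6,
    "visual or logical inconsistencies": 14.8,
}


def pct(part: float, whole: float) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Derived values (not stated in the source, only computed from it).
success_rate_pct = pct(MANUS_COMPLETED, TOTAL_TASKS)
avg_payout_usd = TOTAL_PAYOUT_USD / TOTAL_TASKS
earnings_share_pct = pct(MANUS_EARNED_USD, TOTAL_PAYOUT_USD)
failure_total_pct = round(sum(FAILURE_SHARES_PCT.values()), 1)

# Assumption: a total above 100% means one failed task can be
# counted under more than one failure category.
categories_overlap = failure_total_pct > 100

print(f"Manus success rate: {success_rate_pct}%")
print(f"Average payout per task: ${avg_payout_usd:.0f}")
print(f"Manus share of total payout: {earnings_share_pct}%")
print(f"Sum of failure shares: {failure_total_pct}%")
```

The computed success rate matches the reported 2.5%, and the earnings come to roughly 1.2% of the $144,000 available, which is consistent with the sub-3% automation rate.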