Racing Toward AGI: Claude Takes the Year-End Crown as Nearly 5 Hours of Autonomous Coding Stuns the Internet
36Kr · 2025-12-22 02:02

Core Insights
- The article highlights the capabilities of Anthropic's programming model, Claude Opus 4.5, which has outperformed competitors such as OpenAI's GPT-5.1-Codex-Max on coding tasks [1][3][4].

Group 1: Performance Metrics
- Claude Opus 4.5 can code autonomously for nearly 5 hours without crashing, a significant advance for AI coding agents [2].
- Its 50% task completion time (the task length at which the model succeeds half the time) is approximately 4 hours and 49 minutes, the longest reported to date. The corresponding figure for GPT-5.1-Codex-Max is 2 hours and 53 minutes [14].
- Its 80% task completion time, however, is only 27 minutes, shorter than GPT-5.1-Codex-Max's 32 minutes. The wider gap between the two figures indicates a flatter success-rate curve: Opus 4.5's success rate falls off more gradually as tasks get longer [17][20].

Group 2: Future Projections
- By 2026, AI agents are expected to independently complete a full human workday, and by 2028 to handle tasks equivalent to several months of human work [13].
- The article suggests that progress in AI coding agents is accelerating, with capabilities moving from minute-level tasks to hour-level tasks [9][10].

Group 3: Memory Challenges
- The article identifies memory as the final barrier to Artificial General Intelligence (AGI), emphasizing that current AI models cannot retain long-term memory effectively [25][30].
- Current AI systems rely primarily on retrieval-based memory, which is insufficient for complex tasks. A more sophisticated memory system that mimics human memory is needed [33][35].
- The industry anticipates breakthroughs in memory systems within the next year, which could significantly improve AI's learning capabilities and overall performance [40][41].
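
The relationship between the 50% and 80% figures under Performance Metrics can be checked with a small calculation. The sketch below assumes that success probability follows a logistic curve in the base-2 logarithm of task length. That model, and the names `slope_from_horizons` and `success_prob`, are assumptions made for illustration and are not stated in the article. Only the four durations (4 h 49 min, 2 h 53 min, 27 min, 32 min) come from the text.

```python
import math

def slope_from_horizons(h50_min, h80_min):
    """Slope of a logistic in log2(task length) that passes through
    50% success at h50_min and 80% success at h80_min.
    logit(0.8) = ln(4), so slope = ln(4) / log2(h50 / h80)."""
    return math.log(4) / math.log2(h50_min / h80_min)

def success_prob(task_min, h50_min, slope):
    """Modeled probability of completing a task of length task_min."""
    x = slope * (math.log2(h50_min) - math.log2(task_min))
    return 1.0 / (1.0 + math.exp(-x))

# (50% horizon, 80% horizon) in minutes, taken from the article.
HORIZONS = {
    "Claude Opus 4.5": (4 * 60 + 49, 27),
    "GPT-5.1-Codex-Max": (2 * 60 + 53, 32),
}

# A smaller slope means a flatter curve (slower fall-off with task length).
SLOPES = {name: slope_from_horizons(h50, h80)
          for name, (h50, h80) in HORIZONS.items()}

if __name__ == "__main__":
    for name, (h50, h80) in HORIZONS.items():
        s = SLOPES[name]
        print(f"{name}: slope={s:.2f}, "
              f"p(60 min)={success_prob(60, h50, s):.2f}, "
              f"p(480 min)={success_prob(480, h50, s):.2f}")
```

Under this assumed model the fitted slope for Opus 4.5 is roughly 0.41 against roughly 0.57 for GPT-5.1-Codex-Max, which is one way to read the article's point: a longer 50% horizon paired with a shorter 80% horizon implies a curve that declines more gradually.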