Core Insights - The article discusses the recent releases of advanced AI models by Anthropic and OpenAI, specifically Claude Opus 4.6 and GPT-5.3-Codex, highlighting their significant improvements in performance and capabilities [2][15]. Summary of Claude Opus 4.6 - Claude Opus 4.6 represents a major upgrade for Anthropic's flagship AI model, featuring a more cautious planning approach and the ability to maintain longer autonomous workflows [5]. - The model introduces a context window of 1 million tokens, allowing it to process and reason with significantly more information than previous versions [6]. - It includes a "smart agent team" feature, enabling multiple AI agents to work on different aspects of coding projects simultaneously [6]. - Opus 4.6 outperformed competitors in various assessments, achieving the highest scores in Terminal-Bench 2.0 and leading in the "Humanity's Last Exam" [7]. - In GDPval-AA, Opus 4.6 scored approximately 144 Elo points higher than OpenAI's GPT-5.2 and 190 points higher than its predecessor, Claude Opus 4.5 [7]. - The model's performance in MRCR v2 testing showed a score of 76%, significantly higher than Sonnet 4.5's 18.5%, indicating a qualitative leap in context utilization [9]. Summary of GPT-5.3-Codex - OpenAI's GPT-5.3-Codex claims to have the best coding performance to date, achieving record scores in multiple benchmarks, including 56.8% in SWE-Bench Pro and 77.3% in Terminal-Bench 2.0 [16][19]. - The model integrates the advanced coding capabilities of GPT-5.2-Codex with enhanced reasoning and expertise from GPT-5.2, resulting in a 25% speed improvement [19][20]. - GPT-5.3-Codex is designed to function as a comprehensive work assistant, capable of handling tasks across the software lifecycle, including debugging, deployment, and user research [25]. - The model allows for real-time interaction, enabling users to guide and supervise multiple working agents without losing context [27]. - OpenAI emphasizes that the advancements in GPT-5.3-Codex have fundamentally changed the workflow of their research and engineering teams, enhancing productivity and interaction quality [28][29]. Conclusion - The article concludes that the competitive landscape of AI models is intensifying, with both Anthropic and OpenAI making significant strides in capabilities and performance, setting the stage for further developments in the industry [31].
硬碰硬!刚刚,Claude Opus 4.6与GPT-5.3-Codex同时发布
机器之心·2026-02-05 23:45