Shunyu Yao's Google Debut: New Gemini Model Sweeps SOTA Benchmarks, Leaving Only 7 Humans to Defend Carbon-Based Programming
36Kr · 2026-02-13 07:32
Core Insights
- The article highlights the performance of Gemini 3 Deep Think, which set new records on several benchmarks, demonstrating advanced reasoning and potential applications in science and engineering [3][15][19].

Performance Metrics
- Gemini 3 Deep Think scored an unprecedented 84.6% on the ARC-AGI-2 benchmark, surpassing previous models, which scored between 60% and 70% [3][19].
- On Humanity's Last Exam (HLE), it achieved a new state-of-the-art (SOTA) score of 48.4% [3][15].
- On Codeforces it reached an Elo rating of 3455, ranking 8th in the world; only 7 humans rate higher [1][15].

Cost Efficiency
- The upgrade cut reasoning costs by 82%, from $77.16 to $13.62 per task [21][15].

Applications and Innovations
- Gemini 3 Deep Think can analyze sketches, model complex shapes, and generate files for 3D printing, indicating its usefulness for engineering tasks [7][15].
- It identified a subtle logical flaw in a complex mathematical paper that earlier peer review had missed, pointing to potential uses in academic research [9][15].
- It optimized a method for growing complex crystals, achieving previously unattainable precision, which could lead to new semiconductor materials [10][15].

Team and Development
- The development team includes notable figures such as Yi Tay and Shunyu Yao, who bring backgrounds in AI and physics, respectively [27][28].
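As a quick arithmetic check of the reported cost figures (assuming, as the summary states, that both are per-task costs measured the same way), the relative reduction works out as follows; equivalently, the new model is roughly 5.7 times cheaper per task:

$$\frac{77.16 - 13.62}{77.16} = \frac{63.54}{77.16} \approx 0.823 \approx 82\%, \qquad \frac{77.16}{13.62} \approx 5.67$$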
Shunyu Yao's Google Debut: New Gemini Model Sweeps SOTA Benchmarks, Leaving Only 7 Humans to Defend Carbon-Based Programming
QbitAI (量子位) · 2026-02-13 05:42
Core Insights
- Google has significantly upgraded its Gemini 3 Deep Think model in response to competition from Claude Opus 4.6 and GPT Codex 5.3 [1].

Performance Metrics
- Gemini 3 Deep Think achieved an unprecedented 84.6% on the ARC-AGI-2 benchmark, surpassing previous models, which scored between 60% and 70% [3][26].
- On Humanity's Last Exam (HLE), it scored 48.4%, setting a new state of the art (SOTA) [4][22].
- It reached an Elo rating of 3455 on Codeforces, ranking 8th in the world [2].
- On the 2025 International Mathematical Olympiad (IMO), it reached gold-medal level with a score of 81.5% [5][33].

Cost Efficiency
- The upgrade reduced reasoning costs by 82%, from $77.16 to $13.62 per task [29].

Applications and Capabilities
- Gemini 3 Deep Think can analyze sketches, model complex shapes, and generate files for 3D printing [8].
- It identified a subtle logical flaw in a complex mathematical paper that human peer review had missed [10][11].
- It optimized a method for growing complex crystals, reaching thicknesses greater than 100 microns, which had previously been difficult to achieve [14].

Research and Development Team
- The development team includes notable ethnic Chinese researchers such as Yi Tay and Shunyu Yao, who bring backgrounds in AI and physics, respectively [36][41].
- Yi Tay previously worked on early large language models and returned to Google DeepMind after a stint at a startup [38].
- Shunyu Yao has a strong academic background: he has published in top journals and worked on advanced topics in quantum physics [41][42].