Core Insights
- The article highlights the performance of Gemini 3 Deep Think, which has set new records on several benchmarks. These results show its advanced reasoning capabilities and its potential applications in science and engineering [3][15][19].

Group 1: Performance Metrics
- Gemini 3 Deep Think scored an unprecedented 84.6% on the ARC-AGI-2 benchmark, surpassing previous models, which scored between 60% and 70% [3][19].
- On Humanity's Last Exam (HLE), it achieved a new state-of-the-art (SOTA) score of 48.4% [3][15].
- It also reached an Elo rating of 3455 on Codeforces, placing it 8th globally, with only 7 individuals rated higher [1][15].

Group 2: Cost Efficiency
- The upgraded Gemini 3 Deep Think cuts reasoning costs by 82%, from $77.16 to $13.62 per task [21][15].

Group 3: Applications and Innovations
- Gemini 3 Deep Think can analyze sketches, model complex shapes, and generate files for 3D printing, indicating its usefulness for engineering tasks [7][15].
- The model identified a subtle logical flaw in a complex mathematical paper that earlier peer review had missed, showing its potential in academic research [9][15].
- It optimized a method for growing complex crystals with previously unattainable precision, which could lead to new semiconductor materials [10][15].

Group 4: Team and Development
- The Gemini 3 Deep Think development team includes notable figures such as Yi Tay and Shunyu Yao, both of whom have significant backgrounds in AI and physics [27][28].
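As a quick sanity check on the Cost Efficiency figures, the short Python sketch recomputes the percentage reduction from the two per-task costs quoted in the summary. The helper name `percent_reduction` is hypothetical and introduced only for illustration; the dollar amounts are the ones reported in the article.

```python
# Sanity check of the reported "82% reduction" in per-task reasoning cost.
# The two costs are the figures quoted in the article; the helper is a
# hypothetical illustration, not part of any Gemini tooling.

OLD_COST_USD = 77.16  # reported cost per task before the upgrade
NEW_COST_USD = 13.62  # reported cost per task after the upgrade


def percent_reduction(before: float, after: float) -> float:
    """Return the relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


reduction = percent_reduction(OLD_COST_USD, NEW_COST_USD)
print(f"Reduction: {reduction:.1f}%")  # prints "Reduction: 82.3%"
```

The result, about 82.3%, rounds to the 82% stated in the summary; put another way, the new per-task cost is roughly 5.7 times lower than the old one.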
Shunyu Yao's Google debut: new Gemini model sweeps SOTA benchmarks, leaving only 7 humans to defend carbon-based programming