Shunyu Yao's Google Debut: New Gemini Model Sweeps SOTA Benchmarks, Leaving Only 7 Humans to Defend Carbon-Based Programming

Core Insights
- Google has significantly upgraded its AI model, Gemini 3 Deep Think, in response to competition from Claude Opus 4.6 and GPT Codex 5.3 [1]

Performance Metrics
- Gemini 3 Deep Think scored 84.6% on the ARC-AGI-2 benchmark, well ahead of previous models, which scored in the 60%-70% range [3][26]
- On Humanity's Last Exam (HLE), it scored 48.4%, setting a new state of the art (SOTA) [4][22]
- On Codeforces, it reached an Elo rating of 3455, placing it 8th in the world [2]
- On the International Mathematical Olympiad 2025, it reached gold-medal level with a score of 81.5% [5][33]

Cost Efficiency
- The upgrade cut the reasoning cost by 82%, from $77.16 to $13.62 per task [29]

Applications and Capabilities
- Gemini 3 Deep Think can analyze sketches, model complex shapes, and generate files for 3D printing [8]
- It identified a subtle logical flaw in a complex mathematical paper that human peer review had missed [10][11]
- It optimized a method for growing complex crystals, achieving a thickness greater than 100 microns, which had previously been difficult to reach [14]

Research and Development Team
- The development team includes notable researchers of Chinese descent, such as Yi Tay and Shunyu Yao, who have significant backgrounds in AI and physics [36][41]
- Yi Tay previously worked on early large language models and returned to Google DeepMind after a stint at a startup [38]
- Shunyu Yao has a strong academic background, having published in top journals and worked on advanced topics in quantum physics [41][42]
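
The 82% figure cited under Cost Efficiency can be checked directly from the two per-task prices. The snippet below is only an illustrative sanity check; the function name `percent_reduction` is hypothetical and not part of any Google tooling.

```python
def percent_reduction(old_cost: float, new_cost: float) -> float:
    """Return the percentage drop from old_cost to new_cost."""
    return (old_cost - new_cost) / old_cost * 100


# Per-task reasoning costs in US dollars, as reported in the article [29].
reduction = percent_reduction(77.16, 13.62)
print(f"{reduction:.1f}% cheaper per task")  # prints "82.3% cheaper per task"
```

Rounded to the nearest whole percent, this matches the 82% reported.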