Gemini 3 leaked version?

Core Insights
- Gemini 3 Pro significantly outperforms its predecessor, Gemini 2.5 Pro, across various benchmarks, showcasing enhanced reasoning and multimodal capabilities [2].

Benchmark Performance
- In the "Humanity's Last Exam" benchmark, Gemini 3 Pro achieved a score of 37.5%, compared to 21.6% for Gemini 2.5 Pro [2].
- For visual reasoning puzzles in the ARC-AGI-2 benchmark, Gemini 3 Pro scored 31.1%, while Gemini 2.5 Pro only managed 4.9% [2].
- In scientific knowledge assessment (GPQA Diamond), Gemini 3 Pro scored 91.9%, outperforming Gemini 2.5 Pro's 86.4% [2].
- In mathematics (AIME 2025), Gemini 3 Pro achieved 95.0%, while Gemini 2.5 Pro scored 88.0% [2].
- The MathArena Apex benchmark showed Gemini 3 Pro at 23.4%, a significant improvement over Gemini 2.5 Pro's 0.5% [2].
- For multimodal understanding (MMMU-Pro), Gemini 3 Pro scored 81.0%, compared to 68.0% for Gemini 2.5 Pro [2].
- In screen understanding (ScreenSpot-Pro), Gemini 3 Pro achieved 72.7%, while Gemini 2.5 Pro scored only 11.4% [2].
- The performance in OCR (OmniDocBench 1.5) showed Gemini 3 Pro with an edit distance of 0.115, better than Gemini 2.5 Pro's 0.145 [2].
- Knowledge acquisition from videos (Video-MMMU) resulted in 87.6% for Gemini 3 Pro, compared to 83.6% for Gemini 2.5 Pro [2].
- Competitive coding problems (LiveCodeBench Pro) saw Gemini 3 Pro with an Elo Rating of 2,439, significantly higher than Gemini 2.5 Pro's 1,775 [2].
- In agentic terminal coding (Terminal-Bench 2.0), Gemini 3 Pro scored 54.2%, while Gemini 2.5 Pro scored 32.6% [2].
- The agentic coding benchmark (SWE-Bench Verified) showed Gemini 3 Pro at 76.2%, compared to 59.6% for Gemini 2.5 Pro [2].
- For long-horizon agent tasks (Vending-Bench 2), Gemini 3 Pro's net worth was $5,478.16, vastly exceeding Gemini 2.5 Pro's $573.64 [2].
- In multilingual Q&A (MMMLU), Gemini 3 Pro scored 91.8%, slightly ahead of Gemini 2.5 Pro's 89.5% [2].
- The commonsense reasoning benchmark (Global PIQA) showed Gemini 3 Pro at 93.4%, compared to 91.5% for Gemini 2.5 Pro [2].
- Long context performance (MRCR v2) indicated Gemini 3 Pro at 77.0% for 128k context, significantly better than Gemini 2.5 Pro's 58.0% [2].
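
To see at a glance where the gap is widest, the percentage scores quoted above can be ranked by their point difference. The following is only a rough sketch: the numbers are copied from the list as reported in [2], and the names `SCORES`, `point_gain`, and `gains_sorted` are made up for this illustration. The three benchmarks that are not percentages (OmniDocBench 1.5 edit distance, LiveCodeBench Pro Elo, Vending-Bench 2 net worth) are left out because their scales are not comparable.

```python
# Percentage scores as quoted in the post: (Gemini 3 Pro, Gemini 2.5 Pro).
# Non-percentage benchmarks (edit distance, Elo, net worth) are excluded.
SCORES = {
    "Humanity's Last Exam": (37.5, 21.6),
    "ARC-AGI-2": (31.1, 4.9),
    "GPQA Diamond": (91.9, 86.4),
    "AIME 2025": (95.0, 88.0),
    "MathArena Apex": (23.4, 0.5),
    "MMMU-Pro": (81.0, 68.0),
    "ScreenSpot-Pro": (72.7, 11.4),
    "Video-MMMU": (87.6, 83.6),
    "Terminal-Bench 2.0": (54.2, 32.6),
    "SWE-Bench Verified": (76.2, 59.6),
    "MMMLU": (91.8, 89.5),
    "Global PIQA": (93.4, 91.5),
    "MRCR v2 (128k)": (77.0, 58.0),
}


def point_gain(new, old):
    """Absolute gain in percentage points, rounded to one decimal place."""
    return round(new - old, 1)


def gains_sorted(scores):
    """Return (benchmark, gain) pairs, largest gain first."""
    rows = [(name, point_gain(new, old)) for name, (new, old) in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, gain in gains_sorted(SCORES):
        print(f"{name:<22} +{gain:.1f} pts")
```

On these figures the largest jump is ScreenSpot-Pro (+61.3 points) and the smallest is Global PIQA (+1.9 points), so the reported gains are concentrated in screen understanding, visual reasoning, and agentic tasks rather than in the already near-saturated knowledge benchmarks.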