X @Demis Hassabis
Gemini 2.5 Deep Think achieves state-of-the-art performance across many challenging benchmarks!

Google DeepMind (@GoogleDeepMind):
Compared to other models without tool use, it achieves state-of-the-art performance across:
🔘 LiveCodeBench V6, which evaluates competitive code performance
🔘 Humanity’s Last Exam, a challenging benchmark that measures a model’s expertise in different domains, including science https://t.co/cgTKi8JYlV ...