Performance Benchmarks

- Grok 4 achieves an Artificial Analysis Intelligence Index of 73, surpassing OpenAI o3 (70), Google Gemini 2.5 Pro (70), Anthropic Claude 4 Opus (64), and DeepSeek R1 0528 (68) [1]
- Grok 4 leads in the Coding Index (LiveCodeBench & SciCode) and the Math Index (AIME24 & MATH-500) [5]
- Grok 4 achieves an all-time high of 88% on GPQA Diamond, exceeding Gemini 2.5 Pro's previous record of 84% [5]
- Grok 4 achieves an all-time high of 24% on Humanity's Last Exam, beating Gemini 2.5 Pro's previous record of 21% [5]
- Grok 4 achieves the joint-highest scores on MMLU-Pro and AIME 2024, at 87% and 94% respectively [5]

Model Specifications & Pricing

- Grok 4 is priced at $3/$15 per 1 million input/output tokens ($0.75 per 1 million cached input tokens), equivalent to Claude 4 Sonnet but more expensive than Gemini 2.5 Pro and o3 [4]
- Grok 4 has a 256 thousand token context window, smaller than Gemini 2.5 Pro's 1 million tokens but larger than those of Claude 4 Sonnet, Claude 4 Opus, o3, and R1 0528 [5]
- Grok 4 supports text and image input, function calling, and structured outputs [6]

Availability & Deployment

- Grok 4 was tested via the xAI API; the version deployed on X/Twitter may differ [2]
- Grok 4 is expected to be available via the xAI API, the Grok chatbot on X, and potentially Microsoft Azure AI Foundry [4]

Speed

- Grok 4's output speed is 75 tokens/s, slower than o3 (188 tokens/s), Gemini 2.5 Pro (142 tokens/s), and Claude 4 Sonnet Thinking (85 tokens/s), but faster than Claude 4 Opus Thinking (66 tokens/s) [5]
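As a worked example of the pricing figures above, the following is a minimal sketch of estimating a request's cost from token counts. The rates come from the list above; the assumption that cached input tokens are billed at the cached rate and the remaining input at the full rate is illustrative, not confirmed against xAI's billing rules.

```python
def grok4_cost_usd(input_tokens: int, output_tokens: int,
                   cached_input_tokens: int = 0) -> float:
    """Estimate request cost in USD at the listed Grok 4 rates.

    Rates per 1 million tokens: $3 input, $15 output, $0.75 cached input.
    cached_input_tokens must be a subset of input_tokens.
    """
    INPUT_RATE, OUTPUT_RATE, CACHED_RATE = 3.00, 15.00, 0.75
    uncached = input_tokens - cached_input_tokens
    return (uncached * INPUT_RATE
            + cached_input_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000


# 1M fresh input + 1M output: $3 + $15
print(grok4_cost_usd(1_000_000, 1_000_000))  # → 18.0
```

With half the input served from cache, the same request drops to `grok4_cost_usd(1_000_000, 1_000_000, 500_000)`, i.e. $16.875, which is where the cached-input discount matters for long repeated prompts.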
X @Elon Musk · 2025-07-10 06:28