Musk Releases "World's Most Powerful AI Model" Grok 4, Saying It Is the First Time AI Can Solve Hard, Complex Real-World Engineering Problems
Sou Hu Cai Jing· 2025-07-10 11:42
Core Insights
- Musk announced the release of Grok 4, claiming it is the first AI capable of solving complex engineering problems whose answers cannot be found on the internet or in books [4]

Group 1: Product Features
- Grok 4 is a reasoning model that supports text and image inputs, function calling, and structured outputs [2]
- It has a context window of 256K tokens, which is smaller than Gemini 2.5 Pro's 1M tokens but larger than those of Claude 4 Sonnet and Opus (200K tokens) and R1 0528 (128K tokens) [2]
- Grok 4's pricing is similar to Grok 3's, at $3/$15 per million input/output tokens, with cached input tokens priced at $0.75 per million [2]

Group 2: Performance Metrics
- Grok 4 outputs 75 tokens per second, which is slower than o3 (188 tokens/s), Gemini 2.5 Pro (142 tokens/s), and Claude 4 Sonnet Thinking (85 tokens/s), but faster than Claude 4 Opus Thinking (66 tokens/s) [3]
- It ranks first on benchmarks including Humanity's Last Exam, MMLU-Pro, AIME 2024, AIME 25, and GPQA, outperforming OpenAI's o3 and Google's Gemini 2.5 Pro [3]

Group 3: Future Developments
- xAI announced upcoming products, including an AI programming model set to launch in August, a multimodal agent in September, and a video generation model in October [5]
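The pricing quoted above ($3/$15 per million input/output tokens, $0.75 per million cached input tokens) can be turned into a rough per-request cost estimate. The following is a minimal sketch, not xAI's billing logic: the function name `estimate_cost` is hypothetical, and it assumes that cached input tokens are billed at the cached rate instead of the regular input rate and that no other fees apply.

```python
# Prices in USD per 1 million tokens, as quoted in the summary above.
INPUT_PRICE = 3.00
OUTPUT_PRICE = 15.00
CACHED_INPUT_PRICE = 0.75
TOKENS_PER_PRICE_UNIT = 1_000_000


def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Return an estimated request cost in USD.

    cached_input_tokens is the portion of input_tokens assumed to be
    billed at the cached rate (an assumption, not a documented rule).
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")

    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_input_tokens * INPUT_PRICE
        + cached_input_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # 200K input tokens and 10K output tokens, with and without caching.
    print(f"no cache:   ${estimate_cost(200_000, 10_000):.4f}")
    print(f"150K cached: ${estimate_cost(200_000, 10_000, 150_000):.4f}")
```

Under these assumptions, a request with 200K input tokens and 10K output tokens costs about $0.75 without caching and about $0.41 when 150K of the input tokens are served from cache.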
X @Elon Musk
Elon Musk· 2025-07-10 06:28
Performance Benchmarks
- Grok 4 achieves an Artificial Analysis Intelligence Index of 73, surpassing OpenAI o3 at 70, Google Gemini 2.5 Pro at 70, Anthropic Claude 4 Opus at 64, and DeepSeek R1 0528 at 68 [1]
- Grok 4 leads in the Coding Index (LiveCodeBench & SciCode) and the Math Index (AIME24 & MATH-500) [5]
- Grok 4 achieves an all-time high score of 88% on GPQA Diamond, exceeding Gemini 2.5 Pro's previous record of 84% [5]
- Grok 4 achieves an all-time high score of 24% on Humanity's Last Exam, beating Gemini 2.5 Pro's previous record of 21% [5]
- Grok 4 achieves the joint-highest scores on MMLU-Pro and AIME 2024, at 87% and 94% respectively [5]

Model Specifications & Pricing
- Grok 4's pricing is $3/$15 per 1 million input/output tokens ($0.75 per 1 million cached input tokens), equivalent to Claude 4 Sonnet but more expensive than Gemini 2.5 Pro and o3 [4]
- Grok 4 has a 256 thousand token context window, less than Gemini 2.5 Pro's 1 million tokens but more than Claude 4 Sonnet, Claude 4 Opus, o3 and R1 0528 [5]
- Grok 4 supports text and image input, function calling, and structured outputs [6]

Availability & Deployment
- Grok 4 was tested via the xAI API; the version deployed on X/Twitter may differ [2]
- Grok 4 is expected to be available via the xAI API, the Grok chatbot on X, and potentially via Microsoft Azure AI Foundry [4]

Speed
- Grok 4's speed is 75 output tokens/s, slower than o3 (188 tokens/s), Gemini 2.5 Pro (142 tokens/s), and Claude 4 Sonnet Thinking (85 tokens/s), but faster than Claude 4 Opus Thinking (66 tokens/s) [5]
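To give the output speeds quoted above a more concrete meaning, they can be converted into approximate generation times for a response of a given length. This is only a back-of-the-envelope sketch: the helper names are hypothetical, and it assumes a constant output rate while ignoring time to first token and network latency.

```python
# Output speeds in tokens per second, as quoted in the summary above.
OUTPUT_SPEED_TPS = {
    "o3": 188,
    "Gemini 2.5 Pro": 142,
    "Claude 4 Sonnet Thinking": 85,
    "Grok 4": 75,
    "Claude 4 Opus Thinking": 66,
}


def generation_seconds(output_tokens, tokens_per_second):
    """Seconds needed to emit output_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


def rank_by_generation_time(output_tokens):
    """Return (model, seconds) pairs sorted from fastest to slowest."""
    times = {
        model: generation_seconds(output_tokens, tps)
        for model, tps in OUTPUT_SPEED_TPS.items()
    }
    return sorted(times.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Approximate time to generate a 3,000-token response.
    for model, seconds in rank_by_generation_time(3_000):
        print(f"{model}: {seconds:.1f} s")
```

At 75 tokens/s, a 3,000-token response takes roughly 40 seconds under these assumptions, compared with roughly 16 seconds at o3's quoted 188 tokens/s.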