Model Performance & Benchmarks - Grok 4 demonstrates a significant leap in performance compared to previous models due to reinforcement learning with verifiable rewards [1][2][3][4] - On the "Humanity's Last Exam" benchmark, Grok 4 achieved 26.9% without tools, 41% with tool usage, and 50.7% with scaled test-time compute, surpassing other frontier models [9][10][11] - Grok 4 Heavy achieved a perfect 100% score on the AMY 2025 benchmark, which consists of some of the hardest math questions [29] - Grok 4 significantly outperformed other models on the ARC AGI benchmark, achieving 66.6% on V1 and 15.9% on V2, indicating "nonzero levels of fluid intelligence" [33][34][35] - In a real-world vending machine management test ("Vending Bench"), Grok 4 achieved a net worth of $4,700, significantly higher than other models and humans [36] Model Architecture & Features - Grok 4 utilizes multiple agents that work together, share knowledge, and select the best solution, particularly in the "Heavy" version [12][13][20] - Grok 4 incorporates tool usage, including web browsing, sophisticated memory, and code execution environments [10] - Grok 4 has a 256k context window, multimodal reasoning capabilities, real-time data search, and enterprise-grade security [43] Real-World Applications & Demonstrations - Grok 4 was used to predict the winner of the World Series by browsing odds sites and calculating its own odds, giving the Dodgers a 21.6% chance of winning [22][23] - Grok 4 generated a visualization of two black holes colliding, demonstrating its ability to create content with some simplifications [24][25][26][27] - Grok 4 was used to create a timeline of announcements and score releases for the "Humanity's Last Exam" [27] - Grok 4 was used to create a first-person shooting game in four hours, highlighting its ability to automate asset sourcing and accelerate game development [38][39][40] Future Developments & Availability - A coding-specific model is expected in August, a multimodal agent in September, and a video generation model in October [46] - Super Grok is priced at $30 per month, while Super Grok Heavy is priced at $300 per month or $3,000 per year [44]
Grok 4 is really smart... Like REALLY SMART
Matthew Bermanยท2025-07-10 22:31