Musk's Grok-4 Crushes Every Large Model! "Smarter Than PhDs in Every Field," With a Perfect Score on AIME25

Core Viewpoint
- The release of Grok-4 marks a significant advance in AI capability: the model scores above 50% on several tests, surpasses previous models, and is claimed to outperform humans [1][6][4].

Group 1: Performance Metrics
- Grok-4 Heavy achieved a score of 44.4%, nearly 18 percentage points higher than Gemini-2.5-Pro [2].
- With training and tool integration at test time, Grok-4 can reach a score of 50.7% [3].
- Across other assessments, Grok-4 scored 88.9% on GPQA, 100% on AIME25, 79.4% on LiveCodeBench (LCB), 96.7% on HMMT25, and 61.9% on USAMO25 [11].

Group 2: Training and Development
- Grok-4's training volume is 100 times that of Grok-2 and 10 times that of Grok-3, and it was trained on a 200,000-GPU computing cluster [23].
- Post-training emphasizes tool integration, which improves both performance and efficiency [26][27].
- With tools built in, Grok-4 can complete complex tasks flexibly, which raises its overall intelligence [30].

Group 3: Demonstrations and Applications
- Grok-4 demonstrated strong reasoning by predicting MLB World Series win probabilities, assigning the Dodgers a 21.6% chance [31].
- It showcased visual understanding by simulating gravitational wave collisions and generating realistic waveforms [35].
- In programming tests, Grok-4 came close to full marks; a specialized programming model that is both fast and intelligent is expected to follow [37].

Group 4: Future Plans and Integration
- Planned releases include a programming model, multi-modal agents, and video generation models [46].
- Grok is expected to be integrated into Tesla's latest firmware, improving interaction between drivers and vehicles [58].
- The Grok voice assistant will also be built into the Optimus humanoid robot, serving as its brain [60].