Core Viewpoint
- xAI's release of Grok 4 represents a significant leap in AI capabilities, with record results across a range of benchmark tests that push the boundaries of what AI systems can do [4][19].

Group 1: Benchmark Performance
- Grok 4 posted strong scores on "Humanity's Last Exam" (HLE): 26.9% on text-only questions, rising to 41.0% when using tools [6][9].
- In "Heavy" mode, Grok 4 scored 58.3% on HLE, far ahead of competitors such as Claude 4 Opus and OpenAI's o3, which scored between 15% and 25% [9][12].
- Grok 4 also set new records on other benchmarks, including 15.9% on ARC-AGI-2 and a top score of 73 on the Artificial Analysis index, outperforming all other models [15][16].

Group 2: Key Innovations
- Grok 4's success is attributed to three pillars: a new collaborative model, a philosophy of truth-seeking, and substantial computational power [20].
- The "Multi-Agent Study Group" approach lets Grok 4 Heavy tackle complex problems by spawning multiple independent agents that collaborate to find the best solution [21].
- Training Grok 4 used more than 200,000 H100 GPUs, double the resources used for Grok 3, with 100 times the training volume of Grok 2 [24][26].

Group 3: Real-World Applications
- Grok 4 was demonstrated in several applications, including generating realistic animations of black hole collisions and developing a first-person shooter game in just four hours [27][29].
- In a business simulation, Grok 4 achieved a net asset value twice that of its nearest competitor, showcasing its strategic planning and execution abilities [31].
- The model is also being used in biomedical research to automate the analysis of complex experimental data, significantly reducing the time required for hypothesis generation [35].

Group 4: Future Plans and Pricing
- xAI announced the "SuperGrok" subscription plan, priced at $300 per year for standard access and $3,000 per year for the tier with exclusive features [37][41].
- The company is working on improving Grok 4's multimodal capabilities, with a new version expected to be completed soon [39].
- Future developments include the potential for AI-generated television shows and video games, indicating a shift toward more creative applications of AI technology [42][43].
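
The "Multi-Agent Study Group" idea from Group 2 can be illustrated with a minimal sketch. The summary does not say how Grok 4 Heavy actually coordinates its agents, so everything here is an assumption made for illustration: the `Agent` type, the `study_group` function, the toy agents, and the use of a simple majority vote as a stand-in for whatever collaboration step the real system uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

# Hypothetical agent type: takes a question, returns one candidate answer.
Agent = Callable[[str], str]


def study_group(question: str, agents: Sequence[Agent]) -> str:
    """Run every agent independently, then return the most common answer."""
    if not agents:
        raise ValueError("at least one agent is required")
    # Each agent works on the question in its own thread, unaware of the others.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        answers = list(pool.map(lambda agent: agent(question), agents))
    # Majority vote stands in for the "compare notes" step (an assumption).
    # On a tie, the answer that appeared first among the agents wins.
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner


# Toy agents with hard-coded answers, purely for demonstration.
def careful_agent(question: str) -> str:
    return "4"


def hasty_agent(question: str) -> str:
    return "5"


def second_careful_agent(question: str) -> str:
    return "4"


answer = study_group(
    "What is 2 + 2?",
    [careful_agent, hasty_agent, second_careful_agent],
)
print(answer)
```

The point of the design is that agents do not see each other's work until the end, so one agent's mistake cannot steer the others; a real system would likely replace the vote with a richer comparison of the agents' reasoning.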
Musk Releases Grok 4, "the Most Powerful AI Model on Earth": Sweeping Every Leaderboard and Surpassing Human PhDs on "Humanity's Last Exam"!
AI科技大本营 (AI Technology Base Camp) · 2025-07-10 07:14