Just In: Musk's Grok 4.1 Quietly Released, General Capabilities Crush All Other Models
36Kr · 2025-11-18 00:11

Core Insights
- xAI has announced the release of Grok 4.1, which is now available to all users across various platforms including the Grok website, X, and mobile applications [1][2]
- Grok 4.1 features significant improvements in real-world usability, particularly in creativity, emotional interaction, and collaborative engagement [4][6]

Performance Enhancements
- Grok 4.1 has been optimized on the same large-scale reinforcement learning infrastructure as its predecessor, enhancing its style, personality, helpfulness, and alignment [6]
- In comparative evaluations, Grok 4.1 has a 64.78% probability of being preferred by users over previous models [6]

Benchmark Achievements
- Grok 4.1 has set a new benchmark in human preference evaluations, achieving an Elo score of 1483 on the LMArena Text Arena leaderboard, leading the nearest non-xAI model by 31 points [9]
- The non-reasoning mode of Grok 4.1 achieved a score of 1465, ranking second overall, demonstrating superior performance even without reasoning capabilities [9]

Emotional Intelligence
- Grok 4.1 was tested on the EQ-Bench3, which evaluates emotional intelligence through challenging role-play scenarios, and it ranked first in both reasoning and non-reasoning modes [11]
- The model's responses to emotional prompts have shown a marked improvement in empathy and engagement compared to its predecessor [14]

Creative Writing Capabilities
- In the Creative Writing v3 benchmark, Grok 4.1's reasoning and non-reasoning modes ranked second and third, respectively, showcasing its advanced creative writing abilities [15]
- The model's responses to creative prompts reflect a significant enhancement in creativity and coherence [18]

Reduction of Hallucinations
- xAI has focused on reducing factual hallucinations in Grok 4.1, particularly in information retrieval tasks, resulting in a notable decrease in hallucination rates during testing [19][20]
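
For readers who want a feel for the Elo figures reported under Benchmark Achievements, the sketch below converts a rating gap into an expected head-to-head preference rate and back. It assumes the classical logistic Elo formula with a 400-point scale; the leaderboard's actual fitting procedure is not described in this summary, so the numbers are a rough guide only. The function names are hypothetical and chosen for illustration.

```python
import math


def elo_win_probability(rating_gap: float) -> float:
    """Expected preference rate for the higher-rated model.

    Assumption: classical Elo logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def elo_gap_for_probability(p: float) -> float:
    """Inverse of the above: the rating gap implied by a preference rate p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    lead_over_nearest = 31    # reported lead over the nearest non-xAI model
    preference_rate = 0.6478  # reported preference rate over previous models

    print(f"31-point lead -> win probability {elo_win_probability(lead_over_nearest):.3f}")
    print(f"64.78% preference -> implied gap {elo_gap_for_probability(preference_rate):.1f}")
```

Under that assumption, a 31-point lead corresponds to being preferred in roughly 54% of direct comparisons, while a 64.78% preference rate corresponds to a gap of roughly 106 points. The two figures come from different evaluations, so they are not directly comparable.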