Core Insights
- xAI has released Grok 4.1, which is now available to all users on the Grok website, X, and the mobile applications [1][3]
- Grok 4.1 shows significant improvements in real-world usability, particularly in creativity, emotional interaction, and collaborative engagement [4][6]
- The model is better at understanding subtle intent and maintaining a coherent personality, while retaining the intelligence and reliability of its predecessor [4][6]

Performance Metrics
- In comparative evaluations against previous models, users preferred Grok 4.1 64.78% of the time [6]
- On the LMArena Text Arena leaderboard, Grok 4.1's reasoning mode (quasarflux) ranks first with an Elo score of 1483, 31 points ahead of the highest-ranked non-xAI model [13]
- The non-reasoning mode (tensor) ranks second with an Elo score of 1465, showing strong performance even without reasoning enabled [13][14]

Emotional Intelligence
- Grok 4.1 was tested on EQ-Bench3, which evaluates emotional intelligence through challenging role-play scenarios [17]
- Grok 4.1's reasoning and non-reasoning modes ranked first and second, respectively, in these emotional intelligence assessments [18]

Creative Writing
- xAI evaluated Grok 4.1 on the Creative Writing v3 benchmark, which involves generating responses to 32 different writing prompts [23]
- Post-training significantly reduced the model's hallucination rate on factual queries, indicating more reliable information retrieval [27]

Technical Details
- Further technical details on Grok 4.1 are available in the model card at the provided link [29]
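
To give a sense of scale for the Elo figures under Performance Metrics, the sketch below converts a rating gap into an expected head-to-head win rate and back. It assumes the conventional Elo / Bradley-Terry scale (base 10, 400 points per tenfold change in odds) and ignores ties; the summary does not state how the leaderboard computes its scores, and the function names are hypothetical, chosen only for illustration.

```python
import math


def expected_win_rate(rating_gap: float) -> float:
    """Expected win rate of the higher-rated model for a given rating gap.

    Assumption: conventional Elo scale (base 10, 400-point divisor), no ties.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


def rating_gap_from_win_rate(win_rate: float) -> float:
    """Inverse of expected_win_rate: the rating gap implied by a win rate."""
    if not 0.0 < win_rate < 1.0:
        raise ValueError("win_rate must be strictly between 0 and 1")
    return 400.0 * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # 31-point lead of the reasoning mode over the best non-xAI model
    print(f"31-point gap  -> win rate {expected_win_rate(31):.3f}")
    # 18-point gap between the two Grok 4.1 modes (1483 vs 1465)
    print(f"18-point gap  -> win rate {expected_win_rate(1483 - 1465):.3f}")
    # Rating gap that a 64.78% preference rate would imply on this scale
    print(f"64.78% rate   -> gap {rating_gap_from_win_rate(0.6478):.1f}")
```

Under these assumptions, a 31-point lead corresponds to winning roughly 54% of head-to-head comparisons. The 64.78% preference rate comes from a separate evaluation, so the rating gap it implies (around 106 points on this scale) is not directly comparable with the leaderboard scores.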
Just now, Musk's Grok 4.1 was quietly released! Its general capabilities crush every other model
Jiqizhixin (机器之心) · 2025-11-17 23:40