Musk Beats Google to the Punch with a Major Release: Grok 4.1 Tops LMArena, Creative Writing Closes In on GPT-5.1
Sou Hu Cai Jing·2025-11-18 09:53

Core Insights
- xAI has launched its latest large language model, Grok 4.1, which improves on its predecessor in response speed, accuracy, and emotional understanding [1][6][26]
- Grok 4.1 comes in two versions: the standard Grok 4.1 and Grok 4.1 Thinking, a variant with enhanced reasoning [1][6]
- Grok 4.1 has taken the top rankings on the LMArena leaderboard, outperforming competitors such as Google's Gemini 2.5 Pro [2][7]

Performance Metrics
- Grok 4.1 Thinking leads the LMArena leaderboard with an Elo score of 1483, 31 points ahead of Gemini 2.5 Pro, while Grok 4.1 follows closely at 1465 [2][3][7]
- The hallucination rate fell from 12.09% for the predecessor to 4.22% for Grok 4.1, a nearly threefold reduction [9][26]
- Grok 4.1's FActScore error measure fell from 9.89 to 2.97 (lower is better), indicating a significant gain in factual accuracy [11]

Emotional and Creative Capabilities
- Grok 4.1 scored 1586 Elo on EQ-Bench, a substantial gain in emotional intelligence over its predecessor [12][18]
- In creative writing assessments, Grok 4.1 scored 1722 Elo, nearly 600 points above the previous version, reflecting stronger narrative and stylistic capabilities [17][20]

User Experience and Interaction
- Grok 4.1 is designed to provide a more natural and engaging user experience, with a better understanding of user intent and a more stable personality [22][26]
- During a silent release phase, users preferred Grok 4.1 in 64.78% of blind comparisons against previous models [23][24]

Training and Development
- The model incorporates a large-scale reinforcement learning system that allows for self-evaluation and rapid iteration during training [8][26]
- Grok 4.1's context window has been expanded to 256K tokens, improving its ability to handle long documents and complex tasks [22][26]
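
The figures quoted under Performance Metrics can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only. The helper names (`relative_reduction`, `fold_improvement`, `elo_expected_score`) are made up for this example, and the win-probability estimate assumes the classic Elo formula on a 400-point scale, which may differ from how LMArena actually derives its ratings.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate fell (0.5 means the rate was halved)."""
    return (before - after) / before


def fold_improvement(before: float, after: float) -> float:
    """How many times smaller the new rate is than the old one."""
    return before / after


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo model.

    Assumption: standard logistic curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Hallucination rate: 12.09% -> 4.22% (figures from the article).
    print(f"hallucination drop: {relative_reduction(12.09, 4.22):.1%}")
    print(f"hallucination fold: {fold_improvement(12.09, 4.22):.2f}x")

    # FActScore: 9.89 -> 2.97 (figures from the article).
    print(f"FActScore fold:     {fold_improvement(9.89, 2.97):.2f}x")

    # Grok 4.1 Thinking (1483) versus Grok 4.1 (1465).
    print(f"1483 vs 1465:       {elo_expected_score(1483, 1465):.1%}")

    # A generic 31-point Elo lead, as quoted against Gemini 2.5 Pro.
    print(f"31-point lead:      {elo_expected_score(31, 0):.1%}")
```

Under this assumed model, the Elo gaps reported here correspond to head-to-head preference rates only slightly above 50%. A narrow lead at the top of the leaderboard is therefore consistent with users choosing the models at close to even rates.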