Musk Plays Another AI Trump Card: Grok 4.1 Tops the LMArena Leaderboard
Sou Hu Cai Jing·2025-11-18 01:00

Core Insights
- Elon Musk's AI company xAI has launched its latest large language model, Grok 4.1, which is now available to all users on grok.com and in the mobile applications [1][2]

Performance and Capabilities
- Grok 4.1 aims to be more usable in real-world scenarios, showing significant improvements in creativity, emotional understanding, and collaborative interaction [2]
- The model reached the top of the LMArena text leaderboard: its deep-thinking version (quasarflux) scored 1483 Elo, leading the second-place model by 31 points [4]
- The "instant response" version of Grok 4.1 scored 1465 Elo, outperforming all other models even in their "full reasoning" modes, a substantial leap from the previous Grok 4, which ranked 33rd [4]

Emotional and Creative Intelligence
- Grok 4.1 demonstrated significant advances in "soft skills," excelling in the EQ-Bench3 benchmark for emotional intelligence and the Creative Writing v3 test for creative capabilities [5][6]
- In the EQ-Bench3 test, Grok 4.1's reasoning and non-reasoning modes took the top two positions; in creative writing, they ranked second and third, just behind the earlier GPT-5.1 model [6]

Reliability and Accuracy
- The model handles complex logical reasoning better and understands emotional prompts more accurately, making its interaction more human-like [9]
- A key improvement is a significant reduction in the model's "hallucination" rate, meaning the rate of factual inaccuracies, particularly in fast-response models equipped with search tools, which gives users more reliable and accurate information [10]