xAI Releases Grok 4.1: Across-the-Board Upgrades in Speed, Quality, and Emotional Intelligence, with a Sharply Lower Hallucination Rate

Core Insights
- xAI has officially released Grok 4.1. It is available to all users, including free users, on grok.com, the X platform, and the iOS and Android apps, with Auto mode enabled by default [2]
- Elon Musk said users will notice significant improvements in speed and quality: faster responses, higher factual accuracy, and a more natural, personalized conversational experience [3]

Performance Enhancements
- The hallucination rate fell from 12.09% to 4.22%, nearly a threefold reduction, and the FactScore figure improved from 9.89 to 2.97 (a lower value is better here), indicating markedly better factual stability [4]
- The gains are attributed to a new reinforcement learning infrastructure and reward model system, which lets the model evaluate its own outputs and iterate quickly without heavy reliance on large-scale human annotation [4]

User Preference and Rankings
- During a recent silent testing phase, Grok 4.1 was preferred in 64.78% of blind comparisons against its predecessor [6]
- On the LMSYS Arena, Grok 4.1's Thinking mode scored 1483 Elo, ranking first among all public models, and its non-reasoning mode scored 1465 Elo, ranking second. The previous version ranked 33rd [7][8]

Emotional Intelligence and Creative Writing
- On the EQ-Bench emotional intelligence test, Grok 4.1 scored 1586 Elo, more than 100 points above the previous version [10]
- On the Creative Writing v3 assessment, Grok 4.1 scored 1722 Elo, nearly 600 points above its predecessor, reflecting improvements in narrative structure, language rhythm, and character voice stability [12]

Contextual Understanding
- Grok 4.1 supports a context window of up to 256,000 tokens, and up to 2 million tokens in Fast mode, which improves its ability to handle complex inputs and keep long interactions coherent [12]
- The model's emotional understanding has improved: it engages more deeply with users' feelings and gives responses that reflect genuine empathy rather than generic comfort [15]

Narrative Style
- Grok 4.1 shows a more human-like narrative style, particularly in creative writing, with a sense of self-awareness and emotional depth in its responses [16][17]
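
As a rough sanity check on the figures quoted under Performance Enhancements and User Preference and Rankings, the arithmetic can be sketched in a few lines of Python. The helper names `reduction_factor` and `elo_expected_score` are illustrative only, and the Elo conversion assumes the leaderboard scores follow the conventional Elo scale (base 10, 400-point divisor), which the summary does not state.

```python
# Figures quoted in the summary.
HALLUCINATION_BEFORE = 12.09  # percent
HALLUCINATION_AFTER = 4.22    # percent
ELO_THINKING = 1483
ELO_NON_REASONING = 1465


def reduction_factor(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`."""
    return before / after


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Conventional Elo expected score of A against B (assumed scale: base 10, 400 points)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


factor = reduction_factor(HALLUCINATION_BEFORE, HALLUCINATION_AFTER)
relative_drop = 1 - HALLUCINATION_AFTER / HALLUCINATION_BEFORE
win_rate = elo_expected_score(ELO_THINKING, ELO_NON_REASONING)

print(f"hallucination reduction factor: {factor:.2f}x")
print(f"relative drop in hallucination rate: {relative_drop:.1%}")
print(f"expected preference, Thinking vs non-reasoning: {win_rate:.1%}")
```

Under these assumptions the reduction factor comes out a little under 3, consistent with "nearly threefold", and the 18-point Elo gap between the two modes corresponds to an expected head-to-head preference only slightly above 50%, so the two modes are close in rating despite holding first and second place.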