Hands-On with DeepSeek V3.1: More Than Just a Longer Context Window

Core Insights
- The article discusses the differences between DeepSeek V3.1 and its predecessor V3, highlighting improvements in programming performance, creative writing, translation quality, and response tone [2][6][40].

Group 1: Model Features
- DeepSeek V3.1 has expanded context length to 128K tokens, while V3 has a maximum of 65K tokens [8][7].
- The new version supports multiple tensor formats, enhancing its usability across different platforms [1][6].
- The API for V3 still operates with a maximum context length of 65K tokens, indicating a significant upgrade in V3.1 [7][8].

Group 2: Performance Comparison
- In programming tasks, V3.1 demonstrated a more comprehensive approach, providing detailed code and usage instructions compared to V3 [12][13].
- For creative writing, V3.1 produced a more poetic and emotional response, contrasting with V3's straightforward style [20][18].
- Both versions successfully solved a mathematical problem, but their presentation styles differed, with V3.1 offering a clearer explanation [23][24].

Group 3: Translation Capabilities
- V3.1 showed improved understanding of complex sentences in translation tasks, although it missed translating some simple words [29][26].
- The translation of a biology paper's abstract revealed V3.1's enhanced capability in handling specialized terminology compared to V3 [28][27].

Group 4: Knowledge and Reasoning
- In a knowledge-based query about a specific fruit type, both versions identified it as a drupe, but V3.1's reasoning strayed off-topic [30][36].
- V3.1 achieved a score of 71.6% on the Aider benchmark, outperforming V3 and indicating its competitive edge in non-reasoning tasks [42][40].

Group 5: User Feedback and Market Response
- The release of V3.1 has generated significant interest, becoming a trending topic on social media platforms [40][41].
- Users have noted improved physical understanding and the introduction of new tokens, although some have reported issues with the online API [45][49].