Hands-on with DeepSeek V3.1: More Than an Extended Context Window
自动驾驶之心·2025-08-21 23:34

Core Viewpoint
- The article compares DeepSeek V3.1 with its predecessor V3, highlighting improvements in programming capability, creative writing, translation quality, and response tone.

Group 1: Model Comparison
- DeepSeek V3.1 extends the context length to 128K tokens, up from V3's 65K, allowing more comprehensive responses [10]
- The new version shows clear gains across a range of tasks, including programming, creative writing, translation, and knowledge application [3][4]

Group 2: Programming Capability
- In a programming test, V3.1 produced a more comprehensive solution for compressing GIF files, accounting for more factors and including detailed usage instructions [12][13][14]
- V3.1 also completed the task notably faster than V3 [18]

Group 3: Creative Writing
- Given a high-school essay prompt, V3.1 produced a more poetic and emotional piece, in contrast with V3's plainer, more straightforward style [22]

Group 4: Translation Quality
- When translating a scientific abstract, V3.1 handled complex sentences better, though it left one simple word untranslated, indicating room for improvement [30]

Group 5: Knowledge Application
- Both versions answered a niche question about a specific fruit variety, with V3.1 showing some inconsistencies in terminology and relevance [31][37]

Group 6: Performance Metrics
- V3.1 scored 71.6% on the Aider benchmark, outperforming Claude Opus 4 while being significantly cheaper [43]
- On SVGBench, V3.1 was the best variant among its peers, though it still did not surpass the best open models [44]

Group 7: User Feedback
- Users have reported various observations about V3.1, including improved physical understanding and the introduction of new tokens [45][47]
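The article does not reproduce the models' GIF-compression code, but the task itself is concrete enough to sketch. A minimal version of the kind of solution being evaluated might downscale each frame and shrink the color palette using the Pillow library (the function name, parameters, and approach below are assumptions for illustration, not the models' actual output):

```python
# Hypothetical sketch of the GIF-compression task described in Group 2.
# Assumes the third-party Pillow library; compress_gif and its parameters
# are illustrative, not taken from the article or the models' answers.
from PIL import Image, ImageSequence


def compress_gif(src, dst, scale=0.5, colors=128):
    """Shrink a GIF by downscaling every frame and quantizing its palette."""
    im = Image.open(src)
    frames = []
    for frame in ImageSequence.Iterator(im):
        f = frame.convert("RGB")
        w, h = f.size
        # Downscale the frame, guarding against zero-size dimensions.
        f = f.resize((max(1, int(w * scale)), max(1, int(h * scale))))
        # Re-quantize to a smaller adaptive palette to cut file size.
        frames.append(f.convert("P", palette=Image.ADAPTIVE, colors=colors))
    # Reassemble the animation, preserving the original frame duration.
    frames[0].save(
        dst,
        save_all=True,
        append_images=frames[1:],
        loop=0,
        duration=im.info.get("duration", 100),
    )
```

Per the article, V3.1's answer went further than this sketch, covering more trade-offs (e.g. quality vs. size) and shipping with usage instructions.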