Just in: LMArena's latest model leaderboard is out! DeepSeek-R1's web programming ability has overtaken Claude Opus 4
机器之心·2025-06-17 00:10

Core Viewpoint
- DeepSeek has made significant advances in the open-source model space with the release of its upgraded R1 reasoning model (0528), which performs competitively against proprietary models [2][4][10].

Performance Summary
- The R1-0528 model improves benchmark performance, enhances front-end capabilities, reduces hallucinations, and supports JSON output and function calling [3].
- In the latest LMArena rankings, DeepSeek-R1 (0528) placed 6th overall and is the top-ranked open model [5][4].
- Its rankings by category are [6][7]:
  - 4th in Hard Prompt
  - 2nd in Coding
  - 5th in Math
  - 6th in Creative Writing
  - 9th in Instruction Following
  - 8th in Longer Query
  - 7th in Multi-Turn

Competitive Landscape
- On the WebDev Arena platform, DeepSeek-R1 (0528) is tied for first place with proprietary models such as Gemini-2.5-Pro-Preview-06-05 and Claude Opus 4, and its score is higher than Claude Opus 4's [8].
- The performance of DeepSeek-R1 (0528) is seen as a milestone, particularly in AI programming, where it competes closely with established models such as Claude [10].

User Engagement
- The strong performance of DeepSeek-R1 (0528) has drawn increased interest and usage, prompting discussion of users' experiences with the model [9][11].