1.93-bit DeepSeek-R1 Beats Claude 4 Sonnet at Coding and Runs Without a GPU
量子位 (QbitAI) · 2025-06-10 04:05

Core Viewpoint
- The article covers the DeepSeek-R1 (0528) model, focusing on its programming ability and on the efficiency gains of its quantized versions relative to earlier releases and competing models.

Group 1: Model Performance
- The latest version, R1-0528, scored 71.4 on the Aider programming leaderboard, surpassing Claude 4 Opus and the previous R1 version [5][2].
- R1-0528 also improved markedly on game-playing benchmarks. In Tetris it outperformed o4-mini and ranked just below o3 [21][24][28].
- In Candy Crush it scored 548 points, nearly 20 points higher than o4-mini [32].

Group 2: Model Optimization and Size
- The 1.93-bit version of R1 is over 70% smaller on disk than the original 8-bit version [3][9].
- Unsloth has released multiple quantized versions of R1. The smallest is the 1.66-bit build at 162 GB, nearly 80% smaller than the 8-bit version [9][10].
- The team recommends the 2.4-bit and 2.7-bit versions for a better balance between size and performance [14].

Group 3: Team and Other Models
- Unsloth's team focuses on making model fine-tuning more efficient. It has worked on models including Qwen, Phi, Mistral, and Llama, reporting at least a 50% reduction in memory usage and a 50% increase in speed [16][17].
- Unsloth has also released a distilled Qwen3-8B model based on R1-0528, claiming it can match the performance of Qwen3-235B and is adaptable to various configurations [19].
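
The size figures in Group 2 can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the 162 GB and 1.66-bit values come from the summary, while the parameter count of about 671 billion is an assumption brought in from outside it, and the helper names are hypothetical. It estimates what a uniformly quantized model would weigh, what baseline size an "80% smaller" claim would imply, and how many bits per weight a 162 GB file averages out to.

```python
# Figures quoted in the summary.
SMALLEST_QUANT_GB = 162.0   # size of the 1.66-bit build
SMALLEST_QUANT_BITS = 1.66  # nominal bits per weight of that build

# Assumption (not stated in the summary): R1 has roughly 671 billion parameters.
ASSUMED_PARAMS = 671e9


def uniform_size_gb(bits_per_weight: float, n_params: float = ASSUMED_PARAMS) -> float:
    """Size in GB (10^9 bytes) if every weight used exactly bits_per_weight bits."""
    return bits_per_weight * n_params / 8 / 1e9


def reduction(small_gb: float, large_gb: float) -> float:
    """Fraction by which small_gb is smaller than large_gb."""
    return 1 - small_gb / large_gb


def implied_baseline_gb(small_gb: float, reduction_fraction: float) -> float:
    """Baseline size that would make small_gb exactly reduction_fraction smaller."""
    return small_gb / (1 - reduction_fraction)


def effective_bits(size_gb: float, n_params: float = ASSUMED_PARAMS) -> float:
    """Average bits per weight implied by a file of size_gb."""
    return size_gb * 1e9 * 8 / n_params


if __name__ == "__main__":
    full_8bit = uniform_size_gb(8)
    print(f"Uniform 8-bit estimate: {full_8bit:.0f} GB")
    print(f"Uniform 1.66-bit estimate: {uniform_size_gb(SMALLEST_QUANT_BITS):.0f} GB")
    print(f"Reduction of 162 GB vs 8-bit estimate: {reduction(SMALLEST_QUANT_GB, full_8bit):.1%}")
    print(f"Baseline implied by a flat 80% cut: {implied_baseline_gb(SMALLEST_QUANT_GB, 0.80):.0f} GB")
    print(f"Average bits per weight at 162 GB: {effective_bits(SMALLEST_QUANT_GB):.2f}")
```

Under these assumptions the 162 GB file is larger than a uniform 1.66-bit encoding would be. That would be expected if some layers are kept at higher precision than the nominal label suggests, though the summary does not say how the bit width is defined.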