Core Viewpoint
- The FairyR1-32B model developed by Peking University delivers performance competitive with far larger models such as DeepSeek-R1-671B on mathematical and coding tasks while using only about 5% of the parameters, marking a significant advance in efficient large language model research [1][7].

Model Development
- FairyR1-32B is built on the DeepSeek-R1-Distill-Qwen-32B base and combines task-specific fine-tuning with model merging to retain high performance at a much smaller parameter count [2] (a minimal merging sketch follows this summary).
- The research focuses on optimizing the distillation data construction process, yielding roughly 6.6k math samples and 3.8k code samples for training [3][4].

Experimental Results
- FairyR1-32B outperformed DeepSeek-R1-671B on specific benchmarks, scoring 80.4 on AIME 2024, 75.6 on AIME 2025, and 67.7 on LiveCodeBench [6].
- Its performance on scientific question answering (GPQA) was lower than that of DeepSeek-R1-671B, pointing to limitations in some areas [6][7].

Conclusion
- The research team states that FairyR1-32B represents a significant step toward high-performance models under limited resources, achieved through improved distillation and merging methods [7].
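The "split-and-merge" approach mentioned above trains separate specialists (math and code) and then combines them in weight space. As a rough illustration only, the sketch below linearly interpolates the parameters of two hypothetical fine-tunes of the same DeepSeek-R1-Distill-Qwen-32B base; the checkpoint paths, merge coefficient, and use of plain averaging are assumptions, since the article does not detail FairyR1's exact merging procedure.

```python
# Illustrative weight-space merging of two task-specialized fine-tunes that
# share one architecture (here, the DeepSeek-R1-Distill-Qwen-32B base reported
# in the article). Paths and the merge weight ALPHA are hypothetical.
import torch
from transformers import AutoModelForCausalLM

MATH_CKPT = "path/to/math-finetune"   # hypothetical math-specialist checkpoint
CODE_CKPT = "path/to/code-finetune"   # hypothetical code-specialist checkpoint
ALPHA = 0.5                           # assumed interpolation weight

def merge_state_dicts(sd_a, sd_b, alpha):
    """Linearly interpolate two state dicts with matching keys and shapes."""
    merged = {}
    for key in sd_a:
        if key not in sd_b or sd_a[key].shape != sd_b[key].shape:
            raise ValueError(f"Mismatched parameter: {key}")
        merged[key] = alpha * sd_a[key] + (1.0 - alpha) * sd_b[key]
    return merged

if __name__ == "__main__":
    math_model = AutoModelForCausalLM.from_pretrained(MATH_CKPT, torch_dtype=torch.bfloat16)
    code_model = AutoModelForCausalLM.from_pretrained(CODE_CKPT, torch_dtype=torch.bfloat16)

    merged_sd = merge_state_dicts(math_model.state_dict(), code_model.state_dict(), ALPHA)

    # Load the merged weights into a fresh copy of the shared architecture and save it.
    merged_model = AutoModelForCausalLM.from_pretrained(MATH_CKPT, torch_dtype=torch.bfloat16)
    merged_model.load_state_dict(merged_sd)
    merged_model.save_pretrained("merged-32b")
```

Simple linear interpolation is only one of several merging strategies (others weight parameters per-layer or per-task); it is used here purely to make the split-then-merge idea concrete.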
Matching full-strength DeepSeek-R1 with 5% of the parameters! Peking University's "small" model uses split-and-merge distillation to push down the floor on inference cost
量子位·2025-05-27 01:07