Meta open-sources the MobileLLM-R1 models: under 1B parameters, surpassing Qwen3 with 1/10 of the training

Core Viewpoint
- Meta AI has officially released the MobileLLM-R1 series, a family of efficient sub-billion-parameter language models optimized for on-device use cases, showing significant performance gains over existing open-source models [4][8].

Group 1: Model Performance and Features
- The series includes three base models: MobileLLM-R1-140M, MobileLLM-R1-360M, and MobileLLM-R1-950M. They are not general chat models but have been supervised fine-tuned (SFT) for specific tasks such as mathematics, programming (Python, C++), and scientific questions [6][8].
- The largest model, MobileLLM-R1-950M, was pre-trained on approximately 2 trillion high-quality tokens, yet matches the performance of models trained on 36 trillion tokens, such as Qwen3 0.6B [8].
- MobileLLM-R1-950M outperforms existing models across various benchmarks, reaching roughly five times the accuracy of Olmo 1.24B on the MATH benchmark and about twice that of SmolLM2 1.7B [10].

Group 2: Model Architecture and Efficiency
- The models vary in depth and size: MobileLLM-R1-950M has 22 layers and 949 million parameters, while the two smaller models have 15 layers and 140 million to 360 million parameters [14].
- All models take text as input and output, with a 4k context length for the base models, 32k for the final models, and a 128k vocabulary [15].

Group 3: Research and Development Team
- Development of the MobileLLM-R1 series was led by a team of researchers including Zechun Liu, Ernie Chang, and Changsheng Zhao, who have extensive backgrounds in natural language processing and model optimization [18][21][30].
- The project took about a year to develop, focusing on efficient deployment and optimization of large language models for resource-constrained environments [18][22].
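For readers who want to try the released checkpoints locally, the following is a minimal sketch using the Hugging Face transformers library. The repository ID "facebook/MobileLLM-R1-950M" and the sample prompt are assumptions for illustration; check the official model listing for the actual IDs and recommended generation settings.

```python
# Minimal sketch: loading a MobileLLM-R1 checkpoint with Hugging Face transformers.
# The Hub ID below is an assumption about where the weights are published;
# adjust it to the actual repository name in Meta's release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/MobileLLM-R1-950M"  # assumed Hub ID, not confirmed by the article
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The models are tuned for math, coding, and scientific questions rather than open chat,
# so a task-style prompt is used here.
prompt = "Compute 3/4 + 1/6 and simplify the result."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```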