New "SOTA" reasoning model dodging Qwen and R1? "Europe's OpenAI" gets roasted
QbitAI · 2025-06-11 05:13

Core Viewpoint
- Mistral AI has launched its first reasoning model, Magistral, which it claims competes with other leading models, but it faces skepticism over the lack of direct comparisons with the latest versions of competitors such as Qwen and DeepSeek R1 0528 [1][22].

Model Performance
- Magistral shows a 50% accuracy improvement on the AIME-24 benchmark over the earlier Mistral Medium 3 [3].
- On AIME-24, accuracy in English is 73.6%, while other languages such as French and Spanish score lower, at 68.5% and 69.3% respectively [8].

Model Versions
- Two versions of Magistral have been released:
  - Magistral Small, with 24 billion parameters, open-sourced under the Apache 2.0 license [4].
  - Magistral Medium, a more powerful version aimed at enterprises, available on Amazon SageMaker [5].

Multilingual Support
- Magistral is designed for transparent reasoning and supports multilingual reasoning, addressing the gap where mainstream models perform worse in European languages than in English [7].

Enhanced Features
- Unlike general-purpose models, Magistral is fine-tuned for multi-step logic, improving interpretability and providing a traceable chain of thought in the user's language [10].
- Magistral Medium's token throughput is reported to be 10 times that of most competitors, enabling large-scale real-time inference and rapid user feedback [14][15].

Training Methodology
- Magistral is described as the first large model trained purely through reinforcement learning (RL), using an improved Group Relative Policy Optimization (GRPO) algorithm [16].
- By eliminating the KL-divergence penalty and dynamically adjusting exploration thresholds, the model's AIME-24 accuracy leaps from 26.8% to 73.6% [18].

Training Architecture
- The model employs an asynchronous distributed training architecture, enabling efficient large-scale RL training without relying on pre-trained distilled data [20].
- The 24-billion-parameter Magistral Small reaches 70.7% accuracy on AIME-24 [21].

Competitive Landscape
- User comparisons indicate that Qwen 4B performs similarly to Magistral, a smaller 30B MoE model outperforms it, and the latest R1 model shows even better results [24].
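The GRPO modification described above can be illustrated with a minimal sketch. This is not Mistral's actual implementation (which is unpublished here); it assumes the standard GRPO shape: rewards for a group of rollouts per prompt are normalized within the group to form advantages, then a clipped policy-gradient loss is applied, with the usual KL-to-reference penalty term deliberately left out, as the article says Magistral's variant does.

```python
import numpy as np

def grpo_advantages(rewards):
    # rewards: (num_prompts, group_size) array, one group of rollout
    # rewards per prompt. Group-relative advantage = reward normalized
    # by its own group's mean and standard deviation.
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + 1e-8)

def grpo_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Clipped surrogate objective over per-token/sequence log-probs.
    # Note: no KL-divergence penalty term appears here, matching the
    # modification the article attributes to Magistral's training.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.minimum(unclipped, clipped).mean()
```

Because advantages are centered within each group, a batch where the new policy equals the old one (ratio of 1 everywhere) yields a loss near zero; learning signal comes only from rollouts that beat or trail their group's average.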
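The asynchronous distributed setup can be sketched as a generator/trainer split. The article only says training is asynchronous and distributed; the structure below (rollout workers pushing into a bounded queue while a trainer consumes whatever arrives, without lockstep synchronization) is an assumed, hypothetical arrangement, with threads standing in for distributed workers.

```python
import queue
import random
import threading

# Bounded buffer decoupling rollout generation from training.
rollouts = queue.Queue(maxsize=8)

def generator(worker_id, n=5):
    # Each worker samples completions with its (possibly slightly stale)
    # policy copy and pushes rewards without waiting for the trainer.
    for step in range(n):
        rollouts.put({"worker": worker_id, "step": step,
                      "reward": random.random()})

def trainer(total):
    # The trainer updates on whichever rollouts arrive first, instead of
    # blocking until every worker finishes a synchronized batch.
    return [rollouts.get() for _ in range(total)]

workers = [threading.Thread(target=generator, args=(i,)) for i in range(3)]
for w in workers:
    w.start()
batch = trainer(15)  # 3 workers x 5 rollouts each
for w in workers:
    w.join()
```

The bounded queue applies backpressure: generators block only when they run far ahead of the trainer, which keeps rollout data fresh without full synchronization.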