Core Insights
- The Qwen team has released two new models, Qwen3-4B-Instruct-2507 and Qwen3-4B-Thinking-2507, designed to improve performance across a range of tasks, particularly reasoning and general capabilities [2][3][5].

Model Performance
- Qwen3-4B-Thinking-2507 scored 81.3 on the AIME25 benchmark, outperforming competitors such as Gemini 2.5 Pro and Claude 4 Opus [4][5][23].
- Both new models support a 256K-token context length, substantially improving long-context awareness and understanding [3][17].

Model Specifications
- Qwen3-4B-Instruct-2507 is a non-reasoning model with stronger general capabilities and multilingual support, while Qwen3-4B-Thinking-2507 is a reasoning model tailored for expert-level tasks [7][16].
- The 4B parameter size is well suited to edge devices, allowing deployment on small hardware such as a Raspberry Pi [2][8][26].

Comparative Analysis
- Across multiple tests, Qwen3-4B-Instruct-2507 outperformed smaller closed-source models such as GPT-4.1-nano and performed comparably to the larger Qwen3-30B-A3B [13][15].
- The models show significant improvements in instruction following, logical reasoning, and text generation, with better alignment to user preferences [17][24].

Deployment Recommendations
- The Qwen team has provided suggestions for local deployment, including applications such as Ollama and MLX-LM, and recommends quantized versions for very small devices [27][28]; a minimal local-inference sketch follows this summary.
- For optimal performance, especially on reasoning tasks, a context length greater than 131,072 tokens is advised [29]; a serving-configuration sketch is also shown below.

Community Engagement
- The Qwen team has encouraged community feedback and interaction, with links provided for accessing the new models on platforms such as Hugging Face and ModelScope [26][36].
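To make the local-deployment suggestion concrete, here is a minimal sketch using the MLX-LM Python API, one of the two applications the summary names. The quantized model ID mlx-community/Qwen3-4B-Instruct-2507-4bit is an assumption for illustration (a 4-bit conversion of the kind recommended for very small devices); the article does not specify exact repository names, and Ollama offers a comparable one-command route under its own (likewise assumed) model tags.

```python
# Minimal local inference with MLX-LM on Apple-silicon hardware.
# The 4-bit model ID below is an assumed community conversion, not
# confirmed by the article; substitute whichever quantized build you use.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-Instruct-2507-4bit")

messages = [
    {"role": "user", "content": "Summarize the Qwen3-4B-2507 release in two sentences."}
]
# Build the chat-formatted prompt expected by the instruct model.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# generate() returns the decoded completion as a string.
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```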
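For the long-context recommendation, the sketch below shows one way to open a window larger than the advised 131,072-token floor. vLLM is not named in the article, so treat the engine choice, the Hugging Face model ID Qwen/Qwen3-4B-Thinking-2507, and the memory feasibility of a full 262,144-token window on a given machine as assumptions; the point is only that max_model_len is set above 131,072.

```python
# Sketch: serving Qwen3-4B-Thinking-2507 with a long context window.
# vLLM and the exact model ID are assumptions; the article only advises
# a context length greater than 131,072 tokens for reasoning tasks.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-4B-Thinking-2507",  # assumed Hugging Face ID
    max_model_len=262144,  # full 256K window, above the 131,072 floor
)

# Thinking models emit long chains of thought before the final answer,
# so leave generous room for output tokens.
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=32768)

outputs = llm.generate(["Prove that sqrt(2) is irrational."], params)
print(outputs[0].outputs[0].text)
```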
Source: "Qwen, hot on OpenAI's heels, open-sources a 4B on-device model; its AIME25 score surpasses Claude 4 Opus" — 量子位 (QbitAI), 2025-08-07 00:56