Qwen hits a perfect score on AIME'25 with a half-finished model. Leave the others some dignity, please...

Core Insights
- Qwen3 posted a standout result in mathematical reasoning tests, scoring full marks on AIME 25 and HMMT 25 and showcasing its advanced problem-solving capabilities [1][3][6].

Performance Comparison
- The previous best AIME 25 scores were held by the GPT-5 series, with GPT-5 Codex (high) at 98.7% accuracy and GPT-5 (high) at 94.3%. In contrast, Qwen3 scored 91% [6].

Model Features
- Qwen3-Max-Thinking is currently available for free testing in Qwen Chat, and an API has launched on Alibaba Cloud. The official team has committed to continued training and updates for the model [9][10].

Testing and Results
- Initial tests included programming tasks, such as simulating a ball bouncing inside a rotating hexagon, which Qwen3-Max-Thinking executed successfully [12][15].
- The model also tackled complex mathematical problems, giving correct answers after a brief thinking period [16].

User Experience
- Users reported that Qwen3-Max-Thinking took considerable time on certain tasks, sometimes reasoning through the problem in both Chinese and English [25].
- The model was able to build a 3D solar system with Three.js, although its initial attempts were incomplete until it was prompted for improvements [20][22].

Future Developments
- The development team acknowledges that the model's capabilities need further refinement and says the work is ongoing [27].