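Qwen's "half-finished" reasoning model posts a perfect AIME score and wins over a wave of overseas developers! In hands-on tests it crushes GPT-5 Thinking and can even write detective fiction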

Core Insights
- The article highlights the advancements of Alibaba's Qwen3-Max-Thinking model, which has achieved a 100% accuracy rate in challenging international math competitions, indicating a significant leap in AI reasoning capabilities [2][5][7].

Model Overview
- Qwen3-Max-Preview is Alibaba's largest and most powerful language model to date, with over 1 trillion parameters and 36 trillion tokens of pre-training data [7].
- The model supports a context window of 262,144 tokens, with a maximum input of 258,048 tokens and a maximum output of 32,768 tokens [7].
- It has been noted for its speed, reportedly faster than ChatGPT, and its ability to avoid common pitfalls of large language models [8][9].

Performance and Testing
- Initial tests showed that Qwen3-Max-Preview outperformed several state-of-the-art models in various benchmarks, including SuperGPQA and AIME25 [7].
- Users have reported that the model performs exceptionally well in simple tasks and has shown promising results in reasoning tasks, even outperforming GPT-5 in some instances [16][21].
- However, some developers have cautioned that the model is still in preview and may require further optimization for programming tasks [18][21].

Pricing and Accessibility
- Qwen3-Max-Preview is not open-source and requires developers to access it through a paid API, with a tiered pricing model based on input token size [12][13].
- The pricing structure includes rates such as $0.861 per million input tokens for the 0-32K range and $2.151 per million input tokens for the 128K-252K range [13].

Future Developments
- The model is designed for complex reasoning, code writing, and handling structured data formats, making it suitable for enterprise and research applications [12].
- Continuous updates and improvements are expected as the model undergoes further training [4].
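
The token limits and tiered input pricing reported above lend themselves to a small worked example. The Python sketch below is illustrative only and is not taken from Alibaba's API or documentation. It assumes that "K" means 1,000 tokens, that a request is billed entirely at the rate of the tier its input size falls into, and that input and output share the 262,144-token context window. Only the two rates quoted in the article are included; any other input size raises an error instead of guessing a rate. The function names `input_rate`, `input_cost_usd`, and `fits_limits` are hypothetical.

```python
# Limits as reported in the article [7].
CONTEXT_WINDOW = 262_144
MAX_INPUT_TOKENS = 258_048
MAX_OUTPUT_TOKENS = 32_768

# (lower bound, upper bound, USD per million input tokens).
# Only these two tiers are quoted in the article [13]. Treating "K" as
# 1,000 tokens and the bounds as (exclusive, inclusive] is an assumption.
QUOTED_INPUT_TIERS = [
    (0, 32_000, 0.861),
    (128_000, 252_000, 2.151),
]


def input_rate(input_tokens: int) -> float:
    """Return the quoted USD rate per million input tokens for this size."""
    for low, high, rate in QUOTED_INPUT_TIERS:
        if low < input_tokens <= high:
            return rate
    # The article does not quote a rate for other sizes, so none is invented.
    raise ValueError(f"no rate quoted in the article for {input_tokens} input tokens")


def input_cost_usd(input_tokens: int) -> float:
    """Estimate input cost, assuming the whole request is billed at one tier rate."""
    return input_tokens / 1_000_000 * input_rate(input_tokens)


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the reported limits.

    Assumes input and output tokens together must fit in the context window.
    """
    return (
        input_tokens <= MAX_INPUT_TOKENS
        and output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + output_tokens <= CONTEXT_WINDOW
    )


if __name__ == "__main__":
    print(f"20K-token prompt:  ${input_cost_usd(20_000):.5f}")
    print(f"200K-token prompt: ${input_cost_usd(200_000):.5f}")
    print("max input + max output fits:", fits_limits(MAX_INPUT_TOKENS, MAX_OUTPUT_TOKENS))
```

Under the shared-window assumption, the maximum input and maximum output cannot both be used in one request, because 258,048 + 32,768 exceeds 262,144. A request at the input ceiling would leave 4,096 tokens for output.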