
Core Insights
- MiniMax has launched MiniMax-M1, which it describes as the world's first open-source large-scale hybrid-architecture reasoning model, as part of a plan to ship updates for five consecutive days [2]

Model Specifications
- M1 has 456 billion total parameters, with 45.9 billion activated per token; it supports a 1-million-token context window and an 80,000-token reasoning output, the longest in the industry and 8 times that of DeepSeek-R1 [4]
- Two versions of MiniMax-M1 were trained, with thinking budgets of 40k and 80k tokens [4]

Training and Cost
- Training used 512 H800 GPUs for three weeks at a cost of roughly $537,400 (about 3.859 million RMB), an order of magnitude below initial cost expectations (see the cost check after this summary) [7]
- M1 is available for unlimited free use on the MiniMax app and web [7]

API Pricing Structure
- The M1 API is priced in tiers by input length (a toy cost calculator follows this summary):
  - 0-32k input: 0.8 RMB/million tokens input, 8 RMB/million tokens output
  - 32k-128k input: 1.2 RMB/million tokens input, 16 RMB/million tokens output
  - 128k-1M input: 2.4 RMB/million tokens input, 24 RMB/million tokens output [7][11]
- Compared with DeepSeek-R1, M1's first-tier input price is 80% and its output price 50% of DeepSeek-R1's, while its second-tier input price is 1.2 times DeepSeek-R1's [9]

Performance Evaluation
- MiniMax-M1 outperforms models such as DeepSeek-R1 and Qwen3-235B on complex software engineering, tool use, and long-context tasks [13][14]
- On the MRCR test, M1 scores slightly below Gemini 2.5 Pro but above the other models [13]
- On the SWE-bench Verified test set, M1-40k and M1-80k score slightly below DeepSeek-R1-0528 but above the other open-source models [14]

Technical Innovations
- M1 combines a mixture-of-experts (MoE) architecture with a lightning attention mechanism, allowing it to scale efficiently to long inputs and complex tasks (a linear-attention sketch follows this summary) [16]
- The model is trained with large-scale reinforcement learning (RL) using a new algorithm, CISPO, which improves performance by clipping importance-sampling weights (sketched after this summary) [16][17]

Future Directions
- MiniMax emphasizes the need for "Language-Rich Mediator" agents to handle complex scenarios requiring dynamic resource allocation and multi-round reasoning [19]
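
As a sanity check on the reported training budget, the figures above imply an hourly GPU rate of roughly $2 per H800. The arithmetic below derives that rate from the numbers in the article; the rate itself is an inference, not a figure from the source.

```python
# Back-of-the-envelope check of the reported RL training cost.
# Inputs from the article: 512 H800 GPUs for ~3 weeks, ~$537,400 total.
# The implied hourly rate is derived here, not stated in the source.
gpus = 512
days = 21                              # "three weeks"
gpu_hours = gpus * days * 24           # 258,048 GPU-hours
total_cost_usd = 537_400
rate = total_cost_usd / gpu_hours      # ~$2.08 per H800-hour
print(f"{gpu_hours:,} GPU-hours -> ~${rate:.2f}/GPU-hour")
```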
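The tiered schedule is easy to mis-apply, so here is a minimal cost calculator for it. It assumes the tier is selected by the request's total input length, which is my reading of the published table; the function and constant names are illustrative, not part of any MiniMax SDK.

```python
# Illustrative cost calculator for MiniMax-M1's tiered API pricing.
# Tier boundaries and rates (RMB per million tokens) are from the article;
# selecting the tier by input length is an assumption.
TIERS = [  # (max input tokens, input RMB/M tokens, output RMB/M tokens)
    (32_000,    0.8,  8.0),
    (128_000,   1.2, 16.0),
    (1_000_000, 2.4, 24.0),
]

def m1_cost_rmb(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in RMB for a single request."""
    for max_input, in_rate, out_rate in TIERS:
        if input_tokens <= max_input:
            return (input_tokens * in_rate + output_tokens * out_rate) / 1e6
    raise ValueError("input exceeds the 1M-token context window")

# Example: a 100k-token prompt with a 40k-token reasoning output lands in
# the 32k-128k tier: 0.12 RMB input + 0.64 RMB output = 0.76 RMB.
print(m1_cost_rmb(100_000, 40_000))
```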
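Lightning attention belongs to the linear-attention family. The toy sketch below shows only the generic recurrence that makes cost grow linearly (rather than quadratically) with sequence length; it is not MiniMax's kernel, and it omits the blockwise tiling and the periodic softmax-attention layers of M1's hybrid design.

```python
import numpy as np

# Toy (unnormalized) linear-attention recurrence. Instead of forming the
# O(n^2) softmax score matrix, it maintains a running d x d state, so each
# step attends to all earlier positions in O(d^2) time.
def linear_attention(Q, K, V):
    n, d = Q.shape
    state = np.zeros((d, V.shape[1]))  # running sum of outer(k_t, v_t)
    out = np.empty_like(V)
    for t in range(n):
        state += np.outer(K[t], V[t])
        out[t] = Q[t] @ state          # covers positions <= t
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 4)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (8, 4)
```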
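The article does not spell out CISPO's update rule. My reading of the approach is that the importance-sampling ratio itself is clipped and treated as a constant weight (with gradients stopped through it), rather than clipping the token update PPO-style, so every token still contributes a gradient. The sketch below reflects that reading; the epsilon bounds and variable names are illustrative.

```python
import torch

def cispo_loss(logp_new: torch.Tensor,
               logp_old: torch.Tensor,
               advantages: torch.Tensor,
               eps_low: float = 0.2,
               eps_high: float = 0.2) -> torch.Tensor:
    """CISPO-style policy loss (a sketch under stated assumptions).

    The importance-sampling ratio is clipped and detached, acting as a
    constant per-token weight; unlike PPO's clipped objective, no token's
    gradient is zeroed out entirely.
    """
    ratio = torch.exp(logp_new - logp_old)
    weight = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high).detach()
    # REINFORCE-style term: the gradient flows only through logp_new
    return -(weight * advantages * logp_new).mean()

# Toy usage with random token-level log-probs and advantages.
logp_new = torch.randn(8, requires_grad=True)
logp_old = logp_new.detach() + 0.1 * torch.randn(8)
adv = torch.randn(8)
cispo_loss(logp_new, logp_old, adv).backward()
print(logp_new.grad.shape)  # torch.Size([8])
```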