MiniMax Takes the Fight to DeepSeek
Jing Ji Guan Cha Wang · 2025-06-18 11:32
Core Viewpoint
- MiniMax has launched its self-developed MiniMax M1 model, which competes directly with DeepSeek R1 and Google's Gemini 2.5 Pro on key technical specifications, architecture design, context-processing capability, and training cost [1][2].

Group 1: Model Specifications
- MiniMax M1 supports a context length of 1 million tokens, eight times DeepSeek R1's 128,000 tokens and only slightly behind Google's Gemini 2.5 Pro [1].
- MiniMax M1 has 456 billion total parameters, with 45.9 billion activated per token; DeepSeek R1 has 671 billion total parameters but activates only 37 billion per token [1].

Group 2: Cost Efficiency
- When generating 100,000 tokens, MiniMax M1 consumes only 25% of the floating-point operations DeepSeek R1 requires, and it needs less than half the compute for inference tasks of 64,000 tokens [2].
- Training MiniMax M1 cost only $535,000, far below initial expectations and much less than the $5-6 million GPU cost of training DeepSeek R1 [2].

Group 3: Pricing Strategy
- MiniMax M1 uses tiered API pricing based on input and output token counts; the first tier charges 0.8 yuan per million input tokens and 8 yuan per million output tokens, below DeepSeek R1's pricing [3]. (A toy cost calculator follows this summary.)
- Pricing for the first two tiers is lower than DeepSeek R1's, and the third tier, for long text, is not currently covered by DeepSeek at all [3].

Group 4: Technology Innovations
- MiniMax M1's capabilities rest on two core technologies: the linear attention mechanism (Lightning Attention) and the reinforcement learning algorithm CISPO, which improve training efficiency and stability [2]. (A minimal linear-attention sketch also follows.)
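To make the tiered pricing concrete, here is a toy calculator. Only the first tier's rates (0.8 yuan per million input tokens, 8 yuan per million output tokens) come from the article; the tier boundaries follow the 0-32K and 32K-128K input ranges cited in the next article, and the second- and third-tier rates are hypothetical placeholders, as is the assumption that a request's tier is selected by its input length.

```python
# Illustrative tier table. Only tier 1's rates are given in the article;
# tier 2 and tier 3 rates below are hypothetical placeholders.
TIERS = [
    # (max input tokens, yuan per 1M input tokens, yuan per 1M output tokens)
    (32_000,    0.8,  8.0),   # tier 1: rates from the article
    (128_000,   1.2, 16.0),   # tier 2: hypothetical rates
    (1_000_000, 2.4, 24.0),   # tier 3 (long text): hypothetical rates
]

def api_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Price one request, assuming the tier is chosen by input length
    and both input and output tokens are billed at that tier's rates."""
    for max_in, rate_in, rate_out in TIERS:
        if input_tokens <= max_in:
            return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000
    raise ValueError("input exceeds the 1M-token context limit")

# e.g. a 10K-token prompt producing a 2K-token answer:
print(f"{api_cost_yuan(10_000, 2_000):.4f} yuan")  # 0.008 + 0.016 = 0.0240
```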
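As for Lightning Attention, the article gives no implementation details, so the following is only a minimal sketch of generic linear attention, the family Lightning Attention belongs to: replacing softmax with a feature map avoids materializing the n-by-n attention matrix, so compute grows linearly with sequence length, which is what makes a 1-million-token context tractable. The feature map, the non-causal formulation, and all names here are illustrative assumptions, not MiniMax's code.

```python
import numpy as np

def linear_attention(Q, K, V, eps=1e-6):
    """Generic (non-causal) linear attention: O(n * d^2) instead of O(n^2 * d).

    softmax(Q K^T) V is replaced by phi(Q) @ (phi(K)^T @ V), so the n x n
    attention matrix is never formed. phi is the commonly used elu(x) + 1
    positive feature map; MiniMax's Lightning Attention adds causal masking
    and an I/O-aware tiled formulation not shown here.
    """
    def phi(x):  # elu(x) + 1, kept positive so the normalizer is valid
        return np.where(x > 0, x + 1.0, np.exp(x))

    Qp, Kp = phi(Q), phi(K)                    # (n, d) each
    kv = Kp.T @ V                              # (d, d) summary, linear in n
    z = Qp @ Kp.sum(axis=0, keepdims=True).T   # (n, 1) normalizer
    return (Qp @ kv) / (z + eps)

# Toy usage: a 1,024-token sequence with head dimension 64.
rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (1024, 64)
```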
AI Unicorn Valued at 20 Billion RMB Fights Back: MiniMax's First Reasoning Model Benchmarks Against DeepSeek, at a Compute Cost of Only About $530,000
Hua Er Jie Jian Wen · 2025-06-17 11:57
Core Insights
- MiniMax, a Chinese AI startup valued at 20 billion RMB, has launched its first reasoning model, M1, which challenges leading models such as DeepSeek's with significantly lower training costs and higher efficiency [1][6].

Performance and Efficiency
- M1 outperforms domestic closed-source models and approaches the performance of the best overseas models, surpassing models from DeepSeek, Alibaba, ByteDance, OpenAI, Google, and Anthropic on certain tasks [1].
- On efficiency, M1 consumes less than 50% of DeepSeek R1's compute when generating 64K tokens, and only 25% at 100K tokens [7].
- The model has 456 billion total parameters and supports context inputs of up to 1 million tokens, eight times DeepSeek R1's limit [3].

Cost Efficiency
- The entire training run used 512 NVIDIA H800 GPUs for three weeks, at a rental cost of roughly $537,400 (about 3.8 million RMB), an order of magnitude lower than initially expected [6].
- MiniMax developed a new reinforcement learning algorithm, CISPO, which trains twice as fast as ByteDance's recent DAPO algorithm, reaching similar performance in only 50% of the training steps [6]. (A sketch of the clipping idea appears after this summary.)

Market Positioning
- MiniMax's tiered API pricing makes M1 more cost-effective than DeepSeek R1, especially in the 0-32K and 32K-128K input-length ranges [8].
- M1 is positioned as a "price killer" in the market and has drawn positive feedback from developers for its cost-performance ratio [8].

Future Developments
- M1 is only the first in a planned series of releases; MiniMax intends to follow with intelligent-agent applications and further updates to its video and music models [9].
- The company believes M1's efficient architecture will give it a distinct advantage in future agent applications that require extensive reasoning over long-context information [9].
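The article describes CISPO only at the level of its speedup over DAPO. Based on the published description of CISPO as clipping importance-sampling weights rather than discarding clipped token updates, here is a hedged numpy sketch of the core loss; the clip bound, variable names, and toy data are all illustrative assumptions, not MiniMax's implementation.

```python
import numpy as np

def cispo_loss(logp_new, logp_old, advantages, eps_high=0.2):
    """Sketch of a CISPO-style update over one sampled token sequence.

    PPO-style objectives clip the probability ratio itself, which zeroes
    the gradient of any token falling outside the trust region. CISPO is
    described as clipping the importance-sampling weight instead and
    treating it as a constant, so every token keeps a REINFORCE-style
    gradient: clipped_weight * advantage * grad(log pi_theta).
    The clip bound here is an illustrative default, not MiniMax's setting.
    """
    ratio = np.exp(logp_new - logp_old)    # importance-sampling weights
    w = np.minimum(ratio, 1.0 + eps_high)  # clip the high side only
    # In an autograd framework, `w` would be wrapped in stop_gradient /
    # detach(); with plain numpy it is already constant w.r.t. logp_new.
    return -np.mean(w * advantages * logp_new)

# Toy usage with a rollout of 8 tokens:
rng = np.random.default_rng(1)
logp_old = rng.normal(-2.0, 0.5, 8)
logp_new = logp_old + rng.normal(0.0, 0.1, 8)
adv = rng.standard_normal(8)
print(cispo_loss(logp_new, logp_old, adv))
```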