Another SOTA-Level Open-Source Model! StepFun's Step-3 Tops Multimodal Reasoning, with Decoding Cost Under 0.4 Yuan per Million Tokens
QbitAI (量子位) · 2025-08-01 00:46

Core Viewpoint - StepFun (阶跃星辰) has launched Step-3, an open-source multimodal reasoning model that the article describes as a new state of the art (SOTA), with higher decoding throughput and lower decoding cost than existing models [2][3][18].
Group 1: Model Performance
- Step-3 sets a new SOTA among open-source multimodal reasoning models. Its decoding throughput peaks at 4039 tokens per GPU per second on Hopper GPUs, about 174% of DeepSeek-V3's throughput, or roughly 74% higher [3][12].
- The model has 321 billion parameters: 316 billion in the language model and 5 billion in the visual encoder. It can handle up to 800,000 tokens [6][9].
- On benchmarks such as MMMU, AIME25, and LiveCodeBench, Step-3 reaches top-tier scores, outperforming competitors such as Llama 4 Maverick and ERNIE 4.5 [10][11].
Group 2: Cost Efficiency
- With the H20 setup, Step-3's decoding cost is only 30% of DeepSeek-V3's [7][16].
- The model is designed to run efficiently on a heterogeneous setup, where its cost is nearly 12% lower than that of Qwen MoE [14].
- Step-3's average decoding throughput is 3910 tokens per GPU per second, with a peak of 4039, which is 74% higher than DeepSeek-V3 [12][18].
Group 3: Innovative Design
- Step-3 uses a model-infrastructure co-design that treats attention, feed-forward networks (FFN), and cluster scheduling as a single optimization target [18][19].
- The core technique is the Multi-Matrix Factorization Attention (MFA) mechanism, which reduces the size of the key-value (KV) cache and so makes the model better suited to long-context scenarios [20][22].
- The Attention-FFN Disaggregation (AFD) mechanism separates attention and FFN computation so that each can run on a different group of GPUs, optimizing resource usage [25][26].
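
To make the KV-cache point concrete, here is a minimal sketch of how cache size scales in a standard attention stack. It assumes, purely for illustration, that the saving comes from caching fewer key/value heads; the article does not describe MFA's internals. The function name `kv_cache_bytes` and every configuration number below are hypothetical and are not Step-3's published settings.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Per-sequence KV-cache size in bytes for a standard attention stack.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 corresponds to a 16-bit format such as FP16/BF16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


# Hypothetical configuration for illustration only (NOT Step-3's real values).
LAYERS = 60
HEAD_DIM = 128
CONTEXT = 32_768

# Baseline: 8 cached key/value heads per layer.
baseline = kv_cache_bytes(LAYERS, 8, HEAD_DIM, CONTEXT)
# Reduced: a single cached key/value head per layer (assumed, see lead-in).
shared = kv_cache_bytes(LAYERS, 1, HEAD_DIM, CONTEXT)

GIB = 1024 ** 3
print(f"baseline cache: {baseline / GIB:.2f} GiB per sequence")
print(f"reduced cache:  {shared / GIB:.2f} GiB per sequence")
print(f"reduction:      {baseline / shared:.0f}x")
```

Because the cache grows linearly with context length, any constant-factor reduction per token matters most at long contexts, which is why a smaller KV cache is tied to long-context suitability in the summary above.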
Group 4: Market Position
- Chinese open-source models dominate the Hugging Face leaderboard, with eight of the top ten models developed in China, including Step-3 [33][39].
- The leading models include GLM-4.5 and Hunyuan World, showcasing the strength of domestic AI development [34][36].