Tackling the Training-Inference Discrepancy in Large Models: Ant Group Open-Sources Its New-Generation Reasoning Model Ring-flash-2.0
机器之心·2025-09-19 10:43

Core Viewpoint
- The article covers the release of Ring-flash-2.0 by Ant Group's Bailing team, arguing that it could reshape the competitive landscape of large models by pairing high performance with a small activated-parameter budget and more stable long-run RL training [1][4][26].

Performance Overview
- Ring-flash-2.0 is a mixture-of-experts (MoE) model with 100 billion total parameters, of which only 6.1 billion are activated per token; it scores 86.98 on the AIME math benchmark and 90.23 on the CodeForces Elo benchmark, with generation throughput above 200 tokens per second [1][21].
- Its performance is comparable to the state-of-the-art (SOTA) level of roughly 40-billion-parameter dense models, a significant advance on reasoning tasks [1][21].

Technical Innovations
- The icepop algorithm enables stable long-horizon reinforcement learning (RL) by freezing tokens whose probabilities diverge too far between the training engine and the inference engine, blocking gradient backpropagation through them (illustrated in the first sketch after this summary) [6][10][13].
- Training follows a two-stage RL recipe: supervised fine-tuning (SFT) is followed by reinforcement learning with verifiable rewards (RLVR) and then reinforcement learning from human feedback (RLHF), optimizing the overall training process (second sketch below) [14][16].

Cost Efficiency
- Ring-flash-2.0 matches the performance of a roughly 40-billion-parameter dense model while activating only 6.1 billion parameters, marking a turning point in the cost-efficiency race among large models [17][21].
- The high-sparsity, low-activation design substantially reduces inference cost in high-concurrency serving scenarios (third sketch below) [21].

Market Implications
- Competition among large models is shifting from sheer parameter count to cost-effectiveness, and the article positions Ring-flash-2.0 as a leading solution for this new phase [18][25].
- The article suggests Ring-flash-2.0 may mark the beginning of a "high cost-performance era" for large models, following the wave of advances set off by GPT-4 [26].
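The article describes icepop only at a high level. The PyTorch sketch below illustrates the core idea (mask tokens whose training-engine and inference-engine probabilities diverge too far, so they contribute no gradient) under stated assumptions: the names icepop_mask and masked_policy_loss, the band thresholds delta_low and delta_high, and the simplified policy-gradient loss are all illustrative, not the Bailing team's released implementation.

```python
import torch

def icepop_mask(logp_train: torch.Tensor,
                logp_infer: torch.Tensor,
                delta_low: float = 0.5,
                delta_high: float = 2.0) -> torch.Tensor:
    """Mask tokens whose train/infer probability ratio leaves [delta_low, delta_high].

    logp_train: per-token log-probs from the training engine.
    logp_infer: per-token log-probs recorded by the inference engine at rollout time.
    The thresholds are illustrative; the article does not publish exact values.
    """
    ratio = torch.exp(logp_train.detach() - logp_infer)
    return ((ratio >= delta_low) & (ratio <= delta_high)).float()

def masked_policy_loss(logp_train: torch.Tensor,
                       logp_infer: torch.Tensor,
                       advantages: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss in which high-discrepancy tokens are frozen:
    the mask zeroes their loss terms, so no gradient flows through them."""
    mask = icepop_mask(logp_train, logp_infer)
    per_token = -advantages.detach() * logp_train
    return (per_token * mask).sum() / mask.sum().clamp(min=1.0)
```

The design point mirrors the article's claim: rather than correcting the discrepancy, out-of-band tokens are simply excluded from backpropagation, which keeps long RL runs from being destabilized by accumulating training-inference drift.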
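The two-stage recipe is a training process rather than an algorithm, but a short skeleton helps fix the ordering. Everything below is hypothetical scaffolding: sft_warmup, rlvr_stage, and rlhf_stage are stub names standing in for a long chain-of-thought SFT loop, an RL loop rewarded by verifiable checks (exact answers, unit tests), and an RL loop driven by human-preference signals, respectively.

```python
from typing import Callable, List, Tuple

def sft_warmup(model):
    # Assumed stage: supervised fine-tuning on long chain-of-thought data.
    return model

def rlvr_stage(model):
    # Assumed stage: RL with verifiable rewards, e.g. checking math answers
    # or running unit tests, so the reward needs no learned judge.
    return model

def rlhf_stage(model):
    # Assumed stage: RL from human feedback for open-ended response quality.
    return model

PIPELINE: List[Tuple[str, Callable]] = [
    ("SFT warm-up", sft_warmup),
    ("RLVR", rlvr_stage),
    ("RLHF", rlhf_stage),
]

def train(model):
    # Run the stages strictly in sequence, as the article's recipe describes.
    for name, stage in PIPELINE:
        print(f"running stage: {name}")
        model = stage(model)
    return model
```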
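The cost argument rests on sparse activation: a mixture-of-experts layer routes each token to only a few experts, so most parameters sit idle on any given forward pass. The sketch below is a generic top-k MoE layer with toy sizes (assumptions, not Ring-flash-2.0's actual configuration); with 32 experts and top-2 routing, each token touches roughly 1/16 of the expert parameters, the same order as 6.1B activated out of 100B total.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k mixture-of-experts layer (toy sizes, illustration only)."""

    def __init__(self, d_model: int = 64, d_ff: int = 256,
                 num_experts: int = 32, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Each token is routed to its top_k experts only,
        # so the remaining experts do no work for that token.
        scores = self.router(x)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                hit = idx[:, slot] == e  # tokens whose slot-th choice is expert e
                if hit.any():
                    out[hit] += weights[hit, slot:slot + 1] * expert(x[hit])
        return out

moe = TopKMoE()
tokens = torch.randn(8, 64)
print(moe(tokens).shape)  # torch.Size([8, 64])
```

This is why the article frames high sparsity as an inference-cost lever: per-token compute scales with activated parameters, while model capacity scales with total parameters.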