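Luo Fuli's debut at the helm of Xiaomi's large models: setting the direction for the next generation of models, as the new open-source MiMo-V2 sweeps into the first tier of Agent benchmarks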

Core Viewpoint
- Xiaomi has introduced its new large model MiMo-V2-Flash, which emphasizes efficiency and practical deployment over sheer size, marking a significant step in its AI exploration journey [4][9].

Group 1: Model Overview
- MiMo-V2-Flash features a total parameter scale of 309 billion, with only about 15 billion parameters activated during inference, utilizing a MoE (Mixture of Experts) architecture [8].
- The model incorporates Multi-Token Prediction (MTP) technology, designed for high-speed inference and agent workflows, aiming for efficiency rather than just increasing parameter size [8][21].
- Xiaomi's approach to MiMo-V2-Flash is driven by the need for models that are not only intelligent but also practical and deployable in real-world scenarios [21][22].

Group 2: Performance Metrics
- During inference, the model achieves a throughput of 5000 to 15000 tokens per second in a single-machine environment, with a single request output speed of 150 tokens per second, representing a speed increase of approximately 2-3 times compared to models without MTP [24][47].
- MiMo-V2-Flash has entered the first tier in seven mainstream evaluations, particularly excelling in the SWE-Bench test with a 71.7% accuracy rate [27][28].

Group 3: Future Directions
- The next generation of intelligent agents must be capable of continuous interaction with the real environment, moving beyond mere language processing to a unified, dynamic world model [30][32].
- Xiaomi emphasizes that true intelligence arises from interaction rather than just textual understanding, indicating a shift towards models that can engage with the physical world [52][53].
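
The parameter and speed figures quoted under Model Overview and Performance Metrics imply two ratios that the summary does not spell out: the share of parameters active per token, and the per-request speed a comparable model without MTP would reach. The short Python sketch below works them out from the quoted numbers only. The function names are illustrative, and the baseline range assumes the stated 2-3x speedup applies directly to the 150 tokens per second figure, which the source suggests but does not state precisely.

```python
from typing import Tuple

# Figures as quoted in the summary (not independently verified).
TOTAL_PARAMS_B = 309             # total parameters, in billions
ACTIVE_PARAMS_B = 15             # approx. parameters activated during inference, in billions
MTP_TOKENS_PER_SEC = 150         # single-request output speed with MTP
MTP_SPEEDUP_RANGE = (2.0, 3.0)   # stated speedup over models without MTP


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters that are activated during inference."""
    return active_b / total_b


def implied_baseline_speed(
    mtp_speed: float, speedup_range: Tuple[float, float]
) -> Tuple[float, float]:
    """Per-request speed without MTP implied by the speedup, as (low, high).

    Assumption: the speedup is a simple ratio against the MTP speed.
    """
    low_speedup, high_speedup = speedup_range
    return mtp_speed / high_speedup, mtp_speed / low_speedup


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
    low, high = implied_baseline_speed(MTP_TOKENS_PER_SEC, MTP_SPEEDUP_RANGE)
    print(f"Active share of parameters: {frac:.1%}")                   # 4.9%
    print(f"Implied speed without MTP: {low:.0f} to {high:.0f} tokens/s")  # 50 to 75
```

Under these assumptions, roughly one parameter in twenty is used per token, which is the sense in which the model favors efficiency over sheer size.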