Xiaomi Group - Xiaomi Large Language Model MiMo-V2-Flash Released
MiMo-V2-Flash has 309B total parameters and 15B active parameters, adopting a hybrid attention architecture that interleaves sliding-window and full attention, with an aggressive 128-token sliding window and a 5:1 hybrid ratio. As illustrated in the exhibit below, MiMo-V2-Flash is viewed as competitive with mainstream LLMs such as DeepSeek-V3.2. In addition, MiMo-V2-Flash is engineered for efficiency: it delivers fast inference at 150 tokens per second while maintaining a low cost of $0.1 per million input tokens.
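To make the hybrid attention pattern concrete, below is a minimal sketch of how a 5:1 interleave of 128-token sliding-window layers and full-attention layers could be expressed. The mask construction and the placement of the full-attention layer at the end of each six-layer block are illustrative assumptions, not details confirmed by the release; only the 128-token window and the 5:1 ratio come from the announcement.

```python
import torch

def sliding_window_mask(seq_len: int, window: int = 128) -> torch.Tensor:
    """Boolean mask: query i may attend to key j iff j <= i and j > i - window."""
    i = torch.arange(seq_len).unsqueeze(1)  # query positions (column vector)
    j = torch.arange(seq_len).unsqueeze(0)  # key positions (row vector)
    return (j <= i) & (j > i - window)

def causal_mask(seq_len: int) -> torch.Tensor:
    """Standard full causal attention mask: attend to all earlier positions."""
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return j <= i

def layer_pattern(num_layers: int, ratio: int = 5) -> list[str]:
    """5:1 interleave (assumed layout): five sliding-window layers,
    then one full-attention layer, repeated down the stack."""
    return [
        "full" if (k + 1) % (ratio + 1) == 0 else "sliding"
        for k in range(num_layers)
    ]

if __name__ == "__main__":
    # Twelve layers -> two blocks of [sliding x5, full x1].
    print(layer_pattern(12))
    # Tiny example: window of 3 over a length-6 sequence.
    print(sliding_window_mask(seq_len=6, window=3).int())
```

The appeal of this layout is that most layers only materialize attention over a 128-token band, so compute and KV-cache traffic scale with the window rather than the full sequence, while the periodic full-attention layers preserve long-range information flow.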