Tencent Hunyuan AI Infra Core Technology Open-Sourced: Inference Throughput Up 30%

Core Insights
- Tencent's Hunyuan AI Infra team has officially open-sourced HPC-Ops, a production-grade, high-performance core operator library for LLM inference [1]

Performance Improvements
- With HPC-Ops, queries per minute (QPM) for the Hunyuan model improved by 30%, while QPM for the DeepSeek model increased by 17% [1]

Operator Performance Enhancements
- HPC-Ops achieves significant single-operator speedups: Attention is up to 2.22x faster than FlashInfer/FlashAttention [1]
- GroupGEMM is up to 1.88x faster than DeepGEMM [1]
- FusedMoE outperforms TensorRT-LLM by up to 1.49x [1]
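To make the reported throughput figures concrete, here is a minimal sketch of how the percentage gains translate into QPM. The 30% and 17% improvements are the figures reported above; the baseline QPM value is an assumption chosen purely for illustration, not a number from the announcement.

```python
# Reported fractional QPM gains from the HPC-Ops announcement.
reported_gains = {"Hunyuan": 0.30, "DeepSeek": 0.17}

def improved_qpm(baseline_qpm: float, gain: float) -> float:
    """Return queries per minute after applying a fractional improvement."""
    return baseline_qpm * (1.0 + gain)

for model, gain in reported_gains.items():
    baseline = 1000.0  # assumed baseline QPM, for illustration only
    print(f"{model}: {baseline:.0f} -> {improved_qpm(baseline, gain):.0f} QPM")
```

Under the assumed 1000 QPM baseline, this prints roughly 1300 QPM for Hunyuan and 1170 QPM for DeepSeek.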
