Core Insights
- Ant Group has officially open-sourced dInfer, the industry's first high-performance inference framework for diffusion language models, which significantly improves their inference efficiency [1][2]

Performance Metrics
- dInfer delivers a 10.7x speedup in inference over NVIDIA's Fast-dLLM framework, raising average throughput from 63.6 to 681 tokens per second (TPS) [1]
- On the HumanEval code-generation benchmark, dInfer reaches 1,011 tokens per second in single-batch inference, the first time in the open-source community that a diffusion model has surpassed autoregressive models [1]
- Compared with the vLLM framework running the Qwen2.5-3B model, dInfer's average inference speed is 2.5x faster: 681 TPS versus 277 TPS [1]

Industry Impact
- The launch of dInfer marks a critical step in moving diffusion language models from theoretical feasibility to practical efficiency, bridging cutting-edge research and industrial application [2]
- Ant Group invites developers and researchers worldwide to explore the potential of diffusion language models, aiming to build a more efficient and open AI ecosystem [2]
Surpassing Autoregressive Models for the First Time! Ant Group Open-Sources dInfer, the Industry's First High-Performance Diffusion Language Model Inference Framework
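The headline speedups above are plain throughput ratios over the quoted TPS figures. A minimal sketch checking that the 10.7x and 2.5x claims follow from the reported numbers (only the TPS values come from the summary; the helper function and constant names are illustrative):

```python
# Sanity-check the reported speedup ratios against the quoted TPS figures.
# The three throughput numbers are taken from the summary above; everything
# else (function name, constants) is illustrative.

def speedup(new_tps: float, baseline_tps: float) -> float:
    """Throughput ratio of the new system over a baseline."""
    return new_tps / baseline_tps

DINFER_TPS = 681.0    # dInfer average tokens per second
FAST_DLLM_TPS = 63.6  # NVIDIA Fast-dLLM baseline
VLLM_TPS = 277.0      # vLLM running Qwen2.5-3B

print(round(speedup(DINFER_TPS, FAST_DLLM_TPS), 1))  # 10.7 (vs. Fast-dLLM)
print(round(speedup(DINFER_TPS, VLLM_TPS), 1))       # 2.5 (vs. vLLM)
```

Both reported multiples are simply these ratios rounded to one decimal place.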