SGLang Diffusion Officially Released: Image and Video Generation Up to 57% Faster!
机器之心 (Jiqizhixin) · 2025-11-21 10:17
Core Insights
- SGLang has officially announced support for diffusion models, extending the high-performance scheduling and kernel optimizations it built for large language models to image and video diffusion models, with speedups of up to 57% over previous frameworks [2][3][7].

Group 1: Model Support and Performance
- SGLang Diffusion supports mainstream open-source video and image generation models, including the Wan series, Hunyuan, Qwen-Image, and Flux [2].
- The acceleration reaches up to 57% across various workloads [3].
- The architecture is designed to handle both language tasks and diffusion tasks, aiming to serve as a high-performance multimodal foundation for future generative AI [9].

Group 2: Implementation and Features
- SGLang Diffusion employs a ComposedPipelineBase strategy that decomposes the diffusion inference process into reusable stages, improving flexibility and performance [11].
- The system integrates advanced parallelism and builds on the existing sgl-kernel, laying the groundwork for future enhancements such as quantization [12].
- Multiple familiar interface options are provided, including an OpenAI-compatible API, a CLI, and a Python API, easing integration into existing workflows [14].

Group 3: Performance Benchmarking
- On H100 GPUs, SGLang Diffusion demonstrates significant performance improvements over open-source baselines such as Hugging Face Diffusers, across various models and parallel configurations [28][29].
- In the benchmarks, shorter inference time indicates higher performance [31].

Group 4: Community and Future Plans
- The SGLang Diffusion team is focused on continuous innovation, aiming to replicate or exceed in diffusion inference the performance advantages already seen in LLM scenarios [34].
- Future enhancements include support for long video generation models, integration of quantization kernels, and improved cloud storage capabilities for generated files [36].
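The ComposedPipelineBase idea described above — breaking diffusion inference into reusable, composable stages — can be illustrated with a minimal sketch. The stage and class names below (TextEncodingStage, DenoisingStage, DecodeStage, ComposedPipeline) are hypothetical stand-ins, not SGLang's actual API; real stages would wrap a text encoder, an iterative DiT/U-Net denoising loop, and a VAE decoder.

```python
from dataclasses import dataclass, field
from typing import Any


class PipelineStage:
    """One reusable unit of the diffusion inference process (hypothetical)."""

    def forward(self, state: dict[str, Any]) -> dict[str, Any]:
        raise NotImplementedError


class TextEncodingStage(PipelineStage):
    # Stand-in: a real stage would run a text encoder (e.g. T5/CLIP).
    def forward(self, state):
        state["text_embeds"] = [float(ord(c)) for c in state["prompt"][:4]]
        return state


class DenoisingStage(PipelineStage):
    # Stand-in for the iterative denoising loop of a diffusion transformer.
    def __init__(self, num_steps: int = 4):
        self.num_steps = num_steps

    def forward(self, state):
        latent = sum(state["text_embeds"])
        for _ in range(self.num_steps):
            latent *= 0.5  # pretend each step removes half the "noise"
        state["latents"] = latent
        return state


class DecodeStage(PipelineStage):
    # Stand-in for VAE decoding of latents into pixels.
    def forward(self, state):
        state["image"] = f"image<{state['latents']:.2f}>"
        return state


@dataclass
class ComposedPipeline:
    """Chains reusable stages, mirroring the ComposedPipelineBase concept."""

    stages: list[PipelineStage] = field(default_factory=list)

    def generate(self, prompt: str) -> str:
        state: dict[str, Any] = {"prompt": prompt}
        for stage in self.stages:
            state = stage.forward(state)
        return state["image"]


pipe = ComposedPipeline([TextEncodingStage(), DenoisingStage(), DecodeStage()])
print(pipe.generate("a cat"))
```

Because each stage only reads and writes a shared state dict, stages can be swapped or reordered per model family, which is the flexibility benefit the article attributes to this design.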
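Since one of the advertised interfaces is an OpenAI-compatible API, a client could reuse the standard OpenAI Images request/response shape against a locally served endpoint. The sketch below shows that shape only; the URL, port, and model identifier are assumptions for illustration, and the network call itself is elided in favor of parsing a sample response.

```python
import base64
import json

# Hypothetical request an OpenAI-compatible client might send to a locally
# served SGLang Diffusion endpoint (URL and model name are assumptions).
request = {
    "url": "http://localhost:30000/v1/images/generations",
    "payload": {
        "model": "Qwen/Qwen-Image",  # assumed model identifier
        "prompt": "a watercolor fox in a forest",
        "n": 1,
        "size": "1024x1024",
    },
}

# OpenAI-style images responses carry base64-encoded image bytes under
# data[i]["b64_json"]; this simulates one such response.
sample_response = json.dumps(
    {"data": [{"b64_json": base64.b64encode(b"\x89PNG...").decode()}]}
)


def decode_first_image(response_text: str) -> bytes:
    """Decode the first image from an OpenAI-style images response."""
    data = json.loads(response_text)["data"]
    return base64.b64decode(data[0]["b64_json"])


print(decode_first_image(sample_response)[:4])
```

Keeping the response shape OpenAI-compatible is what lets existing clients and SDKs be pointed at the diffusion server without code changes, which is the integration benefit the article highlights.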