Kunlun Wanwei (昆仑万维) Launches and Open-Sources Skywork UniPic

Core Insights
- Kunlun Wanwei Technology Co., Ltd. has launched and open-sourced the Skywork UniPic model, which integrates image understanding, text-to-image generation, and image editing in a single framework [1][2]
- The model is pre-trained end to end on large-scale, high-quality data and demonstrates strong generalization and transferability [1]

Group 1: Model Architecture
- Skywork UniPic uses a unified multimodal architecture that tightly integrates three core tasks: image understanding, text-to-image generation, and image editing [1]
- Traditional multimodal models often rely on VQ or VAE encoders, which capture visual detail better than semantic information and can therefore weaken image understanding [1]
- The Skywork UniPic team changed how images are represented: the MAR encoder provides the visual representation on the image generation path, and SigLIP2 serves as the backbone on the image understanding path [1]

Group 2: Performance and Efficiency
- The model is optimized end to end, so the three core capabilities are trained jointly and reinforce one another, which addresses technical bottlenecks in traditional methods [2]
- Skywork UniPic keeps a compact size of 1.5 billion parameters and achieves state-of-the-art (SOTA) scores without Chain of Thought (CoT), approaching the performance of larger models that do use CoT [2]
- The model scores 85.5 on DPG-Bench, a benchmark for image generation from complex instructions, which the article reports as an industry SOTA result [2]
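
As a rough illustration of the dual-path design described under Model Architecture, the sketch below maps each task to the visual encoder it would draw on. Only the encoder names (MAR, SigLIP2) and the three task names come from the article. The routing table, the function name `encoders_for`, and in particular the choice of encoders for image editing are assumptions made for illustration, not the model's actual code.

```python
from typing import Dict, Tuple

# Encoder names are taken from the article; the table itself is a
# hypothetical illustration, not Skywork UniPic's real implementation.
ENCODERS_BY_TASK: Dict[str, Tuple[str, ...]] = {
    "understanding": ("SigLIP2",),  # image understanding path
    "generation": ("MAR",),         # image generation path
    # Assumption: the article does not say which encoder(s) editing uses.
    "editing": ("SigLIP2", "MAR"),
}


def encoders_for(task: str) -> Tuple[str, ...]:
    """Return the encoder names a task would use in this sketch."""
    try:
        return ENCODERS_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


if __name__ == "__main__":
    for task in ENCODERS_BY_TASK:
        print(task, "->", ", ".join(encoders_for(task)))
```

The point of the sketch is only that understanding and generation need not share one visual representation: each path can use the encoder best suited to it while the rest of the model stays shared.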