Multimodal Unified Models
Kunlun Wanwei Launches and Open-Sources Skywork UniPic
Zheng Quan Ri Bao Wang · 2025-07-30 07:14
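While pushing the limits of model capability, Skywork UniPic also holds to a design philosophy that treats efficiency as essential. With a compact 1.5B parameters, Skywork UniPic achieved a state-of-the-art (SOTA) score without chain-of-thought (CoT) reasoning, approaching the 0.88 scored by some larger models that use CoT. On DPG-Bench, a benchmark for generating images from complex instructions, it reached an industry-leading SOTA score of 85.5.

Skywork UniPic reportedly integrates the three core tasks of image understanding, text-to-image (T2I) generation, and image editing within a single model, forming a truly unified multimodal architecture.

Traditional multimodal unified models mostly rely on VQ or VAE encoders to compress visual content. Although these are effective to a degree, they have limitations: they focus more on preserving an image's visual detail than its semantic information, which to some extent weakens the model's image understanding.

To address this, the Skywork UniPic team drew on the Harmon architecture and made a key change to how images are represented: the MAR encoder provides the visual representation for the image generation path, while SigLIP2 is introduced as the backbone of the image understanding path.

In addition, Skywork UniPic uses an end-to-end optimization pipeline in which generation, understanding, and editing are trained jointly and reinforce one another, overcoming the capability trade-offs that limit traditional approaches. This architecture not only keeps the simplicity and efficiency of autoregressive models but also ...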
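A toy Python sketch of the dual-path idea, assuming only what the paragraph states: two visual encoders, chosen by task, feeding one shared autoregressive backbone. Every function and class here (`siglip2_encoder`, `mar_encoder`, `UnifiedModelSketch`) is a hypothetical stand-in that returns placeholder token strings; it is not Skywork UniPic's code. Because the article does not say which encoder the editing task uses, editing is deliberately left unregistered. The `image` argument stands for whichever image a path encodes (on the generation path, for example, a training target).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical stand-ins: each "encoder" maps an image id to placeholder
# visual tokens tagged with the encoder's name. No real model is involved.
def siglip2_encoder(image: str) -> List[str]:
    # Stands in for semantic features on the understanding path.
    return [f"siglip2:{image}:{i}" for i in range(2)]


def mar_encoder(image: str) -> List[str]:
    # Stands in for detail-preserving features on the generation path.
    return [f"mar:{image}:{i}" for i in range(2)]


@dataclass
class UnifiedModelSketch:
    # Task name -> visual encoder; a single shared backbone consumes the output.
    encoders: Dict[str, Callable[[str], List[str]]]

    def backbone(self, text: str, visual_tokens: List[str]) -> List[str]:
        # Placeholder for the shared autoregressive model: it only
        # concatenates the text token and the visual tokens into one sequence.
        return [f"text:{text}"] + visual_tokens

    def run(self, task: str, text: str, image: str) -> List[str]:
        # Route the image through the encoder registered for this task.
        if task not in self.encoders:
            raise ValueError(f"no encoder registered for task {task!r}")
        return self.backbone(text, self.encoders[task](image))


model = UnifiedModelSketch(
    encoders={"understanding": siglip2_encoder, "generation": mar_encoder}
)
print(model.run("understanding", "describe the image", "img0"))
print(model.run("generation", "a cat in Ghibli style", "img1"))
```

The point of the structure is that the backbone never changes; only the visual representation it receives depends on the task.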
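1.5B Parameters Unlock a "Ghibli-Level" All-in-One Experience: A Multimodal Unified Model, the Pride of Chinese Open Source, Has Arrived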
QbitAI · 2025-07-30 04:48
Core Viewpoint
- The article discusses the emergence of the Skywork UniPic model, which integrates multimodal capabilities in a single AI model, and examines its performance and potential impact on the industry [1][2][4].

Group 1: Model Features and Performance
- Skywork UniPic is a 1.5-billion-parameter model that achieves performance comparable to larger models, demonstrating high "performance density"; it can also run smoothly on consumer-grade graphics cards [10][12].
- The model performs well across image understanding, text-to-image generation, and image editing, with notable scores on the GenEval and DPG-Bench benchmarks [25][26][27].
- Skywork UniPic uses an autoregressive architecture that integrates image generation deeply into a multimodal framework, distinguishing it from mainstream diffusion models [30][33].

Group 2: Data and Training Strategies
- Training relies on a curated-data approach: high-quality image-text pairs are used for pre-training, which strengthens the model's semantic representations [37][42].
- A progressive multi-task training strategy focuses on one task at a time to keep training stable and maintain performance across understanding, generation, and editing [53][60].
- The team built specialized reward models to ensure high-quality training data, significantly improving performance on both image generation and image editing [48][50].

Group 3: Industry Implications and Trends
- The rise of native multimodal unified models such as Skywork UniPic signals a shift in the AI landscape toward efficiency and user experience rather than sheer scale [61][63].
- The open-source approach of companies such as Kunlun Wanwei fosters innovation and makes AI technology more accessible, enabling broader participation in AI development [65][68].
- The article highlights the potential for a creative explosion in AI applications, driven by user-friendly tools that lower the barriers to entry for utilizing AI [69].
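The claim that a 1.5B-parameter model runs on consumer-grade graphics cards can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. This sketch assumes the 1.5B figure covers all weights and ignores activations, caches, and framework overhead, so it gives only a lower bound. The article does not say which numeric precision is used, so several common ones are shown.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB (2**30 bytes).

    Lower bound only: activations, caches and runtime overhead are ignored.
    """
    return num_params * bytes_per_param / 2**30


params = 1.5e9  # 1.5B parameters, as stated in the article
for name, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name:>9}: {weight_memory_gib(params, nbytes):.2f} GiB")
```

At 16-bit precision the weights alone come to about 2.8 GiB, well within the memory of common consumer cards, although real inference memory use will be higher.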
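The article says specialized reward models were used to keep training data quality high but does not describe how they were applied. One common pattern, assumed here, is threshold filtering: score every candidate (prompt, image) pair and keep only those at or above a cutoff. `Sample`, `filter_by_reward`, the 0.5 threshold, and the toy scores are all hypothetical; in practice `score_fn` would wrap a learned reward model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    prompt: str
    image_id: str


def filter_by_reward(
    samples: List[Sample],
    score_fn: Callable[[Sample], float],
    threshold: float = 0.5,
) -> List[Sample]:
    """Keep samples whose reward score is at or above the threshold.

    Hypothetical filtering policy; score_fn stands in for a reward model.
    """
    return [s for s in samples if score_fn(s) >= threshold]


# Toy scorer standing in for a learned reward model: made-up scores by image id.
toy_scores = {"a": 0.9, "b": 0.3, "c": 0.6}
data = [
    Sample("a cat in Ghibli style", "a"),
    Sample("a red car", "b"),
    Sample("a forest at dawn", "c"),
]
kept = filter_by_reward(data, lambda s: toy_scores[s.image_id])
print([s.image_id for s in kept])
```

Raising the threshold trades dataset size for quality; where that balance lies for Skywork UniPic is not stated in the article.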
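One way to read the "one task at a time" training strategy is a curriculum in which each stage introduces a single new task while tasks learned earlier stay in the mix. The schedule below sketches that reading only; the step boundaries, the order (understanding, then generation, then editing), and the names `STAGES`, `active_tasks`, and `focus_task` are hypothetical, not the published recipe.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical schedule: (start_step, task introduced at that stage).
# Stage boundaries and task order are illustrative assumptions.
STAGES: List[Tuple[int, str]] = [
    (0, "understanding"),
    (10_000, "generation"),
    (20_000, "editing"),
]


def active_tasks(step: int) -> List[str]:
    """Tasks trained at `step`: every task introduced so far, newest last."""
    if step < 0:
        raise ValueError("step must be non-negative")
    starts = [start for start, _ in STAGES]
    n = bisect_right(starts, step)  # number of stages that have begun
    return [task for _, task in STAGES[:n]]


def focus_task(step: int) -> str:
    """The most recently introduced task, i.e. the current stage's focus."""
    return active_tasks(step)[-1]


for step in (0, 12_000, 25_000):
    print(step, focus_task(step), active_tasks(step))
```

Introducing tasks one at a time keeps each stage's objective simple, which is one plausible route to the training stability the article mentions.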