Open-source MetaQuery is here! OpenUni matches BLIP3-o-8B with 1.1B parameters, with its data and code fully open-sourced
Synced (机器之心) · 2025-06-22 04:26
Core Viewpoint
- OpenUni, developed by Nanyang Technological University's S-Lab and SenseTime, is an open-source counterpart to MetaQuery. With only 1.1B parameters it matches the performance of an 8B model (BLIP3-o-8B), and all of its code, weights, and data are released as open-source resources [1][18].

Architecture and Design
- OpenUni simplifies the architecture: its connector has only 6 layers, versus 24 in MetaQuery, which substantially reduces complexity [5].
- OpenUni combines 256 learnable queries that extract conditioning information from the user's instructions, a frozen InternVL that preserves the model's understanding capabilities, a 6-layer transformer connector based on the ViT architecture, and a SANA diffusion model for efficient image generation [5][6].

Performance Metrics
- OpenUni-B scores 0.84 on GenEval, on par with BLIP3-o-8B, while OpenUni-L scores 0.86, the best result among open-source unified models [15][18].
- On DPG-Bench, OpenUni-L-1024 scores 83.08, surpassing all MetaQuery and BLIP3-o variants [15].

Training Strategy
- Training proceeds in two stages: pre-training on 23 million image-text pairs, then fine-tuning on 60,000 image-text pairs [7][9].
- The diffusion model is frozen during pre-training and made trainable during fine-tuning to improve generation quality [8][9].

Open Source Contribution
- OpenUni releases a complete set of open resources, including model weights, training code, and a 23-million-entry dataset, to support community research and innovation [19][20].
- The project aims to give the research community a clear, reproducible, and extensible baseline implementation [18].
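To make the data flow of the architecture described above more concrete, here is a minimal shape-level sketch of OpenUni's conditioning path. The query count (256) and connector depth (6) come from the article. The hidden widths, the class and function names, and the placement of a final projection after the connector are hypothetical choices made only for illustration, not details taken from the OpenUni code base. The sketch traces tensor shapes only; it implements no attention or diffusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpenUniSketchConfig:
    # From the article: 256 learnable queries and a 6-layer connector.
    num_queries: int = 256
    connector_layers: int = 6
    # HYPOTHETICAL widths for illustration only (not from the source).
    mllm_hidden: int = 1024
    diffusion_cond_dim: int = 2048


def trace_shapes(
    cfg: OpenUniSketchConfig, batch: int, prompt_tokens: int
) -> list[tuple[str, tuple[int, ...]]]:
    """Return (stage, shape) pairs along the sketched conditioning path."""
    steps = []
    # 1. Instruction tokens plus the learnable queries enter the frozen MLLM (InternVL).
    seq_len = prompt_tokens + cfg.num_queries
    steps.append(("frozen MLLM input", (batch, seq_len, cfg.mllm_hidden)))
    # 2. Only the hidden states at the query positions are passed on.
    query_shape = (batch, cfg.num_queries, cfg.mllm_hidden)
    steps.append(("query hidden states", query_shape))
    # 3. ASSUMED: each connector layer keeps the shape unchanged.
    for i in range(cfg.connector_layers):
        steps.append((f"connector layer {i + 1}", query_shape))
    # 4. ASSUMED: a final projection maps to the width the SANA model expects.
    steps.append(("SANA condition", (batch, cfg.num_queries, cfg.diffusion_cond_dim)))
    return steps


if __name__ == "__main__":
    for stage, shape in trace_shapes(OpenUniSketchConfig(), batch=2, prompt_tokens=20):
        print(f"{stage:>20}: {shape}")
```

The point of the sketch is that the diffusion model always receives a fixed-size set of 256 condition vectors, regardless of how long the user's instruction is.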
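The two-stage training strategy above amounts to a freezing schedule. The following sketch encodes it as data. The dataset sizes and the frozen-versus-trainable status of the diffusion model come from the article, and the MLLM is kept frozen because the article describes a frozen InternVL. That the learnable queries and the connector are trained in both stages is an assumption, and the component labels are hypothetical names, not identifiers from OpenUni's code.

```python
from dataclasses import dataclass

# HYPOTHETICAL component labels, not identifiers from the OpenUni code base.
ALL_COMPONENTS = frozenset({"mllm", "queries", "connector", "diffusion"})


@dataclass(frozen=True)
class Stage:
    name: str
    num_pairs: int          # image-text pairs used in this stage
    trainable: frozenset    # components whose parameters are updated


STAGES = (
    # Pre-training on 23M pairs: the diffusion model stays frozen.
    # ASSUMED: queries and connector are trained; the MLLM stays frozen throughout.
    Stage("pretrain", 23_000_000, frozenset({"queries", "connector"})),
    # Fine-tuning on 60K pairs: the diffusion model is unfrozen for better generation quality.
    Stage("finetune", 60_000, frozenset({"queries", "connector", "diffusion"})),
)


def frozen_components(stage: Stage) -> frozenset:
    """Components that receive no gradient updates in the given stage."""
    return ALL_COMPONENTS - stage.trainable


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, f"{stage.num_pairs:,} pairs", "frozen:", sorted(frozen_components(stage)))
```

In a PyTorch implementation, each frozen label would correspond to a module on which one calls `requires_grad_(False)` before building the optimizer for that stage.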