Multi-Instance Image Generation
Rivaling GPT-4o and Nano-Banana: Zhejiang University open-sources ContextGen, a new SOTA for joint layout and identity control
36Kr · 2025-12-22 08:12
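[Overview] The ReLER team at Zhejiang University has open-sourced the ContextGen framework, which tackles the problem of jointly controlling layout and identity in multi-instance image generation. Built on the Diffusion Transformer architecture, it uses a dual attention mechanism to anchor layout precisely and keep identities isolated with high fidelity. On benchmarks it surpasses open-source SOTA models and is comparable to closed-source systems such as GPT-4o, a new advance for customized AI image generation.
In customized AI image generation, multi-instance image generation (MIG) faces a key joint-control challenge: achieving precise layout control and multi-subject identity fidelity at the same time. Existing methods usually achieve only one of the two, and the few that handle both fall markedly short in performance.
To resolve this bottleneck in joint layout and identity control, the ReLER team at Zhejiang University proposes the ContextGen framework, which for the first time achieves architecture-level, hierarchically decoupled control inside the Diffusion Transformer (DiT) architecture through a dual contextual attention mechanism.
On benchmarks, ContextGen's identity preservation surpasses SOTA open-source models and is on par with strong closed-source systems such as GPT-4o and Nano-Banana, a key breakthrough in complex customized control.
Paper: https://arxiv.org/abs/2510.11000
Code: https://github.com/n ...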
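Layout control plus identity consistency: Zhejiang University proposes ContextGen, a new SOTA for layout-anchored multi-instance generation
机器之心 (Jiqizhixin) · 2025-12-20 04:45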
Core Insights
- The article discusses advances in image generation, focusing on the two main challenges of Multi-Instance Image Generation (MIG): layout control and identity preservation [2][5].
Group 1: ContextGen Framework
- ContextGen is a new framework based on the Diffusion Transformer (DiT) that targets layout control and identity preservation in MIG tasks [5][6].
- The framework uses a dual-core mechanism that operates on a unified context token sequence, improving both layout accuracy and identity fidelity [8][10].
Group 2: Mechanisms of ContextGen
- The Contextual Layout Anchoring (CLA) mechanism provides global context guidance: it uses a user-designed or model-generated layout image to give precise global layout control and initial identity information [10].
- The Identity Consistency Attention (ICA) mechanism injects identity information from high-fidelity reference images into the corresponding target locations, keeping identities consistent across multiple instances [12].
Group 3: Data Foundation
- IMIG-100K is introduced as the first large-scale, finely annotated dataset for image-guided multi-instance generation. It covers several difficulty levels and provides detailed layout and identity annotations [14].
Group 4: Performance Optimization
- ContextGen adds a reinforcement learning phase based on Direct Preference Optimization (DPO) to encourage creativity and diversity in the generated images, rather than rigid replication of the layout image's content [17].
Group 5: Experimental Validation
- In quantitative and qualitative evaluations, ContextGen surpasses all open-source models and matches closed-source commercial models in identity consistency [21][25].
- On the LAMICBench++ benchmark, ContextGen improves the average score by 1.3% over existing open-source models, showing its strength in complex multi-instance scenarios [21].
Group 6: User Interaction
- The project includes a user-friendly front-end interface where users can upload reference images, add new materials via text, and design layouts by drag and drop [32].
Group 7: Future Outlook
- The ReLER team plans to further optimize the model architecture and explore more varied forms of user interaction to serve broader application needs, with an emphasis on understanding user intent and multimodal references [36].
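As a rough illustration of the ICA idea summarized above (identity information from each reference image reaches only its own target location), the Python sketch below builds a boolean attention mask over a grid of image tokens. The summary does not give the actual masking rule, so everything here is an assumption made for illustration: the function name `build_identity_mask`, the box format, the row-major token grid, and the rule that a token inside instance i's box may attend only to reference i. It is not taken from the ContextGen paper or code.

```python
from typing import List, Tuple

# Hypothetical box format: (row0, col0, row1, col1), end-exclusive,
# measured in token-grid cells rather than pixels.
Box = Tuple[int, int, int, int]


def build_identity_mask(grid_h: int, grid_w: int,
                        boxes: List[Box],
                        ref_lens: List[int]) -> List[List[bool]]:
    """Return mask[q][k]: may image token q attend to reference token k?

    Assumed rule (illustrative only): an image token that lies inside
    instance i's box may attend to the tokens of reference image i,
    and to no other reference.
    """
    if len(boxes) != len(ref_lens):
        raise ValueError("need exactly one reference length per box")

    # Start offset of each reference image in the concatenated key sequence.
    starts = []
    offset = 0
    for n in ref_lens:
        starts.append(offset)
        offset += n
    total_ref = offset

    mask = [[False] * total_ref for _ in range(grid_h * grid_w)]
    for i, (r0, c0, r1, c1) in enumerate(boxes):
        # Clamp the box to the grid so out-of-range boxes are harmless.
        for r in range(max(r0, 0), min(r1, grid_h)):
            for c in range(max(c0, 0), min(c1, grid_w)):
                q = r * grid_w + c  # row-major index of the image token
                for k in range(starts[i], starts[i] + ref_lens[i]):
                    mask[q][k] = True
    return mask


# Two instances on a 4x4 token grid, with 2 and 3 reference tokens.
mask = build_identity_mask(4, 4, [(0, 0, 2, 2), (2, 2, 4, 4)], [2, 3])
```

In this toy setup, tokens in the top-left box see only the first reference, tokens in the bottom-right box see only the second, and background tokens see neither. A real model would combine such a mask with text and layout tokens in the same sequence, which this sketch leaves out.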
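Generating from a layout image without rote-copying it: layout control for multi-instance generation is finally "controllable without identity mix-ups" | Zhejiang University team
量子位 (QbitAI) · 2025-12-19 07:20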
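Contributed by the ReLER team, Zhejiang University. 量子位 | WeChat official account QbitAI
Diffusion models have matured for single-image generation, but challenges appear once the task becomes highly customized Multi-Instance Image Generation (MIG): how can a model control spatial layout while keeping multiple subjects' identities highly consistent with their reference images?
Existing methods are often caught in a dilemma on complex tasks that need both macro-level layout control and micro-level identity injection. Methods that control layout explicitly usually cannot use reference images to customize instances. Methods guided by reference images struggle to control layout precisely and suffer severe identity loss as the number of instances grows.
To remove this bottleneck in customized image generation, the ReLER team at Zhejiang University has released ContextGen, a new DiT-based framework. It addresses layout control and identity fidelity by hierarchically decoupling the context, and achieves SOTA results on several key metrics.
Mechanism innovation: joint control of layout and identity
At the core of ContextGen is a dual contextual attention mechanism that separates the complex tasks of global control and local injection and deploys them at different levels of the DiT.
Contextual Layout Anchoring (CLA): macro-level layout anchoring
The CLA mechanism ...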