NeurIPS 2025 | Xiaohongshu's Zhichuang AIGC Team Proposes InstanceAssemble, a New Algorithm for Layout-Controlled Generation
机器之心 (Synced) · 2025-11-03 08:45
Core Insights
- The article discusses advances in text-to-image diffusion models, focusing on the challenges of layout-controlled image generation and the innovations proposed to address them [2][3][4].

Challenges in Existing Methods
- Current layout-to-image methods struggle to achieve both precise layout alignment and high image quality in complex scenes, and the need to support multi-modal conditions adds further technical complexity [2][3].
- Existing methods are either training-free, which leads to significant performance drops on complex layouts, or depend on additional modules that introduce many parameters and high training costs [2][3].

InstanceAssemble Framework
- Xiaohongshu's AIGC team proposed the InstanceAssemble framework to make layout-controlled image generation both robust and efficient [4].
- It uses a cascaded structure that processes the global text prompt and the instance-level layout conditions in separate stages, preserving both global image quality and local layout alignment [9].
- An independent attention mechanism handles overlapping or small objects in complex layouts while keeping the overall image coherent [10].

Model Adaptation and Multi-modal Support
- InstanceAssemble adapts the base model with lightweight LoRA modules that add only about 3% to its parameter count, enabling flexible layout control without extensive retraining [10][18].
- The method accepts multi-modal layout inputs: each instance can be specified by a text description or by additional image information [11].

Evaluation and Performance
- A new benchmark, DenseLayout, was built to evaluate performance in high-density layouts; it contains 5,000 images and approximately 90,000 instances, or about 18 instances per image on average [14].
- A new metric, the Layout Grounding Score (LGS), combines spatial accuracy and semantic consistency to measure how well generated images follow the layout instructions [14].
- On the DenseLayout benchmark, InstanceAssemble outperformed existing methods, achieving strong layout-alignment scores while maintaining good global image quality, particularly on dense layouts [16][21].

Application Potential
- InstanceAssemble is designed for compatibility and extensibility as well as performance; for example, it can be combined with LoRA modules to add various style-transfer capabilities [20].
- The framework shows potential for intelligent layout design, virtual content creation, and data augmentation, and contributes to the advancement of layout-to-image generation [21].
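The summary credits an independent attention mechanism with handling overlapping or small objects but does not describe its design. One common way to isolate instances inside a transformer is an attention mask, so the sketch below illustrates that general idea under stated assumptions; it is not InstanceAssemble's actual mechanism. The patch grid, the normalized (x0, y0, x1, y1) box format, the token ordering, and the center-in-box rule are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in normalized coordinates, origin top-left.
Box = Tuple[float, float, float, float]


def patches_in_box(box: Box, grid_h: int, grid_w: int) -> List[int]:
    """Return row-major indices of latent patches whose centers lie inside box."""
    x0, y0, x1, y1 = box
    inside = []
    for r in range(grid_h):
        for c in range(grid_w):
            cx, cy = (c + 0.5) / grid_w, (r + 0.5) / grid_h
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                inside.append(r * grid_w + c)
    return inside


def instance_attention_mask(boxes: List[Box], grid_h: int, grid_w: int,
                            tokens_per_instance: int) -> List[List[bool]]:
    """Boolean mask over [image patches | instance-0 tokens | instance-1 tokens | ...].

    mask[q][k] is True when query token q may attend to key token k.
    """
    n_img = grid_h * grid_w
    n = n_img + len(boxes) * tokens_per_instance
    mask = [[False] * n for _ in range(n)]

    # Image patches attend to one another, keeping the whole picture coherent.
    for i in range(n_img):
        for j in range(n_img):
            mask[i][j] = True

    for k, box in enumerate(boxes):
        start = n_img + k * tokens_per_instance
        own = range(start, start + tokens_per_instance)
        region = patches_in_box(box, grid_h, grid_w)
        for t in own:
            # Tokens of one instance see each other but never another instance,
            # so overlapping boxes do not mix their descriptions.
            for u in own:
                mask[t][u] = True
            # Instance tokens and the patches inside their box see each other.
            for p in region:
                mask[t][p] = True
                mask[p][t] = True
    return mask


if __name__ == "__main__":
    boxes = [(0.0, 0.0, 0.5, 0.5), (0.25, 0.25, 1.0, 1.0)]  # two overlapping boxes
    m = instance_attention_mask(boxes, grid_h=4, grid_w=4, tokens_per_instance=2)
    print(patches_in_box(boxes[0], 4, 4))  # prints [0, 1, 4, 5]
    print(m[16][18], m[5][16], m[5][18])   # prints False True True
```

Patch 5 lies in both boxes, so it can attend to both instances, while the two instances' own tokens stay isolated from each other. Under this center-in-box rule a box smaller than one patch can select no patches at all, which hints at why small objects are hard for layout control; a real system would need a finer grid or some fallback.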
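The summary states that InstanceAssemble's LoRA modules add only about 3% to the base model's parameters. The sketch below shows the standard LoRA formulation, a frozen weight W plus a trainable low-rank update scaled by alpha / rank, and why its overhead is small. The layer width of 3072 and rank of 48 are hypothetical values chosen only so the arithmetic lands near 3%; the summary does not say which layers InstanceAssemble adapts or with what rank.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix, stored as a list of rows, by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_param_overhead(d_in: int, d_out: int, rank: int) -> float:
    """Trainable LoRA parameters as a fraction of one frozen d_out x d_in weight.

    A has shape (rank, d_in) and B has shape (d_out, rank), so the adapter adds
    rank * (d_in + d_out) parameters next to the d_in * d_out frozen ones.
    """
    return rank * (d_in + d_out) / (d_in * d_out)


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: List[float],
                 alpha: float) -> List[float]:
    """Compute y = W x + (alpha / rank) * B (A x); W stays frozen, A and B train."""
    rank = len(a)
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [yb + scale * yd for yb, yd in zip(base, delta)]


if __name__ == "__main__":
    # Hypothetical layer width and rank, not InstanceAssemble's configuration.
    print(f"{lora_param_overhead(3072, 3072, 48):.1%}")  # prints 3.1%

    # With B initialized to zeros (the usual LoRA choice), the adapted layer
    # initially reproduces the frozen base layer exactly.
    w = [[1.0, 2.0], [3.0, 4.0]]
    a = [[0.5, -0.5]]  # rank 1
    b = [[0.0], [0.0]]
    print(lora_forward(w, a, b, [1.0, 1.0], alpha=1.0))  # prints [3.0, 7.0]
```

For a square layer the overhead simplifies to 2 * rank / d, so it shrinks as layers get wider; that is what lets adapters stay small relative to a large base model and be swapped or stacked, as with the style LoRAs mentioned under Application Potential.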
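The summary says the Layout Grounding Score combines spatial accuracy and semantic consistency but does not give its formula. To make that combination concrete, here is a hypothetical per-instance score: the IoU between the requested box and a detected box, multiplied by a semantic-agreement value in [0, 1] (for example from a classifier or a vision-language model), averaged over all requested instances. This is not the paper's LGS definition; the detector, the semantic scorer, and the product-and-mean aggregation are all assumptions.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_score(targets: List[Box], detections: List[Optional[Box]],
                    semantic: List[float]) -> float:
    """Hypothetical layout-grounding score: mean over requested instances of
    IoU(target, detection) * semantic agreement; a missed instance scores 0."""
    if not targets:
        return 0.0
    total = 0.0
    for target, det, sem in zip(targets, detections, semantic):
        if det is not None:
            total += iou(target, det) * sem
    return total / len(targets)


if __name__ == "__main__":
    targets = [(0.0, 0.0, 2.0, 2.0), (2.0, 2.0, 4.0, 4.0)]
    detections = [(0.0, 0.0, 2.0, 1.0), None]  # second instance was not generated
    print(grounding_score(targets, detections, [1.0, 1.0]))  # prints 0.25
```

Multiplying the two factors means an object must be both in the right place and the right thing to score well, and dividing by the number of requested instances penalizes objects that are missing entirely, which matters in dense layouts like DenseLayout's.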