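Nanjing University, LibLib.ai, and the Institute of Automation of the Chinese Academy of Sciences jointly propose PosterCopilot, a "large poster design model" for layout reasoning and precise editing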
机器之心 (Jiqizhixin) · 2025-12-10 08:13

Core Viewpoint
- The article discusses the development of PosterCopilot, a professional-level poster design and editing model that addresses significant challenges in graphic design automation, particularly in layout reasoning and controllable editing [2][6][40].

Industry Pain Points
- Graphic design faces substantial challenges in achieving true automation, with existing models like Stable Diffusion struggling with layered structures, leading to material distortion and lack of fine control [6].
- Current multimodal models exhibit four critical shortcomings: severe element overlap, lack of visual feedback, regression to a single ground truth, and inability to perform layer-specific edits [8][10].

Core Achievements
- PosterCopilot aims to bridge the gap between single-step generation and professional workflows through a systematic solution that incorporates a three-stage training strategy [13][14].
- The innovative three-stage training includes:
  1. Perturbation Supervised Fine-Tuning (PSFT) to address geometric distortions [15].
  2. Visual-Reality Alignment Reinforcement Learning (RL-VRA) to correct overlaps and proportional issues [15].
  3. Aesthetic Feedback Reinforcement Learning (RLAF) to encourage exploration beyond ground truth layouts [15].

Generative Agent
- PosterCopilot functions as a comprehensive design assistant, facilitating seamless transitions from abstract design concepts to concrete materials through a reception model and T2I model [16][17].
- The model supports various professional scenarios, including full poster generation from provided assets, intelligent completion of missing materials, global theme transitions, intelligent size reconstruction, and multi-round fine-grained editing [21][23][28][29][31].

Experimental Results
- PosterCopilot outperforms existing commercial competitors and state-of-the-art models across multiple metrics, achieving an average win rate exceeding 74% in human evaluations [34][35].
- In assessments of layout rationality, text legibility, and element preservation, PosterCopilot demonstrates superior performance compared to models like Microsoft Designer and CreatiPoster [35][37].

Conclusion and Outlook
- By decoupling layout reasoning from generative editing and incorporating reinforcement learning to align with human aesthetics, PosterCopilot sets a new benchmark for intelligent design tools and offers a new paradigm for AI-assisted creative workflows [40].
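
Illustrative sketch
- The "severe element overlap" shortcoming, and the overlap correction attributed to RL-VRA, can be made concrete with a simple geometric score. The article does not describe PosterCopilot's actual reward, so the sketch below is only an assumption-laden illustration: the `Box` representation, the function names, and the choice to normalise by the smaller box's area are all hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned element box in pixels (hypothetical representation)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def intersection_area(a: Box, b: Box) -> float:
    """Area shared by two boxes; 0.0 when they do not overlap."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def overlap_penalty(boxes: list[Box]) -> float:
    """Mean, over all box pairs, of intersection area / smaller box area.

    The result lies in [0, 1]: 0 means no two elements touch, 1 means every
    pair has one element fully covered by the other.
    """
    pairs = list(combinations(boxes, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        smaller = min(a.w * a.h, b.w * b.h)
        if smaller > 0:  # skip degenerate zero-area boxes
            total += intersection_area(a, b) / smaller
    return total / len(pairs)


if __name__ == "__main__":
    layout = [Box(0, 0, 10, 10), Box(5, 0, 10, 10), Box(40, 40, 10, 10)]
    print(overlap_penalty(layout))
```

- A reinforcement-learning setup could, for instance, use `1 - overlap_penalty(boxes)` as one reward term. Real posters contain intentional overlap (text on a background layer), so a practical score would presumably exclude or down-weight such pairs.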