Core Viewpoint
- ByteDance's UXO team has developed and open-sourced USO, a unified framework that addresses the problem of keeping several attributes consistent at once in image generation, so that style transfer and subject retention can be achieved simultaneously across a range of tasks [1][19].

Group 1: Model Capabilities
- USO handles subject, character, or style retention with a single model and only one reference image [7].
- The framework supports diverse applications, such as placing a cartoon character in different scenarios (driving a car, reading in a café) while keeping image quality comparable to commercial models [8][10][12][14].
- USO was evaluated on a newly designed benchmark, USO-Bench, which covers subject-driven, style-driven, and mixed generation tasks; it outperformed several contemporary models [17][19].

Group 2: Performance Metrics
- In the performance comparison, USO achieved a subject-driven generation score of 0.623 and a style-driven generation score of 0.557, ranking first among the models compared [18].
- In user studies, USO received high ratings on every evaluation dimension, particularly subject consistency, style consistency, and image quality [19].

Group 3: Innovative Techniques
- USO adopts a "cross-task self-decoupling" paradigm, which strengthens learning by letting the model pick up the features relevant to each task type [21].
- The architecture builds on the open-source model FLUX.1 dev and adds two training stages: style alignment training and content-style decoupling training [22].
- A Style Reward Learning (SRL) algorithm, designed for Flow Matching, further promotes the decoupling of content and style through a reward function obtained by a mathematical mapping [24][25].

Group 4: Data Framework
- The team built a cross-task data synthesis framework that constructs triplet data covering both layout-changing and layout-preserving cases [30].
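
To make the triplet idea in Group 4 concrete, here is a minimal sketch of how such samples might be represented and split by layout type. The summary does not say what the three elements of a triplet are, so the fields below (content reference, style reference, target) are an assumption, and every name in the code is hypothetical rather than taken from the USO codebase.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Triplet:
    """One hypothetical training sample; field names are illustrative only."""
    content_ref: str        # path to the subject/content reference image (assumed)
    style_ref: str          # path to the style reference image (assumed)
    target: str             # path to the stylized target image (assumed)
    layout_preserved: bool  # True if the target keeps the content reference's layout


def group_by_layout(samples: List[Triplet]) -> Dict[str, List[Triplet]]:
    """Split samples into layout-preserving and layout-changing groups."""
    groups: Dict[str, List[Triplet]] = {
        "layout_preserving": [],
        "layout_changing": [],
    }
    for sample in samples:
        key = "layout_preserving" if sample.layout_preserved else "layout_changing"
        groups[key].append(sample)
    return groups


# Placeholder file names, not real dataset entries.
samples = [
    Triplet("cat.png", "ink.png", "cat_ink.png", layout_preserved=True),
    Triplet("cat.png", "ink.png", "cat_ink_cafe.png", layout_preserved=False),
]
groups = group_by_layout(samples)
```

Keeping the layout flag on each sample, rather than maintaining two separate datasets, would let a single loader mix both kinds of triplet in one batch; whether USO does this is not stated in the summary.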
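ByteDance open-sources an image-generation "hexagon warrior" (all-rounder): one model handles character, subject, and style preservation
QbitAI (量子位) · 2025-09-04 04:41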