Qwen's Open-Source Banana Is Here! Native ControlNet Support

Core Viewpoint
- Qwen has launched a new image editing model, Qwen-Image-Edit-2509, which improves multi-image fusion and single-image consistency and gives users a range of creative options.

Group 1: Image Editing Features
- The new model supports multi-image input, allowing combinations such as "person + person," "person + object," and "person + scene" [1][6][2]
- It can generate wedding photos by merging two images, in both traditional and modern styles [7][12]
- It produces realistic scenes, adjusting characters' expressions and poses to fit the context [16][20]
- It makes personal photos easy to edit, including changing poses and outfits, and can render various styles such as American elite fashion [25][27][29]
- It can restore old photos, including colorization and damage repair [36][40]
- Text editing is more consistent: users can change font type, color, and material, and make targeted corrections to specific text [50][55]

Group 2: ControlNet and Keypoint Features
- The model natively supports ControlNet, so users can modify character poses and outfits using keypoint images [4][20]
- It supports depth map control to keep objects and scenes consistent [60]

Group 3: Qwen3-Omni Model
- Qwen has also released Qwen3-Omni, an end-to-end multimodal model capable of processing text, audio, images, and video [4][67]
- It achieved state-of-the-art performance on 36 audio and video benchmarks, surpassing several closed-source models [69]
- It supports real-time translation and can summarize web content in various languages [71]
- It offers low latency, with response times of 211 ms for audio conversations and 507 ms for video conversations [72]
- It can handle long audio inputs of up to 30 minutes and allows personalized system prompts [73][74]