Image Editing
Late-Night Warrior Qwen Strikes Again! New Model Lets You "Fix Exactly What's Wrong" in Image Editing
量子位· 2025-08-19 07:21
Core Viewpoint
- Qwen-Image-Edit is a powerful image editing tool that supports precise edits, including adding, removing, and modifying elements, while preserving visual semantics and enabling a range of creative functions [2][67].

Group 1: Features and Capabilities
- Qwen-Image-Edit offers functions such as original-IP editing, perspective switching, and virtual character generation, showing its versatility in image manipulation [2][20][67].
- The tool supports semantic editing: images can be modified while their original visual semantics are preserved, which is crucial for maintaining character integrity in IP creation [7][10].
- Users can perform perspective transformations, including 90-degree and 180-degree rotations, demonstrating the tool's handling of complex visual adjustments [14][19].

Group 2: Performance and Testing
- Initial tests show impressive results, with accurate elements and details, such as the correct number of fingers in character designs [13][19].
- The tool reliably adds elements such as signs while managing reflections and retaining detail, although high-resolution inputs can suffer some loss of quality [29][34].
- Its ability to remove and recolor elements within images has been validated on practical examples, demonstrating precise editing [39][42][45].

Group 3: Advanced Editing Techniques
- A chained-editing feature lets users make incremental corrections without regenerating the entire picture, improving editing efficiency (a minimal loop sketch follows this summary) [56][62].
- Dual editing capabilities cover both low-level visual appearance edits and high-level semantic edits, serving a wide range of image editing needs [67].

Group 4: Market Position and Performance Metrics
- Qwen-Image-Edit achieves state-of-the-art (SOTA) results on multiple public benchmarks, establishing itself as a robust foundation model for image editing [67].
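The chain-editing workflow in Group 3 amounts to feeding each edited output back in as the next input. Below is a minimal sketch of such a loop, assuming the open weights are consumed through a diffusers-style pipeline; the class name QwenImageEditPipeline, the checkpoint id, and the call signature are assumptions for illustration, not details confirmed by the article.

```python
# Minimal chain-editing loop (sketch). The pipeline class, checkpoint id,
# and call signature are assumptions; only the iterative workflow itself
# is described in the article.
import torch
from diffusers import QwenImageEditPipeline  # assumed class name
from PIL import Image

pipe = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit",  # assumed checkpoint id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = Image.open("input.png").convert("RGB")

# Chain editing: fix one thing per step instead of regenerating the whole
# picture, so untouched regions keep their original visual semantics.
instructions = [
    "Rotate the view of the object by 90 degrees",
    "Add a wooden sign reading 'OPEN' by the door",
    "Recolor the sign's text to red",
]
for step, prompt in enumerate(instructions, start=1):
    image = pipe(image=image, prompt=prompt).images[0]
    image.save(f"edit_step_{step}.png")  # keep intermediates for review
```

Saving each intermediate means a bad step can be redone in isolation, which is the efficiency gain the chained workflow is after.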
Qwen's New Open-Source Release Blows Past the SOTA for Text in AI Image Generation
量子位· 2025-08-05 01:40
Core Viewpoint
- The article covers the release of Qwen-Image, a 20-billion-parameter image generation model that excels at complex text rendering and image editing [3][28].

Group 1: Model Features
- Qwen-Image is the first foundational image generation model in the Tongyi Qianwen series and is built on the MMDiT architecture [3][4].
- It delivers exceptional complex text rendering, supporting multi-line layouts and fine-grained detail in both English and Chinese [28][32].
- It also offers consistent image editing, covering style transfer, modifications, detail enhancement, text editing, and pose adjustments [27][28].

Group 2: Performance Evaluation
- Qwen-Image achieves state-of-the-art (SOTA) results across public benchmarks, including GenEval, DPG, and OneIG-Bench for image generation and GEdit, ImgEdit, and GSO for image editing [29][30].
- It shows a particularly large advantage over existing advanced models in Chinese text rendering [33].

Group 3: Training Strategy
- The model uses a progressive training strategy that moves from non-text to text rendering and from simple to complex text inputs, strengthening its native text rendering (a sketch of such a curriculum follows this summary) [34].

Group 4: Practical Applications
- The article demonstrates Qwen-Image generating illustrations, PPTs, and promotional images, accurately integrating text with visuals [11][21][24].
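The progressive strategy in Group 3 is essentially a data curriculum whose mix shifts from text-free images toward images with increasingly complex text. The sketch below shows one way such a schedule could look; the stage boundaries, mixing weights, and pool names are illustrative assumptions, since the article gives no concrete schedule.

```python
# Illustrative curriculum for progressive text-rendering training.
# Stage boundaries and mixing weights are assumptions; the article only
# states that training moves from non-text to simple to complex text.
import random

def mix_weights(progress: float) -> dict[str, float]:
    """Dataset mixing weights at a given training progress in [0, 1]."""
    if progress < 0.3:   # stage 1: learn general image generation, no text
        return {"no_text": 1.0, "simple_text": 0.0, "complex_text": 0.0}
    if progress < 0.7:   # stage 2: introduce short, simple text
        return {"no_text": 0.5, "simple_text": 0.4, "complex_text": 0.1}
    return {"no_text": 0.3, "simple_text": 0.3, "complex_text": 0.4}  # stage 3

def draw_pool(step: int, total_steps: int) -> str:
    """Pick which data pool the next batch is sampled from."""
    weights = mix_weights(step / total_steps)
    pools, probs = zip(*weights.items())
    return random.choices(pools, weights=probs, k=1)[0]

print(draw_pool(1_000, 100_000))   # early training: always "no_text"
print(draw_pool(90_000, 100_000))  # late training: often "complex_text"
```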
The DeepSeek of Images! 12B Parameters Benchmarked Against GPT-4o, Images in 5 Seconds, Editing and Generation on Consumer Hardware
量子位· 2025-06-30 00:38
Core Viewpoint
- Black Forest Labs has open-sourced its flagship image model FLUX.1 Kontext[dev], designed for image editing and able to run on consumer-grade chips [1][23].

Group 1: Model Features
- FLUX.1 Kontext[dev] has 12 billion parameters, offering faster inference and performance comparable to closed-source models such as GPT-image-1 [2][36].
- The model edits existing images directly from instructions, enabling precise local and global edits without any fine-tuning [6][36].
- Users can refine an image through multiple consecutive edits with minimal visual drift [6][36].
- The model is optimized for the NVIDIA Blackwell architecture, enhancing performance [6][39].

Group 2: Performance and Efficiency
- FLUX.1 Kontext[dev] has been validated on KontextBench, a benchmark of 1,026 image-prompt pairs across varied editing tasks, outperforming existing models [37].
- Inference is 4 to 5 times faster than previous versions, typically finishing within 5 seconds on NVIDIA H100 GPUs at roughly $0.0067 per run (a quick cost check follows this summary) [41].
- On MacBook Pro chips, users report iterations of about 1 minute each [41].

Group 3: User Engagement and Accessibility
- The official FLUX.1 Kontext[dev] API is open for public testing, letting users upload images and experiment with the model [19].
- Open weights and variants let users trade off speed, efficiency, and quality against their hardware [41].
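The quoted figures of roughly 5 seconds and $0.0067 per run make batch planning simple arithmetic. A quick check using only the numbers from the article (the 10,000-edit workload is a made-up example):

```python
# Throughput/cost estimate from the article's figures: ~5 s per edit on
# an NVIDIA H100 and ~$0.0067 per run. The workload size is hypothetical.
SECONDS_PER_EDIT = 5.0
COST_PER_RUN_USD = 0.0067

def batch_estimate(num_edits: int) -> tuple[float, float]:
    """Return (gpu_hours, total_cost_usd) for a batch on one H100."""
    gpu_hours = num_edits * SECONDS_PER_EDIT / 3600
    return gpu_hours, num_edits * COST_PER_RUN_USD

hours, cost = batch_estimate(10_000)
print(f"10,000 edits: ~{hours:.1f} GPU-hours, ~${cost:.2f}")
# 10,000 edits: ~13.9 GPU-hours, ~$67.00
```

At the reported ~1 minute per iteration on MacBook Pro chips, the same workload would take roughly 12 times longer, which is why the consumer-hardware story is about interactive use rather than batch jobs.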
ByteDance Open-Sources an Image Editing Breakthrough! 1/30 the Parameters, 1/13 the Data, 9.19% Better Performance
量子位· 2025-05-07 09:33
Core Viewpoint
- ByteDance has developed a new image editing method, SuperEdit, that beats the current state-of-the-art (SOTA) by 9.19% while using only 1/30 of the training data and 1/13 of the model parameters [1].

Group 1: Methodology and Innovation
- The method needs no additional pre-training tasks or architectural modifications; it instead relies on powerful multimodal models such as GPT-4o to correct editing instructions [2].
- It addresses the noisy supervisory signals in existing image editing models by constructing more effective editing instructions [3][9].
- The data and model are open-sourced on GitHub [4].

Group 2: Challenges in AI Image Editing
- AI models often misinterpret instructions; for example, changing the color of a boy's tie can unintentionally alter skin tone or clothing [6].
- Existing image editing datasets contain substantial noise because they are constructed automatically, producing mismatches between instructions and image pairs [10][11][12].

Group 3: Training and Supervision
- SuperEdit improves the quality of supervisory signals rather than scaling parameter counts or pre-training compute [13].
- GPT-4o generates more accurate editing instructions by observing the differences between original and edited images [17].
- A contrastive supervision mechanism ensures the model learns the subtle differences between correct and incorrect editing instructions, improving its ability to understand and execute commands [22][23].

Group 4: Performance Metrics
- On the Real-Edit benchmark, SuperEdit reaches 69.7% overall accuracy and a 3.91 score, surpassing the previous SOTA method SmartEdit (58.3% accuracy, 3.59 score) [25][28].
- The model is trained with a triplet loss that separates correct from incorrect editing instructions (a minimal sketch follows this summary) [27].

Group 5: Future Directions
- The team plans to extend this data-first approach to more visual generation tasks and to explore combining it with larger models [31].
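The triplet training in Group 4 can be illustrated in a few lines of PyTorch: an embedding of the (original, edited) image pair serves as the anchor, the corrected instruction as the positive, and a mismatched or uncorrected instruction as the negative. This is a generic sketch of contrastive instruction supervision, not SuperEdit's actual code; the encoders, feature sizes, and margin are arbitrary placeholders.

```python
# Generic triplet-loss sketch for instruction supervision. Encoders,
# feature sizes, and the margin are illustrative assumptions; only the
# use of a triplet loss over correct vs. incorrect instructions comes
# from the article.
import torch
import torch.nn as nn

embed_dim, batch = 512, 8
triplet = nn.TripletMarginLoss(margin=0.2)

# Stand-in encoders; in practice these would be vision/text backbones.
pair_encoder = nn.Linear(1024, embed_dim)  # (original, edited) pair features
text_encoder = nn.Linear(768, embed_dim)   # instruction text features

pair_feats = torch.randn(batch, 1024)  # fake image-pair features
good_instr = torch.randn(batch, 768)   # corrected (matching) instructions
bad_instr = torch.randn(batch, 768)    # noisy / mismatched instructions

anchor = pair_encoder(pair_feats)
positive = text_encoder(good_instr)
negative = text_encoder(bad_instr)

# Pull matching instructions toward the image pair, push mismatches away.
loss = triplet(anchor, positive, negative)
loss.backward()  # gradients flow into both encoders
print(f"triplet loss: {loss.item():.4f}")
```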
Meitu Upgrades Its Competitiveness in AI Vision: Seven Image Editing Research Results Unveiled
Zheng Quan Ri Bao· 2025-04-09 08:40
Core Insights
- Meitu's MT Lab had five research results accepted at the prestigious CVPR 2025 conference, which received over 13,000 submissions and has a low 22.1% acceptance rate [2].
- The lab also had two projects accepted at AAAI 2025, which accepted 23.4% of 12,957 submissions [2].
- All seven results target image editing: three generative AI technologies, three segmentation technologies, and one 3D reconstruction technology [2].

Generative AI Technologies
- GlyphMastero ships in Meitu's app Meitu Xiuxiu, giving users a seamless text modification experience [3].
- MTADiffusion is integrated into Meitu's AI material generator WHEE, enabling efficient image editing with simple commands [3].
- StyO powers Meitu Xiuxiu's AI creative and beauty camera features, letting users easily explore different visual styles [4].

Segmentation and 3D Reconstruction Technologies
- The segmentation breakthroughs cover interactive segmentation and cutout, applied in e-commerce design, image editing, and portrait beautification [4].
- EVPGS advances 3D reconstruction, an area of growing demand in novel view synthesis, augmented reality (AR), 3D content generation, and virtual digital humans [4].

Industry Position and Future Potential
- Meitu's long-term investment in AI has let it fold cutting-edge technologies into practical applications, sharpening its competitive edge in its core visual business [4].
- Continuous product iteration has raised user engagement and willingness to pay, pointing to promising growth and expansion potential [4].