Google's Most Powerful Image Model Has Arrived: Will Photo Retouchers Disappear?
Di Yi Cai Jing·2025-08-27 11:20

Core Viewpoint
- Google has launched Gemini 2.5 Flash Image, its latest image generation and editing model, which quickly rose to the top of several image generation rankings in both the editing and text-to-image categories [1][4].

Group 1: Model Performance
- Gemini 2.5 Flash Image has been recognized for its performance in character consistency, prompt adherence, physical realism, and aesthetic quality [4][18].
- In the image editing category it scored 1362, leading the second-place model by 171 points [5].
- In the text-to-image category it ranked first with a score of 1147, surpassing competitors such as OpenAI's GPT-4o and Alibaba's Qwen-Image-Edit [6][13].

Group 2: Cost Efficiency
- Generating a single image with Gemini 2.5 Flash Image costs approximately $0.039 (around 0.28 RMB), significantly less than OpenAI's $0.19 per image [17][39].
- Pricing is set at $30 per 1 million output tokens, and each image consumes about 1290 tokens [17].

Group 3: Limitations
- The model does not support Chinese input, so its performance declines when generating content related to the Chinese language [4][18].
- During testing, the model occasionally produced structural errors, such as extra limbs in generated images [4][18].

Group 4: Commercial Applications
- Gemini 2.5 Flash Image is expected to significantly affect e-commerce, advertising, and design by enabling quick, low-cost image generation [39].
- The model can replace certain manual editing tasks, potentially redefining the roles of photo editors and visual designers [39].

Group 5: Technical Capabilities
- The model maintains character consistency across different poses, lighting, and environments, and can blend multiple images into one while preserving details [13][20].
- It can generate images containing clear, readable text, making it suitable for logos, charts, and posters [18][39].
- The model shows strong physical knowledge, accurately predicting the visual outcome of a given scenario [35].
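
The per-image cost quoted under Cost Efficiency follows directly from the token pricing. The short Python sketch below reproduces that arithmetic using only the figures stated in the article ($30 per 1 million output tokens, about 1290 tokens per image, $0.19 per image for OpenAI). The function and constant names are illustrative assumptions, not part of any Google or OpenAI API.

```python
# Figures taken from the article; all names here are illustrative only.
PRICE_PER_MILLION_OUTPUT_TOKENS_USD = 30.0  # Gemini 2.5 Flash Image output pricing
TOKENS_PER_IMAGE = 1290                     # approximate tokens consumed per image
OPENAI_PRICE_PER_IMAGE_USD = 0.19           # comparison figure quoted in the article


def cost_per_image(price_per_million_usd: float, tokens_per_image: int) -> float:
    """Return the USD cost of one image under per-token output pricing."""
    return price_per_million_usd * tokens_per_image / 1_000_000


gemini_cost = cost_per_image(PRICE_PER_MILLION_OUTPUT_TOKENS_USD, TOKENS_PER_IMAGE)
price_ratio = OPENAI_PRICE_PER_IMAGE_USD / gemini_cost

print(f"Gemini cost per image: ${gemini_cost:.4f}")        # $0.0387, i.e. about $0.039
print(f"OpenAI price is {price_ratio:.1f}x the Gemini cost")  # about 4.9x
```

On these numbers the quoted OpenAI price is roughly five times the Gemini cost per image. The RMB figure is left out of the sketch because the article does not state the exchange rate it used.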