Image Generation Models
Alibaba's Z-Image Tops the Open-Source Image Generation Leaderboard: Images in 1 Second, $5 per 1,000 Images
Xin Lang Cai Jing· 2025-12-23 03:33
Kuai Keji, December 23: Today, Alibaba's Z-Image topped the open-source image generation model leaderboard. Artificial Analysis, a widely cited AI benchmarking platform, published its latest image model rankings, in which Alibaba's 6B-parameter Z-Image Turbo surpassed the 32B FLUX.2 to become the strongest open-source image generation model. The model is now live on Alibaba Cloud's Bailian platform, where generating 1,000 images costs only $5. Z-Image Turbo's ELO score of 1152 also set a new record for the leaderboard. On the inference side, decoupled distillation combined with reinforcement-learning training cut the sampling process from more than 20 steps to 8, substantially speeding up image generation. In addition, a prompt enhancer lets Z-Image Turbo first interpret a complex request before generating an image; for example, given the verse "一道残阳铺水中,半江瑟瑟半江红" ("a streak of setting sun spreads across the water; half the river shimmers green, half glows red"), the model accurately renders the poem's imagery. Z-Image Turbo was open-sourced globally at the end of November, topped the Hugging Face trending chart on its first day, and held the top spot for three consecutive weeks. In under a month it has been downloaded more than 4 million times, making it the most popular image generation model of the period. Industry observers regard it as among the best-performing, fastest, and cheapest image generation models available. [Truncated leaderboard table: Creator | Model | ELO | 95% CI ...]
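The headline speed-up above comes from few-step distillation: each sampling step of a flow/diffusion model costs one network call, so cutting 20+ steps to 8 cuts latency roughly proportionally. Below is a minimal, generic sketch of an Euler sampler over a fixed step budget; it is not Z-Image's actual code, and `euler_sample` and the toy velocity field are hypothetical stand-ins.

```python
import numpy as np

def euler_sample(velocity_fn, shape, num_steps=8, seed=0):
    """Integrate a flow ODE from noise (t=1) to data (t=0) in `num_steps`
    Euler steps. Each step is one model evaluation, so the step budget
    directly determines generation latency."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)            # start from Gaussian noise at t=1
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_cur)             # one network call per step
        x = x + (t_next - t_cur) * v          # Euler step toward t=0
    return x

# Toy stand-in for a trained network: a field that shrinks x as t decreases.
toy_velocity = lambda x, t: x
sample = euler_sample(toy_velocity, shape=(4, 4), num_steps=8)
```

A distilled model is trained so that a good image emerges within this small step budget; an undistilled model run for 8 steps would typically produce visible artifacts.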
Tongyi Qianwen Releases New Image Generation Model Qwen-Image-Layered
Bei Jing Shang Bao· 2025-12-22 11:26
Beijing Business Daily (reporters Tao Feng and Wang Tianyi) — On December 22, Tongyi Qianwen officially announced Qwen-Image-Layered, a new image generation model. Built on a novel in-house architecture, the model can decompose an image into multiple layers. This layered representation gives images intrinsic editability: each layer can be manipulated independently without affecting the rest of the content. The layered structure also natively supports high-fidelity basic edits such as scaling, moving, and recoloring. By physically isolating different elements into separate layers, the approach achieves high-fidelity editing. ...
Alibaba Releases New Image Generation Model Qwen-Image-Layered
Di Yi Cai Jing· 2025-12-22 10:07
(From Yicai) Alibaba's Tongyi Qianwen has released Qwen-Image-Layered, a new image generation model. According to the announcement, the model uses a novel in-house architecture that can decompose an image into multiple layers. This layered representation gives images intrinsic editability: each layer can be operated on independently without affecting other content. The layered structure also natively supports high-fidelity basic edits such as scaling, moving, and recoloring. By physically isolating different elements into separate layers, it achieves high-fidelity editing. ...
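The layer-wise editing described in these two reports can be illustrated with ordinary alpha compositing: when an image is stored as a stack of RGBA layers, editing one layer leaves every other layer's pixels untouched. This is a minimal sketch of the idea, not Qwen-Image-Layered's implementation; all names here are hypothetical.

```python
import numpy as np

def composite(layers):
    """Alpha-composite RGBA layers (back to front) into one RGB image.
    Each layer is an (H, W, 4) float array with values in [0, 1]."""
    h, w, _ = layers[0].shape
    out = np.zeros((h, w, 3))
    for layer in layers:                      # back-to-front "over" operator
        rgb, a = layer[..., :3], layer[..., 3:4]
        out = rgb * a + out * (1.0 - a)
    return out

def recolor(layer, rgb):
    """Edit one layer in isolation: replace its color, keep its alpha mask."""
    edited = layer.copy()
    edited[..., :3] = np.asarray(rgb)
    return edited

# Background layer (opaque gray) plus a foreground square (red).
bg = np.zeros((8, 8, 4)); bg[..., :3] = 0.5; bg[..., 3] = 1.0
fg = np.zeros((8, 8, 4)); fg[2:6, 2:6] = [1.0, 0.0, 0.0, 1.0]

before = composite([bg, fg])
after = composite([bg, recolor(fg, [0.0, 0.0, 1.0])])  # foreground now blue
```

Pixels outside the foreground square are identical in `before` and `after`, which is exactly the "edit one layer without touching the rest" property the announcement emphasizes; the model's contribution is generating such layered decompositions rather than a single flat image.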
Another Chinese Image Model Goes Open Source: Hands-On Tests Show Excellent Iterative Editing, but Chinese Text Rendering Is a Weak Spot
36Kr· 2025-12-08 10:47
Zhidongxi reported on December 8 that Meituan today officially released and open-sourced LongCat-Image, a 6B-parameter image generation model that reaches open-source SOTA in image editing, targeting two core scenarios: text-to-image and single-image editing. ▲ Image source: Hugging Face. According to the official benchmark results, LongCat-Image is benchmarked mainly against mainstream open- and closed-source models such as Seedream 4.0, Qwen-Image, HunyuanImage-3.0, Nano Banana, and FLUX.1-dev, with its core optimizations focused on editing controllability and Chinese text rendering. In hands-on testing it performed well on iterative edits, style changes, and material detail, but Chinese text rendering remained unstable in complex layout scenarios. On tasks involving complex UI design or game-interface generation, the model's aesthetics also showed some weaknesses, possibly related to its lack of web-search capability. Meituan also provides multiple ways to try the model: on mobile, the LongCat app supports text-to-image and image-to-image; on the web, users can access image generation at https://longcat.ai/. For developers, LongCat-Image's model weights and code are also open-sourced: Hugging Face: ht ...
A Rising Star: Kaiming He's Team Publishes New Work, with a Tsinghua Yao Class Sophomore as Co-First Author
36Kr· 2025-12-04 02:21
Core Insights
- The article introduces Improved MeanFlow (iMF), an enhanced version of the original MeanFlow (MF) that addresses key issues in training stability, guidance flexibility, and architectural efficiency [1][4].

Model Performance
- iMF significantly improves model performance by reformulating the training objective as a more stable instantaneous-velocity loss and introducing flexible classifier-free guidance (CFG) [2][12].
- On the ImageNet 256x256 benchmark, the iMF-XL/2 model achieves an FID of 1.72 at 1-NFE (a single function evaluation), a 50% improvement over the original MF [2][18].

Model Configuration and Efficiency
- The detailed configurations of the MF and iMF model families show that iMF models use fewer parameters while achieving better metrics [3][19].
- For instance, iMF-B/2 has 89 million parameters and an FID of 3.39, versus 131 million parameters and an FID of 6.17 for MF-B/2 [3][19].

Training Methodology
- iMF's core improvement is a reconstructed prediction function that turns training into a standard regression problem, improving optimization stability [4][11].
- The training loss is now based on instantaneous velocity, yielding a more stable, standard regression training process [10][11].

Guidance Flexibility
- iMF introduces a flexible classifier-free guidance mechanism in which the guidance scale is learned as a condition, improving the model's adaptability at inference time [12][14].
- This flexibility lets the model learn average velocity fields under varying guidance strengths, unlocking CFG's full potential [12].

Contextual Conditioning
- The iMF architecture uses an efficient in-context conditioning mechanism, replacing the large adaLN-zero module with multiple learnable tokens for the various conditions, improving efficiency and reducing parameter count [15][17].
- This adjustment lets iMF handle multiple heterogeneous conditions more effectively, yielding a significant reduction in model size and greater design flexibility [17].

Experimental Results
- iMF performs strongly on challenging benchmarks: iMF-XL/2 achieves an FID of 1.72 at 1-NFE, surpassing many pre-trained multi-step models [18][20].
- At 2-NFE, iMF further narrows the gap between single-step and multi-step diffusion models, reaching an FID of 1.54 [20].
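For context on the guidance mechanism: iMF's novelty is feeding the guidance scale to the network as a learned condition, but the combination it amortizes is the standard classifier-free guidance extrapolation, which can be sketched as follows. The arrays here are hypothetical stand-ins for network outputs, not iMF's actual code.

```python
import numpy as np

def cfg_velocity(v_cond, v_uncond, w):
    """Standard classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one by guidance scale w.
    w = 0 recovers the unconditional prediction; w = 1 the conditional one."""
    return v_uncond + w * (v_cond - v_uncond)

# Hypothetical conditional / unconditional velocity predictions.
v_c = np.array([1.0, 2.0])
v_u = np.array([0.5, 1.0])
guided = cfg_velocity(v_c, v_u, 2.0)   # w > 1 pushes past the conditional prediction
```

Classic CFG requires two network calls per step (conditional and unconditional) plus a scale fixed at inference time; learning `w` as an input condition lets a distilled single-pass model bake guidance of any strength into one evaluation, which is what makes flexible guidance compatible with 1-NFE sampling.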
6B Text-to-Image Model Tops Hugging Face on Launch Day
量子位· 2025-12-01 04:26
Core Viewpoint
- The article discusses the launch and performance of Alibaba's new image generation model, Z-Image, which has quickly gained popularity and recognition in the AI community for its capabilities and efficiency [1][3].

Group 1: Model Overview
- Z-Image is a 6-billion-parameter image generation model that reached 500,000 downloads on its first day and topped two Hugging Face charts within two days of launch [1][3].
- The model comes in three versions: Z-Image-Turbo (open-source), Z-Image-Edit (not open-source), and Z-Image-Base (not open-source) [8].

Group 2: Performance and Features
- Z-Image demonstrates state-of-the-art (SOTA) performance in image quality, text rendering, and semantic understanding, comparable to contemporaneous models such as FLUX.2 [3][8].
- The model excels at generating realistic images and handling complex text rendering, including mixed-language content and mathematical formulas [6][15].
- Users report high-quality outputs, including detailed portraits and creative visual interpretations, showcasing the model's versatility [11][14][32].

Group 3: Technical Innovations
- Z-Image's speed and efficiency stem from architecture optimization and model distillation, which reduce computational load without sacrificing quality [34][39].
- The model employs a single-stream architecture (S3-DiT) that integrates text and image processing, streamlining the workflow and enhancing performance [35].
- The distillation process allows Z-Image to generate high-quality images with only eight function evaluations, significantly improving generation speed [40][42].

Group 4: Market Position and Future Prospects
- The timing of Z-Image's release is strategic, coinciding with the launch of FLUX.2 and signaling a competitive AI image-generation market [44].
- The model's open-source availability on platforms like Hugging Face and ModelScope positions it favorably for further adoption and experimentation within the AI community [45].
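The "single-stream" design mentioned above, in contrast to dual-stream architectures that keep text in a separate branch and inject it via cross-attention, amounts to concatenating text and image tokens into one sequence and letting a single transformer stack attend over both. This is a minimal sketch with hypothetical names and dimensions, not the actual S3-DiT code.

```python
import numpy as np

def single_stream_tokens(text_tokens, image_tokens):
    """Sketch of single-stream conditioning: concatenate prompt tokens and
    image-latent tokens into one joint sequence, so every transformer block
    performs self-attention over text and image positions together."""
    return np.concatenate([text_tokens, image_tokens], axis=0)

d = 16                                    # hypothetical embedding width
rng = np.random.default_rng(0)
text = rng.standard_normal((77, d))       # e.g. 77 encoded prompt tokens
image = rng.standard_normal((256, d))     # e.g. 16x16 grid of latent patches
seq = single_stream_tokens(text, image)   # one joint sequence, one stream
```

The practical appeal is that one parameter set serves both modalities and no separate cross-attention modules are needed, which is consistent with the article's framing of the architecture as a streamlining and efficiency measure.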
Hands-On with Nano Banana Pro: We Had a Blast
机器之心· 2025-11-21 10:17
Core Insights
- The article discusses the capabilities of the newly released AI tool Nano Banana Pro, particularly in generating images and understanding complex prompts about engineering structures such as the Huajiang Canyon Bridge [4][12][13].

Group 1: AI Capabilities
- Nano Banana Pro demonstrated exceptional control and accuracy in generating images from detailed prompts, including the ability to incorporate specific logos and contextual information from the internet [10][12].
- The AI handled challenging scenarios such as transforming a night image of the Huajiang Canyon Bridge into a daytime scene while maintaining detail and realism [16][19].
- When asked to describe the bridge's structure and principles, the model successfully identified and labeled the main components, though some minor inaccuracies were noted [24][27].

Group 2: Testing Challenges
- The AI faced increased difficulty when tasked with generating detailed blueprints and technical illustrations of the bridge, revealing limitations in accurately placing data markers [32][33].
- Despite some errors, Nano Banana Pro conveyed a general understanding of the construction process, indicating its potential as an educational tool [36][33].

Group 3: User Experience
- The AI's ability to understand prompts in Chinese and generate high-quality results on the first attempt was highlighted as a significant advantage for users [36][37].
- The article also included lighter content showcasing the AI's versatility in generating fun, creative images, such as transforming characters into different settings [50][64].
Nano Banana Pro Is Taking Off
36Kr· 2025-11-21 01:55
Core Insights
- Google has recently launched several AI models, including Gemini 3, Antigravity, and Nano Banana Pro, showcasing capabilities beyond simple image generation and a move toward reasoning and understanding [1][26].

Model Testing
- Nano Banana Pro was tested on generating realistic video-conference scenarios featuring well-known tech-industry figures, demonstrating a high level of detail and accuracy in character representation [2][5].
- The model successfully integrated a two-dimensional anime character into a three-dimensional video-conference setting, maintaining the character's original style while ensuring a coherent visual experience [5][26].

Language and Menu Generation
- Tasked with creating menus in English, Chinese, Japanese, and Russian, the model showed proficiency in layout and design but limited ability to generate coherent text beyond the prompt [10][11].
- The generated Chinese menu displayed accurate headings and categories, but specific dish names were less recognizable, indicating a gap in the model's text generation capabilities [10][11].

Cultural Understanding
- The model demonstrated an understanding of Chinese cultural elements such as palmistry and acupuncture, accurately depicting relevant imagery and concepts [13][18].
- However, it made errors in specific details, such as mislabeling lines in palmistry, highlighting areas for improvement in cultural accuracy [14][26].

Mathematical Problem Solving
- Evaluated on algebraic and geometric problems, the model produced results aligned with expected answers, suggesting a foundational understanding of mathematical concepts [20][24].
- This performance indicates a shift from a purely graphical tool toward outputs informed by reasoning and contextual awareness [26][27].

Future Implications
- These advances suggest a potential evolution toward a "world model," in which the AI not only generates images but also comprehends relationships and structures within a scene [26][27].
- This progression raises both excitement and caution, as the model approaches a level of understanding that could redefine its applications across fields [27].
Up Over 4%! Google Hits Another All-Time High! Image Generation Model Nano Banana Pro Launches, Deeply Integrated with Gemini 3; Now It Generates Whole Worlds
美股IPO· 2025-11-20 16:07
Core Viewpoint
- The article discusses the launch of Google's advanced image generation model, Nano Banana Pro, which builds on the capabilities of Gemini 3 and offers enhanced control, higher resolution, and improved text generation [2][6][39].

Group 1: Model Capabilities
- Nano Banana Pro can generate high-resolution images at 2K and 4K, significantly improving detail, precision, and consistency in image generation [10][11].
- The model supports a wide range of aspect ratios, addressing previous limitations in controlling image proportions [11].
- Users can combine up to 14 reference images while maintaining consistency across up to 5 characters, enhancing the model's ability to create cohesive compositions [13][20].

Group 2: Creative Control
- The model allows for "molecular-level" control over images, enabling users to make precise adjustments to specific areas, switch camera angles, and alter focus points [25][27].
- Users can apply cinematic color grading and modify lighting conditions seamlessly, enhancing the storytelling aspect of the generated images [27].

Group 3: Text Generation
- Nano Banana Pro excels in generating clear, readable text within images, addressing a common weakness of image generation models [28].
- The model supports multilingual text generation and localization, facilitating global content sharing [35][36].

Group 4: Knowledge Integration
- Integration with Gemini 3's knowledge base allows Nano Banana Pro to produce visually accurate content grounded in factual information [39][40].
- The model can connect to real-time web content, generating outputs based on the latest data, which is crucial for applications requiring precise information [40][41].
Google's Nano Banana Pro Launches, Deeply Integrated with Gemini 3; Now It Generates Whole Worlds
机器之心· 2025-11-20 15:13
Core Viewpoint
- Google has launched Nano Banana Pro (Gemini 3 Pro Image), an advanced image generation model with enhanced creative control, text rendering, and world knowledge, enabling users to create studio-level design work [3][4][6].

Group 1: Model Capabilities
- Nano Banana Pro can generate high-resolution images at 2K and 4K, significantly improving detail, precision, stability, consistency, and controllability [8][9].
- The model supports a wide range of aspect ratios, addressing previous limitations in controlling image proportions [9][11].
- Users can combine up to 14 reference images while maintaining consistency across up to 5 characters, enhancing the model's ability to create visually coherent compositions [12][13][23].

Group 2: Creative Control
- The model allows for "molecular-level" control over images, enabling users to select and reshape any part of an image for precise adjustments [25][26].
- Users can switch camera angles, generate different perspectives, and apply cinematic color grading, providing a high degree of narrative control [32][26].

Group 3: Text Generation
- Nano Banana Pro features strong text generation, producing clear, readable, multilingual text that integrates seamlessly with images [34][40].
- The model can translate text into different languages while maintaining fine detail and font style [41].

Group 4: Knowledge Integration
- The model leverages Gemini 3's advanced reasoning and a vast knowledge base to produce visually accurate content [44].
- It can connect to real-time web content to generate outputs based on the latest data, improving the accuracy of visual representations [45][46].

Group 5: User Accessibility
- Nano Banana Pro will be available across various Google products for consumers, professionals, and developers, with access tiers based on subscription type [59][60][61].
- The model will also be integrated into Google Workspace applications, enhancing productivity tools like Google Slides and Google Vids [62].

Group 6: Verification and Transparency
- Google has introduced a feature that lets users verify whether an image was generated or edited by Google AI, enhancing content transparency [56][57].
- This capability is powered by SynthID, a digital watermarking technology that embeds imperceptible signals into AI-generated content [57].