Image Generation Models
Hands-On with Nano Banana 2: 8 New Ways to Use It
Xin Lang Cai Jing· 2026-02-27 15:22
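Source: 沃垠AI. The AI community has had three spectacles lately: raising lobsters, waiting for seeds, and playing with bananas. Today "Banana 2" was officially released under the official name Gemini 3.1 Flash Image. The name says it all: Google has fused Nano Banana's capabilities with Flash's speed. Our old friend lovart.ai integrated 2 right away, and paid members can still use it for 0 credits.
A quick summary first:
1) Overall, 2 performs close to Pro, and in some scenarios it even regresses slightly.
2) Generating a lot of text still easily produces garbled characters (especially in Chinese).
3) The headline change is that the price is cut in half: a 1K image costs $0.134 with Pro but only $0.0672 with 2.
4) Speed is also much higher: the API can return an image in 2 seconds and can steadily output 347-356 images per minute.
5) New ultra-wide and ultra-narrow aspect ratios such as 4:1, 1:4, 8:1, and 1:8 cover more design scenarios.
6) A new 512px low-resolution option joins the existing 1K, 2K, and 4K, so developers can choose as needed.
7) Built-in web image search can fetch reference images from the web in real time to generate images.
8) Consistency is stronger: a single task can keep 5 characters and 14 objects consistent.
Hands-on testing
Overall performance has not changed qualitatively, but 2's expansion in "ways to play" is clearly more interesting. Below, I share the most valuable ...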
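The pricing and speed claims in points 3 and 4 of the summary can be checked with a little arithmetic. The sketch below uses only the figures quoted there; the function names are made up for illustration, and the concurrency estimate assumes that the 2-second latency and the 347-356 images per minute were measured under the same setup, which the article does not state.

```python
# Figures quoted in the summary (USD per 1K-resolution image).
PRO_PRICE_USD = 0.134
NB2_PRICE_USD = 0.0672
# Quoted sustained throughput range (images per minute) and API latency (seconds).
THROUGHPUT_PER_MIN = (347, 356)
LATENCY_S = 2.0


def price_ratio(new_price: float, old_price: float) -> float:
    """Return the new price as a fraction of the old price."""
    return new_price / old_price


def batch_cost(price_per_image: float, n_images: int) -> float:
    """Return the cost of generating n_images at a flat per-image price."""
    return price_per_image * n_images


def implied_concurrency(images_per_min: float, latency_s: float) -> float:
    """Little's law: requests in flight = completion rate * time per request.

    Assumption (not from the article): latency and throughput describe the
    same deployment, so this is only a rough estimate.
    """
    return images_per_min / 60.0 * latency_s


ratio = price_ratio(NB2_PRICE_USD, PRO_PRICE_USD)
savings_per_1000 = batch_cost(PRO_PRICE_USD, 1000) - batch_cost(NB2_PRICE_USD, 1000)
concurrency = [implied_concurrency(t, LATENCY_S) for t in THROUGHPUT_PER_MIN]

print(f"price ratio: {ratio:.3f}")
print(f"savings per 1,000 images: ${savings_per_1000:.2f}")
print(f"implied requests in flight: {concurrency[0]:.1f} to {concurrency[1]:.1f}")
```

Under these assumptions the ratio comes out at almost exactly one half, and sustaining the quoted throughput at 2 seconds per image would require roughly a dozen requests in flight at once.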
After ByteDance, Alibaba Follows: Qwen-Image 2.0 Rushes Out
36Kr· 2026-02-10 12:52
Core Viewpoint - Alibaba has launched its new image generation model Qwen-Image 2.0, which supports up to 1,000 tokens for long instructions and 2K resolution, featuring a lighter architecture that enhances inference speed compared to its predecessor [2][37]. Group 1: Model Performance - Qwen-Image 2.0 excels in long instruction adherence and text rendering, although it slightly lags behind Google's Nano Banana Pro in image realism [2][6]. - In AI Arena testing, Qwen-Image 2.0 ranked third in text-to-image and second in image-to-image benchmarks, indicating competitive performance but still trailing behind Google’s model [6][8]. - The model can render complex text, such as the full text of "Lantingji Xu" in a brush style, while maintaining visual harmony with the background [4][9]. Group 2: Technical Enhancements - Qwen-Image 2.0 has optimized the common "greasy" appearance in AI-generated images, resulting in less saturated colors and a more realistic look [5][34]. - The model's size is significantly reduced compared to version 1.0, which had approximately 20 billion parameters, while still enhancing capabilities and speed [37][39]. - Improvements in the Variational Autoencoder (VAE) have strengthened the model's ability to generate clear and accurate small text, addressing previous issues of text distortion [39]. Group 3: Future Developments - The Qwen-Image team plans to focus on generating complex "parent images" like PPTs and multi-image posters, aiming to reduce hallucinations and errors in future iterations [14][40]. - The integration of image generation and editing capabilities is expected to enhance the model's utility, allowing for more flexible workflows in design [34][35]. - Collaborations with applications like WPS are planned to gather user feedback for continuous model improvement [40]. 
Group 4: Market Implications - The advancements in Qwen-Image 2.0 position it as a potential productivity tool across various industries, including e-commerce and healthcare, by visualizing complex processes and generating marketing materials [39][41]. - The rapid iteration and application of AI-generated content in China are anticipated to foster new industry chains and accelerate model development [39][41].
ByteDance's Image Generation Model Seedream 5.0 Goes Live, Free to Try
Xin Lang Cai Jing· 2026-02-10 11:42
Core Viewpoint - ByteDance has officially launched its image generation model Seedream 5.0, which is now available in the video editing applications Jianying and CapCut and on the AI creation platform Xiaoyunque, with a limited-time free trial on the Jimeng AI platform [1][2]. Group 1 - The new model improves accuracy and intelligence, offers faster and more expressive image creation, and integrates online knowledge capabilities [2]. - Seedream 5.0 can deeply understand the semantics of prompts, generating images that better match user intentions with higher detail precision and clearer layouts [2]. - The model's image-to-image functionality has improved stylization effects, providing clearer details, refined textures, and balanced lighting [2]. Group 2 - The upgrade includes new editing features, allowing users to precisely select and adjust elements using brush control [2].
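Just In: Seedream 5.0 Goes Live, Another New Model from ByteDance
36Kr· 2026-02-10 06:56
The buzz around Seedance 2.0 has not died down, and ByteDance already has another new model! Zhidongxi (智东西) reported on February 10 that ByteDance's image generation model Seedream 5.0 went live today in the video editing app Jianying (剪映), its overseas version Capcut, and ByteDance's AI creation platform Xiaoyunque (小云雀). It has also entered staged-rollout (grayscale) testing on the Jimeng AI (即梦AI) platform, where image generation is free for a limited time. (Figure: Capcut announcement screenshot, left; model selection on the Xiaoyunque home page, right.) Seedream 5.0 supports 2K and 4K output: 2K is produced directly by the generator, while 4K is the resolution after AI enhancement. According to Capcut's official website, the upgrades in 5.0 are first-time support for retrieval-based image generation, more accurate prompt understanding, support for images with finer detail and more refined textures, and the ability for users to adjust images precisely. Seedream 4.5 went live on December 4, 2025. Zhidongxi tested Seedream 5.0 hands-on and compared it with Nano Banana Pro and Seedream 4.5. The new model can understand abstract prompts such as "serene tech feel", but its final output can hardly be called a leap over Seedream 4.5: its web search capability is still unstable, and the improvements show mainly in more attractive and more varied results. Capcut's announcement post says that Seedream 5.0 is comparable to Nano Banana Pro and cheaper, and that all users can currently use it for free ...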
Another New ByteDance Model! Seedream 5.0 Goes Live, Benchmarked Against Nano Banana Pro
Hua Er Jie Jian Wen· 2026-02-10 05:49
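ByteDance's image generation model Seedream 5.0 has gone live in the video editing app Jianying (剪映), its overseas version Capcut, and ByteDance's AI creation platform Xiaoyunque (小云雀), and has entered staged-rollout (grayscale) testing on the Jimeng AI (即梦AI) platform, where image generation is free for a limited time. The new model is positioned against Nano Banana Pro and can be tried for free.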
Zhipu and Huawei Open-Source Image Generation Model GLM-Image, Topping the Hugging Face Chart Within 24 Hours
Xin Lang Cai Jing· 2026-01-16 00:45
Core Insights - The collaboration between Zhipu and Huawei has led to the open-source release of the new image generation model GLM-Image, which completed the entire process from data to training using the Ascend Atlas 800T A2 device and MindSpore AI framework [1][2] - Within 24 hours of its release, GLM-Image achieved the top position on the Hugging Face leaderboard, a well-known AI open-source community [1][2] - GLM-Image utilizes an innovative "autoregressive + diffusion decoder" hybrid architecture, addressing challenges in generating knowledge-intensive scenarios such as posters, PPTs, and educational images, with a particular strength in generating Chinese characters [1][2] - The training process of GLM-Image demonstrated that it could reach the performance limits of the corresponding computing device, validating the feasibility of training advanced models on domestic full-stack computing platforms [1][2]
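Hong Kong Stocks Midday Review | Hang Seng Index Falls 0.55% in the Morning Session; Nonferrous Resources Sector Rises Against the Trend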
Zhi Tong Cai Jing· 2026-01-15 04:08
Market Overview - The Hang Seng Index fell by 0.55%, down 149 points, ending the morning session at 26,850 points, while the Hang Seng Tech Index dropped by 1.83% [1] - The trading volume in the Hong Kong stock market reached HKD 163.9 billion in the morning session [1] Company Highlights - Jiexin International Resources (03858) rose over 5%, reaching a new high as black tungsten ore prices surpassed 500,000 yuan, prompting several tungsten companies to raise long-term contract prices [1] - Likin Resources (02245) increased by over 10% due to disruptions in Indonesian nickel ore quotas, while Zhongwei New Materials (02579) gained over 9% [1] - China Rare Earth Holdings (03788) experienced significant volatility, rising by 8% after announcing the termination of its spin-off listing plan and plans to rename itself "Rare Earth Gold" [1] - Ocean Park Hong Kong (02255) surged over 10%, with visitor numbers on the first day of the New Year holiday increasing by 60% year-on-year [1] - Zhipu (02513) rose over 4% after announcing a collaboration with Huawei to open-source a new generation image generation model [1] - China Heartland Fertilizer (01866) increased by over 4%, anticipating a potential global urea supply shortage, and has been actively repurchasing shares [1] - Woan Robotics (06600) gained over 7% following the release of its humanoid intelligent robot, Onero [1] - Jiantao Laminates (01888) saw a nearly 6% increase after announcing a price hike, which is expected to become a trend in the copper-clad laminate industry [1] - Qiu Tai Technology (01478) declined over 7% as Citigroup reported that the company's net profit last year fell below expectations [1] Other Notable Movements - Kanglong Chemical (03759) dropped over 5% after announcing a placement of shares at an 8.5% discount, aiming to raise nearly HKD 1.32 billion [2] - Trip.com Group (09961) plummeted over 19% due to an investigation by the State Administration for Market Regulation for alleged monopoly practices, while Same City Travel (00780) fell over 11% [2]
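Zhipu Rises Over 6% Against the Market After Announcing It Will Open-Source a Next-Generation Image Generation Model with Huawei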
Zhi Tong Cai Jing· 2026-01-15 03:09
Core Viewpoint - Zhipu (02513) rose by over 6%, trading at 229.8 HKD with a transaction volume of 335 million HKD, following the announcement of a collaboration with Huawei on the open-source next-generation image generation model GLM-Image [1] Group 1: Company Developments - Zhipu announced the launch of GLM-Image, the first state-of-the-art (SOTA) multimodal model fully trained on domestic chips, utilizing the Ascend Atlas 800T A2 device and MindSpore AI framework [1] - The GLM-Image model integrates image generation with language models, and generating an image through the API costs only 0.1 yuan [1] Group 2: Market Outlook - Dongwu Securities views Zhipu as a pure large model player benefiting from cloud-scale effects and the advantages of agent/programming scenarios [1] - The company is expected to leverage its strengths in local large model technology, open-source ecosystem development, and localized implementation capabilities in government and enterprise sectors [1] - There is a positive outlook for Zhipu as the Chinese large model industry transitions from localized deployment to cloud services, indicating a long-term growth trend [1]
Just In: Zhipu and Huawei Go Big with China's First SOTA Multimodal Model Trained on Domestic Chips!
量子位· 2026-01-14 06:32
Core Viewpoint - The article highlights the launch of GLM-Image, a state-of-the-art (SOTA) multimodal model developed by Zhipu AI in collaboration with Huawei, which is notable for being trained entirely on domestic chips and excelling in text rendering capabilities [1][36]. Group 1: Model Performance - GLM-Image achieved first place in both the CVTG-2K (Complex Visual Text Generation) and LongText-Bench (Long Text Rendering) benchmarks, demonstrating superior performance with a word accuracy of 0.9116 and a normalized edit distance (NED) of 0.9557 [5][6]. - In the LongText-Bench, GLM-Image ranked first among open-source models in both Chinese and English scores, indicating its versatility and effectiveness in handling different languages [6]. Group 2: Cost Efficiency - The cost of generating an image using GLM-Image's API is only 0.1 yuan (approximately 0.014 USD), making it an affordable option for users [7][21]. - This low cost positions GLM-Image as a competitive choice for businesses and developers looking to integrate AI image generation capabilities [60]. Group 3: Technical Innovation - GLM-Image employs a hybrid architecture combining autoregressive and diffusion models, allowing it to understand complex prompts and generate high-quality images effectively [38][40]. - The model was trained on Huawei's Ascend A2 chips, showcasing the potential of domestic computing power in supporting advanced AI models [44][48]. - The training process included optimizations for reinforcement learning (RL) to ensure stability and efficiency, which is critical for handling large-scale models [51]. Group 4: Market Impact - GLM-Image represents a significant advancement in the domestic AI landscape, challenging the dominance of foreign models and proving that high-performance models can be developed using local resources [57][60]. 
- The open-source nature of GLM-Image, along with its innovative architecture, provides valuable resources for researchers and developers in the field of image generation [59][60].
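The summary quotes a word accuracy of 0.9116 and a normalized edit distance (NED) of 0.9557 without saying how the benchmarks compute them. As a rough illustration only, the sketch below uses one common convention: NED as a similarity, `1 - Levenshtein distance / length of the longer string`, and word accuracy as the share of target words reproduced exactly in position. The function names are hypothetical, and the benchmarks' actual scoring scripts may differ.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ned_similarity(pred: str, target: str) -> float:
    """Assumed convention: 1.0 means the rendered text matches exactly."""
    longest = max(len(pred), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(pred, target) / longest


def word_accuracy(pred_words: Sequence[str], target_words: Sequence[str]) -> float:
    """Share of target words that appear unchanged at the same position."""
    if not target_words:
        return 1.0
    hits = sum(p == t for p, t in zip(pred_words, target_words))
    return hits / len(target_words)


# Hypothetical example: text read back from a generated poster vs. the prompt text.
target = "GRAND OPENING SALE"
rendered = "GRAND OPENNG SALE"
print(round(ned_similarity(rendered, target), 4))
print(round(word_accuracy(rendered.split(), target.split()), 4))
```

Under this convention a single dropped letter costs one whole word in word accuracy but only a small fraction in NED, which is one reason the two scores are usually reported together.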
Alibaba's Z-Image Tops the Open-Source Image Generation Model Rankings: 1-Second Generation, Only $5 per 1,000 Images
Xin Lang Cai Jing· 2025-12-23 03:33
Core Insights - Alibaba's Z-Image Turbo has topped the open-source image generation model rankings, surpassing the 32B parameter FLUX.2 model, making it the strongest open-source image generation model available [1][2] - The model is now available on Alibaba Cloud, with a cost of $5 for generating 1,000 images, showcasing its affordability [1] - Z-Image Turbo achieved an ELO score of 1152, setting a new record in the rankings [1] Performance and Features - Z-Image Turbo significantly enhances image realism, accurately rendering details such as skin texture, hair, and fabric materials [3] - The model supports bilingual text rendering, maintaining clarity and natural layout even in complex scenarios like small fonts and intricate designs [3] - It utilizes a single-stream diffusion Transformer architecture, integrating text, image latent variables, and time-step conditions into a unified sequence input, which improves parameter utilization [5] Efficiency and Speed - The inference process has been optimized, reducing the original 20-step generation process to just 8 steps, thereby increasing image generation speed [5] - The model can generate images comparable to those produced by 100 billion parameter models in just 1 second when deployed in an H100 environment [2][5] Market Reception - Z-Image Turbo was open-sourced at the end of November and quickly reached the top of the Hugging Face popularity chart, maintaining its position for three consecutive weeks [7] - Within a month of its release, the model has been downloaded over 4 million times, indicating its popularity in the market [7]
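Two of the efficiency figures above reduce to simple ratios. The sketch below works them out from the quoted numbers only ($5 per 1,000 images; 20 steps reduced to 8). The names are invented for illustration, and the speedup is an upper bound that assumes every sampling step costs the same and that nothing else (text encoding, decoding, network overhead) contributes to latency, which the article does not claim.

```python
# Figures quoted in the summary above.
PRICE_PER_1000_USD = 5.0
ORIGINAL_STEPS = 20
TURBO_STEPS = 8


def per_image_price(price_per_1000: float) -> float:
    """Flat per-image price implied by a per-1,000-images price."""
    return price_per_1000 / 1000.0


def step_reduction(original: int, reduced: int) -> float:
    """Fraction of sampling steps removed."""
    return 1.0 - reduced / original


def ideal_speedup(original: int, reduced: int) -> float:
    """Best-case speedup if latency scaled purely with step count (assumption)."""
    return original / reduced


price = per_image_price(PRICE_PER_1000_USD)
reduction = step_reduction(ORIGINAL_STEPS, TURBO_STEPS)
speedup = ideal_speedup(ORIGINAL_STEPS, TURBO_STEPS)

print(f"per-image price: ${price:.3f}")
print(f"steps removed: {reduction:.0%}")
print(f"ideal speedup: {speedup:.1f}x")
```

Measured end-to-end speedups are normally smaller than this ideal figure because fixed costs do not shrink with the step count.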