Image Generation Models
Hands-On Test of Nano Banana 2: 8 New Ways to Use It
Sina Finance · 2026-02-27 15:22
Core Insights
- The official release of Nano Banana 2, formally named Gemini 3.1 Flash Image, combines the capabilities of Nano Banana with enhanced speed [2][4]
- Pricing has been cut sharply: the cost of generating a 1K image drops from $0.134 to $0.0672, roughly half the previous price [4]
- The model introduces new aspect ratios and features that improve usability for designers [4][7]

Performance and Features
- Overall performance of Nano Banana 2 is comparable to the Pro version, with slight regressions in some scenarios [4]
- The model can generate images at a speed of 347-356 images per minute, with an API response time of 2 seconds [4]
- New aspect ratios (4:1, 1:4, 8:1, 1:8) have been added, expanding design possibilities [4][7]
- A low-resolution 512px option has been introduced alongside the existing 1K, 2K, and 4K options [4]
- The built-in web image search retrieves reference images in real time, improving the accuracy of generated images [4][26]

Usability Enhancements
- Image dimensions can be changed easily while the original structure and elements are preserved [7][9]
- New editing functions make it easier for designers to adjust text and other elements in generated images [11][13]
- Generating multiple images in a single request is highlighted as a significant improvement, since it keeps characters consistent across different scenes [56][58]

Conclusion
- Nano Banana 2 is positioned as a leading image generation model, offering faster and cheaper service without compromising performance [59][60]
- Integration with Lovart's unique features provides additional creative possibilities for users [61][62]
- The model's flexibility in interpreting vague prompts allows for a wide range of creative outputs [62][63]
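The pricing change above is easy to check by hand. The following is a minimal Python sketch that uses only the two per-image prices quoted in the summary; the helper names `price_reduction` and `batch_cost` are illustrative and not part of any Google API, and the 1,000-image batch size is an arbitrary example.

```python
# Per-image prices for a 1K image, in USD, as quoted in the summary above.
OLD_PRICE_USD = 0.134
NEW_PRICE_USD = 0.0672


def price_reduction(old: float, new: float) -> float:
    """Return the fractional price cut when moving from `old` to `new`."""
    return (old - new) / old


def batch_cost(price_per_image: float, n_images: int) -> float:
    """Return the total cost of generating `n_images` at a flat per-image price."""
    return price_per_image * n_images


reduction = price_reduction(OLD_PRICE_USD, NEW_PRICE_USD)
old_batch = batch_cost(OLD_PRICE_USD, 1000)  # hypothetical batch size
new_batch = batch_cost(NEW_PRICE_USD, 1000)

print(f"Price cut: {reduction:.1%}")
print(f"1,000 images: ${old_batch:.2f} -> ${new_batch:.2f}")
```

Under these figures the cut works out to just under 50%, which matches the "roughly half" reading of the quoted prices. Any volume discounts or resolution-dependent tiers are not covered by the summary and are ignored here.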
Alibaba Follows ByteDance: Qwen-Image 2.0 Launches Hot on Its Heels
36Kr · 2026-02-10 12:52
Core Viewpoint
- Alibaba has launched its new image generation model Qwen-Image 2.0, which supports long instructions of up to 1,000 tokens and 2K resolution, and uses a lighter architecture that improves inference speed over its predecessor [2][37].

Group 1: Model Performance
- Qwen-Image 2.0 excels in long-instruction adherence and text rendering, although it slightly lags behind Google's Nano Banana Pro in image realism [2][6].
- In AI Arena testing, Qwen-Image 2.0 ranked third in text-to-image and second in image-to-image benchmarks, indicating competitive performance while still trailing Google's model [6][8].
- The model can render complex text, such as the full text of "Lantingji Xu" (Preface to the Orchid Pavilion Collection) in a brush style, while maintaining visual harmony with the background [4][9].

Group 2: Technical Enhancements
- Qwen-Image 2.0 reduces the "greasy" look common in AI-generated images, resulting in less saturated colors and a more realistic appearance [5][34].
- The model is significantly smaller than version 1.0, which had approximately 20 billion parameters, while still improving capabilities and speed [37][39].
- Improvements to the Variational Autoencoder (VAE) strengthen the model's ability to generate clear and accurate small text, addressing previous issues of text distortion [39].

Group 3: Future Developments
- The Qwen-Image team plans to focus on generating complex "parent images" such as PPTs and multi-image posters, aiming to reduce hallucinations and errors in future iterations [14][40].
- Integrating image generation and editing is expected to make the model more useful, allowing more flexible design workflows [34][35].
- Collaborations with applications such as WPS are planned to gather user feedback for continuous model improvement [40].

Group 4: Market Implications
- The advances in Qwen-Image 2.0 position it as a potential productivity tool across industries including e-commerce and healthcare, where it can visualize complex processes and generate marketing materials [39][41].
- The rapid iteration and application of AI-generated content in China are expected to foster new industry chains and accelerate model development [39][41].
ByteDance's Image Generation Model Seedream 5.0 Goes Live, Free to Try
Sina Finance · 2026-02-10 11:42
Core Viewpoint
- ByteDance has officially launched its image generation model Seedream 5.0, which is now available in the video editing applications Jianying and CapCut and on the AI creation platform Xiaoyunque, with a limited-time free experience on the Jimeng AI platform [1][2].

Group 1
- The new model brings improved accuracy and intelligence, faster and more expressive image creation, and integrated online knowledge capabilities [2].
- Seedream 5.0 can deeply understand the semantics of prompts, generating images that better match user intent with higher detail precision and clearer layouts [2].
- The model's image-to-image functionality has improved stylization, providing clearer details, refined textures, and balanced lighting [2].

Group 2
- The upgrade includes new editing features that let users precisely select and adjust elements using brush control [2].
Just In: Seedream 5.0 Goes Live, Another New Model from ByteDance
36Kr · 2026-02-10 06:56
Core Insights
- ByteDance has launched its new image generation model, Seedream 5.0, which is now available on platforms including CapCut and the AI creation platform Xiaoyunque, with a limited free trial for users [1][3].

Group 1: Model Features and Enhancements
- Seedream 5.0 supports image output at 2K and 4K resolutions, with improvements in understanding prompts and generating detailed, textured images [3][9].
- The model is better at understanding abstract prompts, allowing more precise image generation that aligns with user intent [9][10].
- New features include improved stylization, enhanced image-to-image functionality, and editing capabilities that let users select and adjust regions with a brush [9][10].

Group 2: Performance Comparison
- Compared with its predecessor Seedream 4.5, the improvements in Seedream 5.0 are incremental rather than revolutionary, particularly in abstract understanding and visual output [10][28].
- User experiences indicate that while Seedream 5.0 can generate detailed explanations and visuals, it still struggles with certain complex tasks compared with competitors such as Nano Banana Pro [5][22].
- Results on specific prompts vary, with some outputs getting details wrong, such as the time shown on clocks or the positioning of characters [22][24].

Group 3: User Feedback and Market Position
- User feedback suggests that the upgrades in Seedream 5.0 focus on practical intelligence rather than aesthetic appeal, with a notable emphasis on handling knowledge-driven tasks [5][7].
- The model's ability to generate diverse styles, from modern to traditional, is highlighted, although it still faces challenges in accurately rendering complex scenes [24][26].
- Overall, the industry is shifting toward practical capabilities in image generation models, and Seedream 5.0 aligns with user needs for better prompt understanding and controllable outputs [28].
Another New ByteDance Model! Seedream 5.0 Goes Live, Targeting Nano Banana Pro
Wallstreetcn · 2026-02-10 05:49
Group 1
- ByteDance's image generation model Seedream 5.0 has been launched on the video editing application Jianying, its overseas version CapCut, and ByteDance's AI creation platform Xiaoyunque [1]
- The new model is positioned to compete with Nano Banana Pro and offers a limited-time free experience [1]
- The Jimeng AI platform has begun a gray-release (staged rollout) test of the image generation feature [1]
Zhipu and Huawei Open-Source Image Generation Model GLM-Image, Topping the Hugging Face Chart Within 24 Hours
Sina Finance · 2026-01-16 00:45
Core Insights
- Zhipu and Huawei have jointly open-sourced the new image generation model GLM-Image, which completed the entire process from data to training on the Ascend Atlas 800T A2 device and the MindSpore AI framework [1][2]
- Within 24 hours of its release, GLM-Image reached the top position on the leaderboard of Hugging Face, a well-known open-source AI community [1][2]
- GLM-Image uses an innovative "autoregressive + diffusion decoder" hybrid architecture that addresses the difficulty of generating knowledge-intensive content such as posters, PPTs, and educational images, with a particular strength in rendering Chinese characters [1][2]
- The training process showed that GLM-Image could reach the performance limits of the corresponding computing device, validating the feasibility of training advanced models on a domestic full-stack computing platform [1][2]
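Hong Kong Stocks Midday Review | Hang Seng Index Falls 0.55% in Morning Session; Non-Ferrous Metals and Resources Stocks Rise Against the Market
Zhitong Finance · 2026-01-15 04:08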
Market Overview
- The Hang Seng Index fell 0.55%, or 149 points, to 26,850 points at the midday break, while the Hang Seng Tech Index dropped 1.83% [1]
- Turnover in the Hong Kong stock market reached HKD 163.9 billion in the morning session [1]

Company Highlights
- Jiaxin International Resources (03858) rose over 5% to a new high as black tungsten ore prices surpassed 500,000 yuan, prompting several tungsten companies to raise long-term contract prices [1]
- Lygend Resources (02245) rose over 10% on disruptions to Indonesian nickel ore quotas, while Zhongwei New Materials (02579) gained over 9% [1]
- China Hanking Holdings (03788) was highly volatile, rising 8% after announcing that it would terminate its spin-off listing plan and rename itself "Hanking Gold" [1]
- Haichang Ocean Park (02255) surged over 10%, with visitor numbers on the first day of the New Year holiday up 60% year-on-year [1]
- Zhipu (02513) rose over 4% after announcing a collaboration with Huawei to open-source a next-generation image generation model [1]
- China XLX Fertiliser (01866) rose over 4% on expectations of a potential global urea supply shortage; the company has also been actively repurchasing shares [1]
- Woan Robotics (06600) gained over 7% following the release of its humanoid intelligent robot, Onero [1]
- Kingboard Laminates (01888) rose nearly 6% after announcing a price increase, which is expected to become a trend across the copper-clad laminate industry [1]
- Q Technology (01478) fell over 7% after Citigroup said the company's net profit last year came in below expectations [1]

Other Notable Movements
- Pharmaron (03759) dropped over 5% after announcing a share placement at an 8.5% discount, aiming to raise nearly HKD 1.32 billion [2]
- Trip.com Group (09961) plummeted over 19% after the State Administration for Market Regulation opened an investigation into alleged monopolistic practices, while Tongcheng Travel (00780) fell over 11% [2]
Zhipu Rises Over 6% Against the Market After Announcing an Open-Source Next-Generation Image Generation Model with Huawei
Zhitong Finance · 2026-01-15 03:09
Core Viewpoint
- Zhipu (02513) rose more than 6% against the market, trading at HKD 229.8 with turnover of HKD 335 million at the time of writing, following the announcement of a collaboration with Huawei on the open-source next-generation image generation model GLM-Image [1]

Group 1: Company Developments
- Zhipu announced the launch of GLM-Image, the first state-of-the-art (SOTA) multimodal model fully trained on domestic chips, using the Ascend Atlas 800T A2 device and the MindSpore AI framework [1]
- GLM-Image integrates image generation with language models, and generating an image through the API costs only 0.1 yuan [1]

Group 2: Market Outlook
- Soochow Securities views Zhipu as a pure-play large model company that benefits from cloud-scale effects and from its advantages in agent and programming scenarios [1]
- The company is expected to leverage its strengths in domestic large model technology, open-source ecosystem development, and localized deployment in the government and enterprise sectors [1]
- The outlook for Zhipu is positive as the Chinese large model industry transitions from on-premises deployment to cloud services, indicating a long-term growth trend [1]
Just In, Zhipu and Huawei Go Big: China's First SOTA Multimodal Model Trained on Domestic Chips!
QbitAI · 2026-01-14 06:32
Core Viewpoint
- The article highlights the launch of GLM-Image, a state-of-the-art (SOTA) multimodal model developed by Zhipu AI in collaboration with Huawei, which is notable for being trained entirely on domestic chips and for excelling at text rendering [1][36].

Group 1: Model Performance
- GLM-Image took first place in both the CVTG-2K (Complex Visual Text Generation) and LongText-Bench (Long Text Rendering) benchmarks, with a word accuracy of 0.9116 and a normalized edit distance (NED) of 0.9557 [5][6].
- On LongText-Bench, GLM-Image ranked first among open-source models in both Chinese and English scores, indicating that it handles both languages effectively [6].

Group 2: Cost Efficiency
- Generating an image through GLM-Image's API costs only 0.1 yuan (approximately 0.014 USD), making it an affordable option for users [7][21].
- This low cost positions GLM-Image as a competitive choice for businesses and developers looking to integrate AI image generation capabilities [60].

Group 3: Technical Innovation
- GLM-Image employs a hybrid architecture combining autoregressive and diffusion models, allowing it to understand complex prompts and generate high-quality images [38][40].
- The model was trained on Huawei's Ascend A2 chips, showcasing the potential of domestic computing power to support advanced AI models [44][48].
- The training process included optimizations for reinforcement learning (RL) to ensure stability and efficiency, which is critical for large-scale models [51].

Group 4: Market Impact
- GLM-Image represents a significant advance in the domestic AI landscape, challenging the dominance of foreign models and showing that high-performance models can be developed with local resources [57][60].
- The open-source nature of GLM-Image, along with its innovative architecture, provides valuable resources for researchers and developers in the field of image generation [59][60].
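The summary reports a normalized edit distance (NED) score without defining it. As a rough illustration only, and assuming the common convention in which the score is 1 minus the Levenshtein distance divided by the length of the longer string (so higher is better), it could be computed as sketched below. The benchmark's actual formula may differ, and the function names `levenshtein` and `ned_similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def ned_similarity(rendered: str, target: str) -> float:
    """Assumed NED-style score in [0, 1]; 1.0 means the rendered text
    matches the target exactly."""
    longest = max(len(rendered), len(target))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(rendered, target) / longest


# Made-up example: text read back from a generated image vs. the prompt text.
score = ned_similarity("GLM-lmage", "GLM-Image")
print(f"NED-style score: {score:.4f}")
```

In practice the rendered string would come from running OCR on the generated image, a step that is outside the scope of this sketch.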
Alibaba's Z-Image Tops Open-Source Image Generation Model Rankings: Images in 1 Second, 1,000 Images for Just $5
Sina Finance · 2025-12-23 03:33
Core Insights
- Alibaba's Z-Image Turbo has topped the open-source image generation model rankings, surpassing the 32B-parameter FLUX.2 model and becoming the strongest open-source image generation model available [1][2]
- The model is now available on Alibaba Cloud at a cost of $5 per 1,000 generated images, underscoring its affordability [1]
- Z-Image Turbo achieved an Elo score of 1152, setting a new record in the rankings [1]

Performance and Features
- Z-Image Turbo significantly improves image realism, accurately rendering details such as skin texture, hair, and fabric materials [3]
- The model supports bilingual text rendering, maintaining clarity and natural layout even in complex scenarios such as small fonts and intricate designs [3]
- It uses a single-stream diffusion Transformer architecture that feeds text, image latent variables, and time-step conditions into the model as one unified sequence, which improves parameter utilization [5]

Efficiency and Speed
- The inference process has been optimized, reducing the original 20-step generation process to just 8 steps and thereby increasing image generation speed [5]
- Deployed in an H100 environment, the model can generate images comparable to those produced by 100-billion-parameter models in just 1 second [2][5]

Market Reception
- Z-Image Turbo was open-sourced at the end of November and quickly reached the top of the Hugging Face popularity chart, holding that position for three consecutive weeks [7]
- Within a month of its release, the model had been downloaded over 4 million times, indicating its popularity in the market [7]
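The summary does not say how the leaderboard derives its rating. Assuming a standard Elo system with the conventional 400-point scale, a rating gap translates into an expected head-to-head win rate as in the minimal sketch below. The rival rating of 1100 is a made-up value for illustration, not a figure from the article, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B under the standard Elo
    formula: E_A = 1 / (1 + 10 ** ((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


Z_IMAGE_TURBO_RATING = 1152  # score quoted in the summary above
HYPOTHETICAL_RIVAL_RATING = 1100  # assumption, for illustration only

p_win = expected_score(Z_IMAGE_TURBO_RATING, HYPOTHETICAL_RIVAL_RATING)
print(f"Expected win rate vs. a 1100-rated model: {p_win:.1%}")

# Two equally rated models are expected to split their comparisons evenly.
print(expected_score(1152, 1152))  # 0.5
```

Under this assumption, a lead of a few dozen points corresponds to a modest edge in pairwise preference votes rather than a decisive one, which is worth keeping in mind when reading leaderboard rankings.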