AI Painting
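Tencent Hunyuan Open-Sources a New AI Painting Framework: Aligning with Human Intent Across 24 Dimensions So AI Can Understand Complex Instructions
量子位 (QbitAI) · 2025-09-17 01:42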
Core Viewpoint - The article discusses the challenges faced by AI painting models in accurately interpreting human instructions and presents Tencent's PromptEnhancer framework as a solution to improve text-image alignment without modifying pre-trained models [2][4][12].

Group 1: Challenges in AI Painting
- AI painting models struggle with understanding concise user instructions, leading to inaccuracies in generated images [9][10].
- Common issues include chaotic attribute binding, ineffective negation commands, and failure to comprehend complex spatial relationships [10][11].

Group 2: PromptEnhancer Framework
- PromptEnhancer introduces a decoupled prompt optimization framework consisting of two main modules: CoT-based Rewriter and AlignEvaluator [12][14].
- The CoT-based Rewriter mimics human designers by breaking down instructions into core elements, potential ambiguities, and detailed supplements [15][19].
- AlignEvaluator provides a scoring system across 24 key dimensions to accurately identify errors in generated images [20][21].

Group 3: Performance Improvements
- Testing on the HunyuanImage 2.1 model shows a 5.1% overall accuracy improvement, with significant gains in complex scene understanding [29].
- Specific dimensions such as "similarity relations" and "counterfactual reasoning" saw accuracy increases of 17.3% and 17.2%, respectively [29].

Group 4: Dataset and Research Support
- Tencent's team released a high-quality benchmark dataset containing 6,000 prompts to aid in the training and evaluation of the PromptEnhancer [7][45].
- The dataset covers various complex scenarios, including everyday creative extensions and abstract relationship challenges [46].

Group 5: Future Implications
- The advancements brought by PromptEnhancer position it as a critical tool for enhancing AI painting's applicability in professional fields like industrial design and advertising [54][55].
- The framework's ability to optimize instructions without altering model weights allows for broader adaptability across different T2I models [57].
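The decoupled design summarized above can be pictured with a minimal sketch. This is an illustration only, not Tencent's implementation: `RewrittenPrompt`, `rewrite`, `failing_dimensions`, `enhance`, the three dimension names, and the 0.5 threshold are all hypothetical, and the image model and evaluator are passed in as plain callables to show that the text-to-image model is treated as a black box whose weights are left alone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RewrittenPrompt:
    core_elements: List[str]  # what must appear in the image
    ambiguities: List[str]    # points the short instruction leaves open
    supplements: List[str]    # details added to resolve those points

    def to_text(self) -> str:
        # Only core elements and supplements are passed to the image model.
        return "; ".join(self.core_elements + self.supplements)


def rewrite(user_prompt: str) -> RewrittenPrompt:
    """Hypothetical stand-in for the CoT-based Rewriter (fixed toy output)."""
    return RewrittenPrompt(
        core_elements=[user_prompt],
        ambiguities=["object colours and positions are not specified"],
        supplements=["every object has an explicit colour and position"],
    )


def failing_dimensions(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Names of dimensions scoring below the threshold, worst first."""
    low = [name for name, score in scores.items() if score < threshold]
    return sorted(low, key=lambda name: scores[name])


def enhance(
    user_prompt: str,
    generate: Callable[[str], str],
    evaluate: Callable[[str, str], Dict[str, float]],
) -> Tuple[RewrittenPrompt, List[str]]:
    """Rewrite, call the frozen image model, then score the result."""
    rewritten = rewrite(user_prompt)
    image = generate(rewritten.to_text())  # the model itself is never modified
    scores = evaluate(user_prompt, image)
    return rewritten, failing_dimensions(scores)


if __name__ == "__main__":
    # Toy callables standing in for a text-to-image model and AlignEvaluator.
    fake_generate = lambda text: f"<image for: {text}>"
    fake_evaluate = lambda prompt, image: {
        "attribute_binding": 0.9, "negation": 0.2, "spatial_relations": 0.4,
    }
    prompt, weak = enhance("a red cube left of a blue ball", fake_generate, fake_evaluate)
    print(prompt.to_text())
    print(weak)
```

Because the generator and evaluator are ordinary arguments, swapping in a different text-to-image model changes nothing in the rewriting step, which is the adaptability the summary attributes to the framework.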
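Tencent Hunyuan Upgrades the AI Painting Fine-Tuning Paradigm: Optimizing Over the Entire Diffusion Trajectory, with Human Evaluation Scores Up 300%
量子位 (QbitAI) · 2025-09-15 03:59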
Core Viewpoint - The article discusses advancements in AI image generation, focusing on two key methods, Direct-Align and Semantic Relative Preference Optimization (SRPO), which significantly enhance the quality and aesthetic appeal of generated images [5][14].

Group 1: Current Challenges in Diffusion Models
- Existing diffusion models face two main issues: limited optimization steps leading to "reward hacking," and the need for offline adjustments to the reward model for achieving good aesthetic results [4][8].
- The optimization process is constrained to the last few steps of the diffusion process due to high gradient computation costs [8].

Group 2: Direct-Align Method
- The Direct-Align method allows the original image to be recovered from any time step by pre-injecting noise, thus avoiding the limitation of optimizing only in later steps [5][10].
- This method enables the model to recover clear images from high-noise states, addressing the gradient explosion problem during backpropagation through early time steps [11].
- Experiments show that even at just 5% denoising progress, Direct-Align can recover a rough structure of the image [11][19].

Group 3: Semantic Relative Preference Optimization (SRPO)
- SRPO redefines rewards as text-conditioned signals, allowing for online adjustments without additional data by using positive and negative prompt words [14][16].
- The method enhances the model's ability to generate images with improved realism and aesthetic quality, achieving approximately 3.7 times and 3.1 times improvements, respectively [16].
- SRPO allows for flexible style adjustments, such as brightness and cartoon style conversion, based on the frequency of control words in the training set [16].

Group 4: Experimental Results
- Comprehensive experiments on the FLUX.1-dev model demonstrate that SRPO outperforms other methods like ReFL, DRaFT, and DanceGRPO across multiple evaluation metrics [17].
- In human evaluations, the excellent rate for realism increased from 8.2% to 38.9% and for aesthetic quality from 9.8% to 40.5% after SRPO training [17][18].
- Notably, a mere 10 minutes of SRPO training allowed FLUX.1-dev to surpass the latest open-source version FLUX.1.Krea on the HPDv2 benchmark [19].
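The "3.7 times" and "3.1 times" figures appear to be relative increases in the human-rated excellent rate rather than ratios of the final rate to the starting rate. The short check below makes that reading explicit using only the percentages quoted in the summary; the helper name `relative_increase` is introduced here for illustration, and the interpretation is an assumption.

```python
def relative_increase(before: float, after: float) -> float:
    """Increase relative to the starting value: (after - before) / before."""
    return (after - before) / before


# Excellent rates quoted in the summary, in percent.
realism = relative_increase(8.2, 38.9)
aesthetics = relative_increase(9.8, 40.5)

print(f"realism: {realism:.2f}x increase")        # realism: 3.74x increase
print(f"aesthetics: {aesthetics:.2f}x increase")  # aesthetics: 3.13x increase
```

Under this reading, both gains exceed 300%, which is consistent with the headline figure; as plain ratios, the final rates are roughly 4.7 and 4.1 times the starting rates.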
AI Painting, Group Rock Climbing… "Retirement Clubs" Redefine Retired Life
Sou Hu Cai Jing· 2025-08-27 09:19
Core Insights
- The emergence of "retirement clubs" is providing new options for retirees, offering activities that break traditional stereotypes of aging [1][13].
- These clubs are primarily attracting individuals aged 50 to 65, indicating a growing interest in active lifestyles among older adults [13].

Group 1: Retirement Clubs
- "Retirement clubs" are organizing diverse activities such as AI painting, DJing, rock climbing, and coffee art, catering to the interests of newly retired individuals [7][11].
- The founder of a notable retirement club, Liu Jing, initially aimed to find companions for her parents but ended up attracting over ten thousand elderly members, highlighting the demand for such social platforms [11].
- The rapid growth of these clubs across the country is injecting new vitality into the silver economy, which is projected to reach a market size of 30 trillion yuan by 2035 [13].

Group 2: Participant Experiences
- Participants, like Ms. Liu, express that these activities not only help in physical exercise but also provide opportunities for social interaction and learning [5][9].
- The clubs are fostering a sense of community among retirees, allowing them to engage in enjoyable and enriching experiences [5][11].
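Master Zang Teaches You to Make the Soon-to-Go-Viral AI Mystical Blessing Wallpapers: Not Just Prompts but Also the Creative Approach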
歸藏的AI工具箱· 2025-08-04 06:42
Core Viewpoint - The article provides a tutorial on creating AI-generated wish and blessing wallpapers, combining traditional elements with modern aesthetics, and emphasizes the importance of creativity in the design process [1][4][22].

Group 1: Tutorial Overview
- The tutorial includes a detailed video guide for creating AI wallpapers, focusing on the integration of traditional motifs with contemporary styles [1][3].
- It introduces a template for prompt writing, which helps in generating unique creative ideas by modifying various elements of the design [4][9].

Group 2: Design Elements
- The design is based on a vintage ticket concept with a beige background and intricate green borders, featuring characters like Zhong Kui in modern attire [5][12].
- The structure of the prompt is divided into three parts: main structure, character description, and content layout, allowing for flexible modifications to enhance creativity [9][10][16].

Group 3: Creative Techniques
- The article discusses how to adapt the character's attire and actions to reduce seriousness and make the designs more relatable [12][19].
- It encourages exploring different cultural references and modern themes, such as using characters from popular media to create relatable wish imagery [20][22].
赛道Hyper | Black Forest Open-Sources a New Model: Good News for Text-Based Photo Editors
Hua Er Jie Jian Wen· 2025-07-03 05:50
Core Insights - The competition in the AI image generation field is intensifying, with open-source and closed-source models increasingly at odds. The launch of the open-source model FLUX.1-Kontext by Black Forest has garnered significant attention due to its ability to edit images based on natural language instructions, outperforming OpenAI's latest GPT-image-1 in key metrics [1][5].

Technical Architecture
- FLUX.1-Kontext consists of three key modules: natural language parsing, image generation, and multimodal fusion [2].
- The natural language parsing layer utilizes an improved Transformer architecture with 8 layers of self-attention, enabling deep semantic breakdown of user instructions [3].
- The image generation engine is built on an enhanced diffusion model (DPM-Solver++) that introduces a dynamic noise scheduling mechanism, adjusting denoising iterations based on instruction complexity [4].
- The multimodal fusion layer employs a pre-trained CLIP model and visual Transformer to dynamically match text and image feature vectors, addressing common issues in traditional models [4].

Competitive Advantages
- FLUX.1-Kontext's open-source nature significantly lowers the application barrier for enterprises, with potential savings of over 60% in server costs compared to closed-source models like GPT-image-1 [5].
- The model has optimized its technology to address shortcomings in similar products, such as improved long-text parsing capabilities and a style vector pool mechanism for quick style application [5].
- The application of FLUX.1-Kontext is reshaping the image creation industry, with companies reporting significant reductions in time and costs for design tasks [6].

Educational Impact
- The introduction of AI instruction design courses in design education reflects a shift in core competencies for future designers, emphasizing the ability to translate abstract ideas into machine-readable instructions [6][7].

Challenges and Future Developments
- Despite its advantages, FLUX.1-Kontext faces challenges such as copyright risks due to the use of approximately 120 million internet images for training, and technical limitations in handling complex physical effects [8][9].
- The model's understanding of non-English instructions is less accurate, indicating a need for improved multilingual support [9].
- Black Forest has announced plans for future iterations of FLUX.1-Kontext, including real-time interactive editing features and collaborations for style transfer models [9].

Broader Applications
- The open-source model is expected to find applications across various sectors, including healthcare for generating diagnostic images, education for creating teaching illustrations, and entertainment for game and film production [10].
- The open innovation model of FLUX.1-Kontext provides global developers with opportunities to participate in the evolution of AI painting technology, potentially accelerating industry-wide advancements [10].
Huang Jiannan: Reconstructing Eastern Aesthetics Between the Visual Image and the Mental Image
Jing Ji Guan Cha Bao· 2025-05-23 08:01
Core Perspective - Huang Jiannan's artistic journey reflects a profound exploration of how visual experience resonates with the inner landscape, breaking traditional boundaries between Eastern and Western art forms [2][4][6].

Group 1: Artistic Philosophy and Techniques
- Huang Jiannan's work embodies a unique creative philosophy where art is not merely imitation of nature but a manifestation of life's will, as seen in his piece "Desert Solitude" [1][4].
- His approach integrates elements from both Eastern and Western art, such as blending Monet's impressionism with traditional Chinese ink techniques, creating a dialogue between different cultural expressions [2][4].
- The artist's innovative use of materials and techniques, like using oil painting to express traditional Chinese aesthetics, has been recognized as a significant contribution to contemporary art [4][6].

Group 2: Market Trends and Value
- Huang Jiannan's early works, marked by his travels, have shown an annual appreciation rate of 27%, significantly higher than the average 12% in the art market [5].
- Collaborations with scientific institutions, such as NASA, have led to substantial increases in the value of his works, exemplified by the "Interstellar" series, which quadrupled in price over three years [5].
- His unique market strategy, which includes the integration of digital art with traditional painting, reflects a broader trend in contemporary art where the hand-crafted aspect is becoming a rare and valuable resource [5][6].

Group 3: Cultural Impact and Recognition
- Huang Jiannan's work has been pivotal in establishing a cultural dialogue on a global scale, as evidenced by his large installation at the Venice Biennale, which asserts cultural authority [6].
- His recent exhibition at the British Museum highlights the relevance of his work in addressing contemporary artistic challenges, particularly in the context of digital art's rise [6][7].
- The artist's philosophy emphasizes a connection between local experiences and global narratives, showcasing the duality of being rooted in tradition while reaching for global recognition [6][7].
Bookplates in Miniature, Rich in the Charm of Watermark Woodcut Printing
Ren Min Ri Bao· 2025-05-17 21:52
Core Viewpoint - The article highlights the cultural significance and artistic value of bookplates, particularly the Chinese watermarked woodcut bookplates, which blend traditional Chinese aesthetics with Western influences, fostering a unique dialogue between the two cultures [3][4].

Group 1: Cultural and Artistic Significance
- Bookplates originated in 15th century Europe as a means for nobles to mark their collections, combining artistic and practical value, and are referred to as "pearls of print" and "sails of the sea of books" [3].
- The introduction of bookplates to China in the early 20th century led to a fusion with traditional watermarked woodcut techniques, resulting in a distinctive form of art that reflects both cultures [3].
- Recent exhibitions and activities, such as the "Joy of Collecting" event, aim to promote and experience the unique charm of Chinese watermarked woodcut bookplates [3].

Group 2: Creative Process and Themes
- For book lovers, a beautifully crafted bookplate serves as both decoration and a connection to the book and its owner, encompassing themes like mythology, zodiac signs, poetry, and personal reflections [4].
- The creation of a personalized watermarked woodcut bookplate is described as a poetic and meticulous artistic journey, involving various stages from conception to printing, reflecting the creator's respect for culture and love for books [4][5].
- The production process of watermarked woodcut bookplates is complex and influenced by factors such as paper moisture and color thickness, providing a deep cultural experience [5].

Group 3: Challenges and Opportunities
- The rise of digital reading and AI-generated art presents both challenges and opportunities for the development of bookplates, emphasizing the need for further research, creation, and exhibition to ensure their cultural legacy [5].
The Lousy Selfies GPT-4o Generates Are, If Anything, More Real Than We Are
Hu Xiu· 2025-04-30 23:05
Core Viewpoint - The article discusses the unexpected popularity of AI-generated images, particularly those created using a simple prompt in GPT-4o, which evoke a sense of realism and authenticity that resonates with users [1][18][108].

Group 1: AI Image Generation
- The prompt used to generate images is straightforward, asking for an ordinary iPhone selfie that appears unremarkable and candid [27][28].
- The images produced have a unique quality that makes them feel real, as they lack the polish and perfection typically associated with social media photos [74][96].

Group 2: Cultural Impact
- The phenomenon of AI-generated images reflects a broader cultural shift towards valuing authenticity over perfection in visual representation [65][108].
- The article highlights how these images resonate with people's experiences of everyday life, capturing moments that are often overlooked or deemed unworthy of documentation [54][96].

Group 3: Social Media Critique
- There is a critique of social media culture, where users often present an idealized version of themselves, leading to a general distrust of online images [75][96].
- The emergence of these AI-generated images challenges the norm by presenting a raw and unfiltered perspective, which many find refreshing [84][108].
The Lousy Selfies GPT-4o Generates Are, If Anything, More Real Than We Are.
数字生命卡兹克· 2025-04-29 19:27
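I did not expect that the wave of attention sparked by a few images GPT-4o generated from one short prompt would last this long; new creative takes are still popping up. I'm sure countless people have scrolled past these images on social platforms. For example, while JD's and Meituan's food-delivery services are fighting it out, Qiangzi and Xing Ge are in their respective work uniforms, taking a friendly selfie on the Bund in Shanghai, though Xing Ge looks a little unhappy. Jay Chou, JJ Lin, and Eason Chan also turn up at the Canton Tower in Guangzhou and in Shanghai, posing for the same kind of selfie. There is also a selfie from Middle-earth, and Cristiano Ronaldo and Messi pay a visit to Tsinghua. Breaking Bad comes to Tianjin. What impressed me most, though, was a Xiaohongshu post I came across today, "At 45, Leaving Peking University": wildly imaginative, and its numbers went through the roof, with nearly 120,000 likes. And it is not just people; cats work too. These images are so realistic that they keep fooling everyone's brain, telling you that this looks real, as real as if a passerby had casually snapped it with a phone. I took a casual photo on my way home last night, and people actually said that one was AI-drawn too... The only reason I don't generate these with 4o inside ChatGPT is that the image-generation experience on Sora is better: the underlying model is essentially the same, but Sora can generate several images at once and has aspect-ratio presets and the like. For example, I wanted a picture of Musk playing video games with a beautiful woman, and out came a super-realistic selfie from Musk. This prompt, ...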
When Painting Technique Meets Algorithm
Ren Min Ri Bao· 2025-04-26 21:58
Group 1
- The emergence of AI technology is ushering in a new era of visual intelligence, akin to the impact of photography in the 19th century [1].
- AI art exhibitions are gaining popularity, showcasing the influence of AI on human culture and artistic expression [1].
- The efficiency and productivity of AI in art creation are initially impressive, but the emotional depth and personal connection found in traditional painting remain irreplaceable [1].

Group 2
- AI painting has unique expansion potential, with advancements like brain-computer interfaces allowing individuals with disabilities to create art with heightened emotional intensity [2].
- AI can serve as a tool for creativity, but it cannot replace the human creator, who must possess the ability to ask questions and have a rich knowledge base [2].
- Both AI and traditional painting require creators to engage deeply with life and contemporary issues to avoid mediocrity in their work [2].

Group 3
- The challenge lies in balancing the analytical capabilities of AI with the intangible aspects of human thought and emotion, drawing inspiration from traditional art forms like Chinese ink painting [3].
- The integration of human intuition and AI's rationality is essential for AI painting to achieve new levels of creativity [3].