Image Generation
Tencent files image-generation patent enabling step-by-step guidance and robust control of image generation
Jin Rong Jie· 2025-08-16 09:19
Core Insights
- Tencent Technology (Shenzhen) Co., Ltd. has applied for a patent titled "Image Generation Method, Device, Equipment, Medium, and Product" with publication number CN120495475A, filed in May 2025 [1]
- The patent describes a method for generating images from object input text, including denoising random-noise images and enhancing text prompts to produce target images [1]

Company Overview
- Tencent Technology (Shenzhen) Co., Ltd. was established in 2000 and is based in Shenzhen, primarily engaged in software and information technology services [1]
- The company has a registered capital of USD 2 million, has invested in 15 enterprises, has participated in 264 bidding projects, and holds 5,000 trademark and patent records along with 534 administrative licenses [1]
Lumina-mGPT 2.0: Autoregressive models make a spectacular comeback, rivaling top diffusion models
Synced (机器之心) · 2025-08-12 00:15
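Shanghai AI Laboratory and other teams have proposed Lumina-mGPT 2.0, a stand-alone, decoder-only autoregressive model. It unifies a broad range of tasks: text-to-image generation, image-pair generation, subject-driven generation, multi-turn image editing, controllable generation, and dense prediction.

The first author, Yi Xin, is a PhD student at Nanjing University & Shanghai Innovation Institute and is currently interning at Shanghai AI Laboratory. Yi Xin's research focuses on image/video generation and on unifying multimodal generation and understanding. The corresponding author is Peng Gao, a young scientist at Shanghai AI Laboratory. The other authors come from Shanghai AI Laboratory, The Chinese University of Hong Kong, Shanghai Jiao Tong University, Shanghai Innovation Institute, Zhejiang University of Technology, and other institutions.

Core Technology and Breakthroughs
Fully independent training architecture: Unlike conventional approaches that depend on pretrained weights, Lumina-mGPT 2.0 uses a pure decoder-only Transformer architecture trained entirely from scratch, starting from parameter initialization. This brings three major advantages:
- Unconstrained architecture design (2B- and 7B-parameter versions are provided).
- Freedom from licensing restrictions (such as the copyright issues around Chameleon).
- Fewer inherent biases inherited from pretrained models.

Paper title: Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
Paper link: arxiv.org/pdf/2507.17801
GitHub address: ...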
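The decoder-only autoregressive approach described above can be illustrated with a minimal sketch. It assumes, as is common for this model family, that an image is a grid of discrete tokens (for example from a VQ-style tokenizer). Those tokens are generated one at a time in raster order, conditioned on the text prompt. `generate_image_tokens`, `toy_model`, and the 4-token vocabulary are hypothetical stand-ins, not Lumina-mGPT 2.0's code or API.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: maps the token prefix (text tokens followed by the
# image tokens generated so far) to unnormalized scores over the image-token vocabulary.
NextTokenLogits = Callable[[Sequence[int]], List[float]]


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_image_tokens(
    model: NextTokenLogits,
    text_tokens: Sequence[int],
    height: int,
    width: int,
    temperature: float = 1.0,
    seed: int = 0,
) -> List[List[int]]:
    """Sample a height x width grid of image tokens one token at a time (raster order)."""
    rng = random.Random(seed)
    prefix = list(text_tokens)  # the text prompt conditions every step
    flat: List[int] = []
    for _ in range(height * width):
        probs = softmax(model(prefix), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        flat.append(token)
        prefix.append(token)  # decoder-only: each sampled token becomes context
    return [flat[r * width:(r + 1) * width] for r in range(height)]


def toy_model(prefix: Sequence[int]) -> List[float]:
    """Stand-in for a Transformer: strongly prefers (last token + 1) mod 4."""
    preferred = (prefix[-1] + 1) % 4
    return [10.0 if t == preferred else 0.0 for t in range(4)]


grid = generate_image_tokens(toy_model, text_tokens=[0], height=2, width=3, temperature=0.01)
print(grid)  # [[1, 2, 3], [0, 1, 2]]
```

The same next-token loop serves every task once inputs and outputs are expressed as token sequences. Only the prefix changes, which is one plausible reading of how a single decoder-only model can "unify" generation and editing tasks.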
Qwen's new open-source model blows past the SOTA for text rendering in AI image generation
QbitAI (量子位) · 2025-08-05 01:40
Core Viewpoint
- The article discusses the release of Qwen-Image, a 20-billion-parameter image generation model that excels at complex text rendering and image editing [3][28].

Group 1: Model Features
- Qwen-Image is the first image generation foundation model in the Tongyi Qianwen series and is built on the MMDiT architecture [4][3].
- It performs exceptionally well at complex text rendering, supporting multi-line layouts and fine-grained detail in both English and Chinese [28][32].
- It also offers consistent image editing: style transfer, object modification, detail enhancement, text editing, and pose adjustment [27][28].

Group 2: Performance Evaluation
- Qwen-Image achieves state-of-the-art (SOTA) results on multiple public benchmarks: GenEval, DPG, and OneIG-Bench for image generation, and GEdit, ImgEdit, and GSO for image editing [29][30].
- It is notably stronger than existing advanced models at rendering Chinese text [33].

Group 3: Training Strategy
- The model uses a progressive training strategy that moves from non-text to text rendering and from simple to complex text inputs, strengthening its native text rendering [34].

Group 4: Practical Applications
- The article demonstrates Qwen-Image generating illustrations, PPT slides, and promotional images, showing that it integrates text with visuals accurately [11][21][24].
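The progressive training strategy in Group 3 is a curriculum: the data pool shifts from images without text, to simple text, to complex text as training advances. The sketch below is a minimal illustration under stated assumptions. The stage names and the 0.3/0.6 boundaries are hypothetical, because the article gives only the ordering. A real schedule may also blend pools gradually rather than switching hard between them.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    name: str
    end_fraction: float  # stage ends at this fraction of total training steps


# Hypothetical schedule: the article gives only the ordering
# (no text -> simple text -> complex text), not the boundaries.
STAGES = (
    Stage("no_text", 0.3),
    Stage("simple_text", 0.6),   # e.g. short words or single-line captions
    Stage("complex_text", 1.0),  # e.g. multi-line, mixed Chinese/English layouts
)


def stage_for_step(step: int, total_steps: int, stages: Sequence[Stage] = STAGES) -> str:
    """Return the name of the data pool to sample from at a given training step."""
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    progress = step / total_steps
    for stage in stages:
        if progress < stage.end_fraction:
            return stage.name
    return stages[-1].name


print([stage_for_step(s, 10) for s in range(10)])
```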
Open source! Tongyi Qianwen launches Qwen-Image, the first image generation foundation model in its series
Hua Er Jie Jian Wen· 2025-08-04 21:09
Core Insights
- The article discusses the launch of Qwen-Image, a 20-billion-parameter MMDiT model and the first image generation foundation model in the Tongyi Qianwen series. It makes significant advances in complex text rendering and precise image editing [1]

Group 1
- Qwen-Image is a foundation model designed specifically for image generation [1]
- The model makes notable progress in rendering complex text and in accurate image editing [1]
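Training time halved, performance up rather than down! Tencent Hunyuan open-sources MixGRPO, an efficient reinforcement learning scheme for image generation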
QbitAI (量子位) · 2025-08-02 08:33
Core Viewpoint
- The article introduces MixGRPO, a new framework that combines Stochastic Differential Equations (SDE) and Ordinary Differential Equations (ODE) to improve the efficiency and performance of image generation training [1][81].

Group 1: MixGRPO Framework
- MixGRPO simplifies optimization of the Markov Decision Process (MDP) with a mixed sampling strategy, improving both efficiency and performance [1][17].
- The framework significantly improves human preference alignment across multiple dimensions, outperforming DanceGRPO while cutting training time by nearly 50% [2][60].
- MixGRPO-Flash, a faster variant, reduces training time by 71% while maintaining similar performance [2][60].

Group 2: Performance Metrics
- In comparative studies, MixGRPO achieved a Unified Reward score of 3.418, versus 3.397 for DanceGRPO, indicating better alignment with human preferences [60].
- MixGRPO-Flash averaged 112.372 seconds per iteration, well below DanceGRPO's 291.284 seconds [60].

Group 3: Sampling Strategy
- MixGRPO uses hybrid sampling: SDE sampling within a defined interval of the denoising process, and ODE sampling outside it [14][20].
- This reduces computational overhead and optimization difficulty while keeping the sampling process aligned with the marginal distributions of the SDE and ODE [30][81].

Group 4: Sliding Window Strategy
- A sliding window strategy optimizes the denoising steps, letting the model focus on specific time steps during training [32][35].
- The research team identified key sliding-window hyperparameters that significantly affect performance, including window size and shift interval [34][70].

Group 5: High-Order ODE Solvers
- Integrating high-order ODE solvers such as DPM-Solver++ speeds up sampling during GRPO training, balancing computational cost and performance [45][76].
- Experiments indicated that a second-order midpoint method was the best high-order solver setting [76].

Group 6: Experimental Validation
- Experiments used the HPDv2 dataset, which contains diverse prompts. They showed that MixGRPO achieves effective human preference alignment with a limited number of training prompts [49][50].
- Results across several reward models confirmed MixGRPO's robustness, with superior performance in both single-reward and multi-reward settings [56][82].
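The hybrid sampling, sliding window, and midpoint-solver ideas above can be combined in one toy sketch. This sketch makes several assumptions:

- It works on a scalar state with a hypothetical velocity field `v(x, t) = -x` instead of a real flow-matching model.
- The window is assumed to start at the first (noisiest) steps and slide forward every `shift_interval` iterations.
- The SDE step is a plain Euler-Maruyama update. A marginal-preserving SDE, as the article describes, would add a score-based drift correction that is omitted here.

`sde_window` and `mixed_sample` are illustrative names, not MixGRPO's API.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical toy velocity field v(x, t) standing in for the flow-matching model.
Velocity = Callable[[float, float], float]


def sde_window(iteration: int, num_steps: int, window_size: int,
               shift_interval: int, stride: int = 1) -> Tuple[int, int]:
    """[start, end) step indices that use SDE sampling at a given training iteration.

    Assumption: the window starts at the first denoising steps and slides forward by
    `stride` steps every `shift_interval` iterations, stopping at the last step.
    """
    start = min((iteration // shift_interval) * stride, num_steps - window_size)
    return start, start + window_size


def mixed_sample(x: float, velocity: Velocity, num_steps: int,
                 window: Tuple[int, int], noise_scale: float,
                 seed: int = 0) -> Tuple[float, List[int]]:
    """Integrate from t=0 to t=1: SDE steps inside the window, ODE steps outside."""
    rng = random.Random(seed)
    dt = 1.0 / num_steps
    sde_steps: List[int] = []
    for i in range(num_steps):
        t = i * dt
        if window[0] <= i < window[1]:
            # Simplified Euler-Maruyama step (drift + Gaussian noise). Only these
            # stochastic steps would be optimized by the policy-gradient update.
            x = x + velocity(x, t) * dt + noise_scale * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            sde_steps.append(i)
        else:
            # Deterministic second-order midpoint ODE step.
            x_mid = x + 0.5 * dt * velocity(x, t)
            x = x + dt * velocity(x_mid, t + 0.5 * dt)
    return x, sde_steps


window = sde_window(iteration=5, num_steps=10, window_size=4, shift_interval=5)
x_final, sde_steps = mixed_sample(1.0, lambda x, t: -x, num_steps=10,
                                  window=window, noise_scale=0.5)
print(window, sde_steps)  # (1, 5) [1, 2, 3, 4]
```

Restricting noise to the window means only a few steps need stochastic rollouts and gradients, while the remaining steps can use a cheaper, deterministic higher-order solver. That is the trade-off the article attributes to MixGRPO.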
Manus suddenly launches text-to-image generation! No more "gacha"-style rerolls: Agent + deep thinking co-creation
QbitAI (量子位) · 2025-05-16 05:36
Core Viewpoint
- Manus has announced a new image generation feature that differs from typical AI drawing tools: it understands the user's intent and plans the generation process before executing it [1][18].

Group 1: Image Generation Capabilities
- Manus can analyze a room's style from elements such as the flooring and walls, producing an analysis report before generating visual designs [5][4].
- It can search for furniture on websites such as IKEA, select suitable items, and provide links alongside the visual results [7][3].
- Manus designed a bottle for a tea drink called "TeaVive", analyzing popular visual elements to appeal to young consumers [11].

Group 2: User Experience and Feedback
- Users praised the combination of intelligent workflows with image generation as a great idea [6].
- Some users raised concerns about pricing; one noted that a $39 monthly subscription allows only limited usage [26][28].
- Registration has been simplified: new users now receive 1,000 points on sign-up, plus daily bonuses [22].

Group 3: Competitive Landscape
- The emergence of Lovart, a competing design-focused platform, has prompted Manus to enhance its offerings [18][20].
- Lovart gained popularity quickly, much as Manus did at launch, indicating a competitive design-AI market [19].
Manus launches image generation feature
news flash· 2025-05-16 05:21
Core Insights
- Manus has launched an image generation feature that not only creates images but also understands user intent and plans solutions [1]

Company Overview
- Manus is positioned to use image generation and other tools effectively to complete user tasks [1]
Just in: Manus debuts image generation! From design to website building in one place, with 1,000 free points
Synced (机器之心) · 2025-05-16 04:39
Core Viewpoint
- Manus has moved from invitation-only access to fully open registration, a significant change in accessibility for users [1].

Group 1: New Features and Offerings
- Manus offers 1,000 points to first-time registrants, encouraging users to explore its features [2].
- The platform has introduced an image generation function that not only creates images but also understands user intent and plans solutions [2].

Group 2: User Experience and Functionality
- Users can send modification requests or stop tasks at any time and are notified when a task completes [11].
- The platform built and deployed a website for a bottled tea brand called CoLe in about half an hour [18].

Group 3: Image Generation Performance
- The generated CoLe branding images were well received. Their fresh, vibrant design suits the target teenage demographic [9][31].
- The intelligent workflow, which combines intent understanding with image generation, was highlighted as a strength [32].

Group 4: Areas for Improvement
- Image generation is relatively fast, but other tasks, such as building and deploying a website, reportedly take from several minutes to more than ten, so performance needs improvement [33].
Midea Group (000333): Q1 2025 results review: continuing global expansion and advancing digital-intelligence-driven growth
Dongguan Securities· 2025-04-30 09:04
Investment Rating
- The report maintains an "Accumulate" rating, meaning it expects the stock to outperform the market index by 5%-15% over the next six months [7].

Core Views
- Q1 2025 results were in line with expectations [1]:
  - Total revenue was 128.43 billion yuan, up 20.61% year on year.
  - Net profit attributable to shareholders was 12.42 billion yuan, up 38.02% year on year.
  - Net profit after deducting non-recurring gains and losses was 12.75 billion yuan, up 38.03% year on year.
- The company is expanding its global production capacity and investing in its own brands to mitigate trade risk. It operates in more than 200 countries, and the U.S. accounts for a low share of its revenue. It has 22 R&D centers and 23 major manufacturing bases across multiple continents [5][6].
- The company is advancing its digital intelligence strategy, focusing on large models and Agent technology. It has developed a language model for the home appliance sector that enhances user interaction with, and control of, its smart products [5].

Summary by Sections
Financial Performance
- In Q1 2025, gross margin fell 1.87 percentage points to 25.45%. Net profit margin rose 1.45 percentage points to 9.97%, mainly because of lower expense ratios [5].
- For 2025, the report forecasts revenue of 443.97 billion yuan and net profit of 43.02 billion yuan, implying earnings per share (EPS) of 5.61 yuan and a price-to-earnings (PE) ratio of 13x [6].

Strategic Initiatives
- The company is committed to four strategic pillars: technological leadership, direct user engagement, digital intelligence, and global expansion. It aims to strengthen its R&D capabilities and keep its industry-leading position [5].
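The valuation figures above are linked by simple identities: PE = price / EPS, and EPS = net profit / shares. The growth rate also lets the prior-year quarter be backed out. The sketch below works through that arithmetic using only the report's numbers. Because the 13x multiple is rounded and the report's share basis is not stated, the implied price and share count are rough approximations, not figures from the report.

```python
# Inputs taken from the report; the PE multiple is rounded, so results are approximate.
forecast_eps_2025 = 5.61            # yuan per share
forecast_net_profit_2025 = 43.02e9  # yuan
pe_multiple = 13                    # times
q1_revenue_2025 = 128.43e9          # yuan
q1_revenue_growth = 0.2061          # 20.61% year on year

# PE = price / EPS, so the price implied by the quoted multiple is EPS * PE.
implied_price = forecast_eps_2025 * pe_multiple
# EPS = net profit / shares, so the implied share base is net profit / EPS.
implied_shares = forecast_net_profit_2025 / forecast_eps_2025
# Back out the prior-year quarter from the reported growth rate.
q1_revenue_2024 = q1_revenue_2025 / (1 + q1_revenue_growth)

print(f"implied price ~ {implied_price:.2f} yuan")                 # ~ 72.93 yuan
print(f"implied shares ~ {implied_shares / 1e9:.2f} billion")      # ~ 7.67 billion
print(f"Q1 2024 revenue ~ {q1_revenue_2024 / 1e9:.2f} billion yuan")  # ~ 106.48 billion
```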