Image Generation
Optical AI Image Generator Cuts Energy Use to the Millijoule Level
Ke Ji Ri Bao· 2025-08-29 00:32
Core Insights
- A research team at the University of California, Los Angeles has developed an image generator that uses light beams instead of conventional computing hardware, cutting energy consumption to about one hundred-thousandth of that of standard AI tools, or only a few millijoules per image [1][2]
Group 1: Technology Overview
- Traditional digital diffusion models need hundreds to thousands of iterations to generate an image, while the new system needs only an initial encoding step and no further computation [2]
- The system uses a digital encoder, trained on publicly available image datasets, to create static encodings that can be converted into images [2]
- The encoding is physically imprinted onto a laser beam by a Spatial Light Modulator (SLM); the image appears instantly when the laser passes through a second SLM [2]
Group 2: Performance and Applications
- In tests, the system generated simple images and Van Gogh-style paintings with results comparable to those of traditional image generators [2]
- Generating a Van Gogh-style image took approximately a few millijoules, while traditional diffusion models required hundreds to thousands of joules [2]
- The system's low power draw makes it particularly suitable for wearable devices such as AI glasses [2]
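As a rough order-of-magnitude check on the energy figures quoted above, the short sketch below divides an assumed optical energy by assumed diffusion-model energies. The specific values (3 mJ, 100 J, 1000 J) are illustrative picks from the quoted ranges, not measurements from the study.

```python
# Order-of-magnitude check. All numeric values are ASSUMPTIONS picked from the
# ranges quoted in the summary ("a few millijoules", "hundreds to thousands of
# joules"); they are not measured data.
OPTICAL_ENERGY_J = 3e-3               # assumed: 3 mJ for "a few millijoules"
DIFFUSION_ENERGY_J = (100.0, 1000.0)  # assumed lower and upper bounds in joules


def energy_ratio(optical_j: float, digital_j: float) -> float:
    """Return the optical energy as a fraction of the digital energy."""
    return optical_j / digital_j


ratios = [energy_ratio(OPTICAL_ENERGY_J, e) for e in DIFFUSION_ENERGY_J]

for digital_j, ratio in zip(DIFFUSION_ENERGY_J, ratios):
    print(f"{digital_j:6.0f} J digital -> optical/digital ratio {ratio:.1e}")
```

Under these assumed values the ratio falls between a few millionths and a few hundred-thousandths, which is the same order of magnitude as the "one hundred-thousandth" figure in the summary.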
Tencent Files Image Generation Patent Enabling Step-by-Step Guidance and Robust Control of Image Generation
Jin Rong Jie· 2025-08-16 09:19
Core Insights
- Tencent Technology (Shenzhen) Co., Ltd. has applied for a patent titled "Image Generation Method, Device, Equipment, Medium, and Product", publication number CN120495475A, filed in May 2025 [1]
- The patent describes a method for generating images from object input text, which includes denoising random noise images and enhancing text prompts to create target images [1]
Company Overview
- Tencent Technology (Shenzhen) Co., Ltd. was established in 2000, is located in Shenzhen, and is primarily engaged in software and information technology services [1]
- The company has a registered capital of 2 million USD, has invested in 15 enterprises, has participated in 264 bidding projects, and holds 5000 trademark and patent records along with 534 administrative licenses [1]
Lumina-mGPT 2.0: Autoregressive Models Make a Stunning Comeback, Rivaling Top Diffusion Models
机器之心· 2025-08-12 00:15
Core Viewpoint
- Lumina-mGPT 2.0 is a stand-alone autoregressive image model that unifies tasks such as text-to-image generation, subject-driven generation, and controllable generation [5][9][21].
Group 1: Core Technology and Breakthroughs
- Lumina-mGPT 2.0 is trained entirely from scratch as a decoder-only Transformer. It comes in two sizes (2 billion and 7 billion parameters) and avoids the biases inherited from pre-trained models [4][5].
- The model uses a high-quality image tokenizer, SBER-MoVQGAN, selected for its reconstruction quality on the MS-COCO dataset [7].
- A unified multi-task processing framework supports tasks including text-to-image generation and image editing [9].
Group 2: Efficient Inference Strategies
- Two optimizations speed up generation while maintaining quality: quantizing the model to 4-bit integers, and a sampling method that reduces GPU memory consumption by 60% [11][13].
- The optimizations allow parallel decoding, which significantly accelerates generation [13].
Group 3: Experimental Results
- In text-to-image benchmarks, Lumina-mGPT 2.0 achieved a GenEval score of 0.80, placing it among the top generative models; it did especially well on the "two objects" and "color attributes" tests [14][15].
- The model also performed strongly on the Graph200K multi-task benchmark, confirming that a pure autoregressive model is feasible for multi-modal generation tasks [17].
Group 4: Future Directions
- Even with these optimizations, sampling time remains long enough to affect user experience and needs further improvement [21].
- The focus will expand from multi-modal generation to include multi-modal understanding, aiming to improve overall functionality and performance [21].
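The 4-bit integer quantization mentioned under "Efficient Inference Strategies" can be illustrated with a minimal sketch. The summary does not say which quantization scheme Lumina-mGPT 2.0 uses, so the symmetric, single-scale scheme below is an assumption chosen for simplicity, and the function names are hypothetical.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # value range of a signed 4-bit integer


def quantize_int4(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit codes using one shared scale (assumed scheme)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    # Round to the nearest code and clamp into the 4-bit range.
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from 4-bit codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0]  # toy weights, purely illustrative
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, restored, max_error)
```

Each weight is stored in 4 bits instead of 16 or 32, at the cost of a rounding error of at most half the scale per weight. Production systems typically use per-group scales and packed storage, which this sketch leaves out.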
Qwen's New Open-Source Release Blows Past the SOTA for Text in AI Image Generation
量子位· 2025-08-05 01:40
Core Viewpoint
- The article discusses the release of Qwen-Image, a 20-billion-parameter image generation model that excels at complex text rendering and image editing [3][28].
Group 1: Model Features
- Qwen-Image is the first foundational image generation model in the Tongyi Qianwen series and uses the MMDiT architecture [4][3].
- It performs exceptionally well at complex text rendering, supporting multi-line layouts and fine-grained detail in both English and Chinese [28][32].
- The model also offers consistent image editing, including style transfer, modifications, detail enhancement, text editing, and pose adjustments [27][28].
Group 2: Performance Evaluation
- Qwen-Image has achieved state-of-the-art (SOTA) performance across public benchmarks, including GenEval, DPG, and OneIG-Bench for image generation and GEdit, ImgEdit, and GSO for image editing [29][30].
- In particular, it significantly outperforms existing advanced models at Chinese text rendering [33].
Group 3: Training Strategy
- The model is trained progressively, moving from non-text rendering to text rendering and from simple to complex text inputs, which strengthens its native text rendering [34].
Group 4: Practical Applications
- The article includes practical demonstrations, such as generating illustrations, PPTs, and promotional images, that show the model accurately integrating text with visuals [11][21][24].
Open Source! Tongyi Qianwen Releases Qwen-Image, the First Image Generation Foundation Model in Its Series
Hua Er Jie Jian Wen· 2025-08-04 21:09
Core Insights
- The article discusses the launch of Qwen-Image, a 20-billion-parameter MMDiT model and the first foundational image generation model in the Tongyi Qianwen series, which makes significant advances in complex text rendering and precise image editing [1]
Group 1
- Qwen-Image is a foundational model designed specifically for image generation [1]
- The model has made notable progress in rendering complex text and editing images accurately [1]
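Training Time Halved, Performance Up Rather Than Down! Tencent Hunyuan Open-Sources MixGRPO, an Efficient Reinforcement Scheme for Image Generation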
量子位· 2025-08-02 08:33
Core Viewpoint
- The article introduces MixGRPO, a new framework that combines Stochastic Differential Equation (SDE) and Ordinary Differential Equation (ODE) sampling to improve the efficiency and performance of image generation [1][81].
Group 1: MixGRPO Framework
- MixGRPO simplifies optimization of the Markov Decision Process (MDP) through a mixed sampling strategy, improving both efficiency and performance [1][17].
- The framework shows significant improvements in human preference alignment across multiple dimensions, outperforming DanceGRPO while cutting training time by nearly 50% [2][60].
- MixGRPO-Flash, a faster variant of MixGRPO, further reduces training time by 71% while maintaining similar performance [2][60].
Group 2: Performance Metrics
- In comparative studies, MixGRPO achieved a Unified Reward score of 3.418, compared to DanceGRPO's 3.397, indicating better alignment with human preferences [60].
- MixGRPO-Flash had an average iteration time of 112.372 seconds, significantly lower than DanceGRPO's 291.284 seconds [60].
Group 3: Sampling Strategy
- MixGRPO uses a hybrid sampling method: SDE sampling within a defined interval of the denoising process and ODE sampling outside that interval [14][20].
- This reduces computational overhead and optimization difficulty while keeping the sampling process aligned with the marginal distributions of SDE and ODE [30][81].
Group 4: Sliding Window Strategy
- A sliding window strategy is introduced to optimize the denoising steps, letting the model focus on specific time steps during training [32][35].
- The research team identified key hyperparameters for the sliding window, including window size and movement intervals, which significantly affect performance [34][70].
Group 5: High-Order ODE Solvers
- Integrating high-order ODE solvers such as DPM-Solver++ speeds up sampling during GRPO training, balancing computational cost and performance [45][76].
- The experiments indicated that a second-order midpoint method was the optimal high-order solver setting [76].
Group 6: Experimental Validation
- The experiments used the HPDv2 dataset, which includes diverse prompts, and showed that MixGRPO can achieve effective human preference alignment with a limited number of training prompts [49][50].
- Results from various reward models confirmed the robustness of MixGRPO, which performed better in both single- and multi-reward settings [56][82].
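The hybrid sampling and sliding window ideas described above can be sketched as a schedule that marks each denoising step as SDE or ODE. This is a minimal illustration and not the authors' implementation: the function names, the `stride` parameter, the clamping of the window at the last valid position, and all numeric values are assumptions.

```python
from typing import List


def sampler_schedule(num_steps: int, window_start: int, window_size: int) -> List[str]:
    """Label each denoising step 'SDE' inside the window and 'ODE' outside it."""
    window_end = min(window_start + window_size, num_steps)
    return ["SDE" if window_start <= t < window_end else "ODE"
            for t in range(num_steps)]


def window_start_at(iteration: int, shift_interval: int, stride: int,
                    num_steps: int, window_size: int) -> int:
    """Shift the window by `stride` steps every `shift_interval` training iterations.

    Clamping at the last valid start position is an assumption of this sketch.
    """
    last_start = max(num_steps - window_size, 0)
    return min((iteration // shift_interval) * stride, last_start)


# Hypothetical hyperparameters, chosen only to keep the printout short.
NUM_STEPS, WINDOW_SIZE, SHIFT_INTERVAL, STRIDE = 8, 3, 10, 1

for iteration in (0, 10, 25, 100):
    start = window_start_at(iteration, SHIFT_INTERVAL, STRIDE, NUM_STEPS, WINDOW_SIZE)
    print(iteration, start, sampler_schedule(NUM_STEPS, start, WINDOW_SIZE))
```

Only the steps labeled SDE would contribute stochastic exploration and policy optimization in a given iteration, which is where the reduction in overhead described in the summary would come from. The ODE steps could then be handled by a faster deterministic solver.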
Manus Suddenly Launches Text-to-Image! Goodbye to "Gacha"-Style Generation, Hello to Agent + Deep Thinking Co-Creation
量子位· 2025-05-16 05:36
Core Viewpoint
- Manus has announced a new image generation feature that differs from typical AI drawing tools by understanding the user's intent and planning the generation process before executing it [1][18].
Group 1: Image Generation Capabilities
- Manus can analyze a room's style from elements such as flooring and walls, and it produces an analysis report before generating visual designs [5][4].
- The tool can search for furniture on websites like IKEA, select suitable items, and provide links along with visual results [7][3].
- Manus has designed a beverage bottle for a tea drink called "TeaVive", targeting the youth market by analyzing popular visual elements [11].
Group 2: User Experience and Feedback
- Users have praised the integration of intelligent workflows with image generation as a great idea [6].
- Some users have raised concerns about pricing, with one noting that a monthly subscription of $39 allows only limited usage [26][28].
- Registration has been simplified and now offers 1000 points upon sign-up plus daily bonuses [22].
Group 3: Competitive Landscape
- The emergence of a competing design-focused platform, Lovart, has prompted Manus to enhance its offerings [18][20].
- Lovart has gained popularity quickly, much as Manus did at its initial launch, indicating a competitive environment in the design AI space [19].
Manus Launches Image Generation Feature
news flash· 2025-05-16 05:21
Core Insights
- Manus has launched an image generation feature that not only creates images but also understands user intent and plans solutions [1]
Company Overview
- Manus is positioned to effectively use image generation and other tools to accomplish user tasks [1]
Just In: Manus's Image Generation Feature Makes a Strong Debut! One-Stop Service From Design to Website Building, Plus 1,000 Free Points
机器之心· 2025-05-16 04:39
Core Viewpoint
- Manus has moved from a highly sought-after, invitation-only platform to fully open registration, a significant change in accessibility for users [1].
Group 1: New Features and Offerings
- Manus is offering 1,000 points for first-time registrations to encourage users to explore its features [2].
- The platform has introduced an image generation function that not only creates images but also understands user intent and plans solutions effectively [2].
Group 2: User Experience and Functionality
- Users can send modification requests or stop tasks at any time, and they are notified when a task completes [11].
- The platform deployed a website for a bottled tea brand called CoLe in approximately half an hour [18].
Group 3: Image Generation Performance
- The images generated for CoLe's branding were well received, with a design that suits the target demographic of teenagers and conveys a fresh, vibrant aesthetic [9][31].
- The integration of intelligent workflows, combining intent understanding with image generation, was highlighted as a strength of Manus [32].
Group 4: Areas for Improvement
- Image generation is relatively fast, but other tasks, such as website creation and deployment, reportedly take from several minutes to more than ten minutes, so performance still needs improvement [33].
Midea Group (000333): Q1 2025 Report Review: Continued Global Expansion and Digital-Intelligence-Driven Growth
Dongguan Securities· 2025-04-30 09:04
Investment Rating
- The report maintains an "Accumulate" rating for the company, indicating an expectation that the stock will outperform the market index by 5%-15% over the next six months [7].
Core Views
- The company achieved total revenue of 128.43 billion yuan in Q1 2025, up 20.61% year-on-year. Net profit attributable to shareholders was 12.42 billion yuan, up 38.02% year-on-year, and net profit after deducting non-recurring gains and losses was 12.75 billion yuan, up 38.03% year-on-year, in line with expectations [1].
- The company is actively expanding its global production capacity and investing in its own brands to mitigate trade risks. It operates in over 200 countries, with a low share of revenue from the U.S., and has 22 R&D centers and 23 major manufacturing bases across multiple continents [5][6].
- The company is advancing its digital intelligence strategy, focusing on applying large models and Agent technology. It has developed a language model for the home appliance sector that improves user interaction and control across its smart products [5].
Summary by Sections
Financial Performance
- In Q1 2025, the company's gross margin decreased by 1.87 percentage points to 25.45%, while the net profit margin increased by 1.45 percentage points to 9.97%, primarily because expense ratios fell [5].
- The company forecasts 2025 revenue of 443.97 billion yuan and net profit of 43.02 billion yuan, which translates to earnings per share (EPS) of 5.61 yuan and a price-to-earnings (PE) ratio of 13 times [6].
Strategic Initiatives
- The company is committed to its four strategic pillars: technological leadership, direct user engagement, digital intelligence, and global expansion. It aims to strengthen its R&D capabilities and maintain a leading position in the industry [5].
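A few figures implied by the numbers quoted above can be derived with simple arithmetic. The sketch below uses only values stated in the summary; the variable and function names are hypothetical, and the implied share price is approximate because the PE multiple is rounded in the source.

```python
# Figures quoted in the summary (billion yuan unless noted otherwise).
Q1_REVENUE = 128.43
Q1_REVENUE_GROWTH = 0.2061          # 20.61% year-on-year
FORECAST_REVENUE_2025 = 443.97
FORECAST_NET_PROFIT_2025 = 43.02
FORECAST_EPS_YUAN = 5.61            # yuan per share
FORECAST_PE = 13                    # rounded multiple as quoted


def prior_period_value(current: float, growth: float) -> float:
    """Back out the prior-period value from a current value and its growth rate."""
    return current / (1.0 + growth)


# Revenue in Q1 of the previous year implied by the reported growth rate.
prior_q1_revenue = prior_period_value(Q1_REVENUE, Q1_REVENUE_GROWTH)

# Net margin implied by the full-year 2025 forecast.
forecast_net_margin = FORECAST_NET_PROFIT_2025 / FORECAST_REVENUE_2025

# Share price implied by EPS times the rounded PE multiple.
implied_price = FORECAST_EPS_YUAN * FORECAST_PE

print(f"Implied prior-year Q1 revenue: {prior_q1_revenue:.2f} billion yuan")
print(f"Implied 2025 forecast net margin: {forecast_net_margin:.2%}")
print(f"Implied share price: {implied_price:.2f} yuan")
```

These work out to roughly 106.5 billion yuan of revenue in the prior-year quarter, a forecast net margin of about 9.7%, and an implied share price of about 72.9 yuan.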