Possibly the Best-Performing Open-Source Image Generation Model Yet: HunyuanImage 3.0 Is Here
TENCENT (HK:00700) QbitAI (量子位) · 2025-09-30 12:22

Core Viewpoint
- Tencent has released and open-sourced HunyuanImage 3.0, the largest open-source native multimodal image generation model with 80 billion parameters, which integrates understanding and generation capabilities, rivaling leading closed-source models in the industry [1][20].

Model Features
- HunyuanImage 3.0 supports multi-resolution image generation and exhibits strong instruction adherence, world knowledge reasoning, and text rendering capabilities, producing aesthetically pleasing and artistic outputs [1][11].
- The model inherits world knowledge reasoning from Hunyuan-A13B, allowing it to solve complex tasks such as generating detailed steps for solving equations [4][5].
- It can handle intricate prompts, such as visualizing sorting algorithms with specific styles and providing pseudocode, showcasing its advanced text rendering abilities [7][11].

Technical Architecture
- The model is based on Hunyuan-A13B, utilizing a native multimodal and unified autoregressive framework that deeply integrates text understanding, visual understanding, and high-fidelity image generation [17][19].
- Unlike traditional approaches, HunyuanImage 3.0 employs a dual-encoder structure and incorporates generalized causal attention to enhance both language reasoning and global image modeling [22][25].
- The training process includes a three-stage filtering of over 10 billion images to select nearly 5 billion high-quality, diverse images, ensuring the removal of low-quality data [32].

Training Strategy
- The training begins with a progressive four-stage pre-training process, gradually increasing image resolution and complexity, culminating in a fine-tuning phase focused on specific text-to-image generation tasks [36][38].
- The model employs a multi-stage post-training strategy that includes human preference data to refine the generated outputs [38].

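The generalized causal attention mentioned under Technical Architecture can be illustrated with a small mask-building sketch. The article does not describe the exact rule, so the code below rests on an assumption: text tokens attend only to earlier tokens, while tokens of the same image may also attend to each other in both directions. The function name `generalized_causal_mask` and the `segment_ids` encoding are hypothetical, chosen only for illustration.

```python
def generalized_causal_mask(segment_ids):
    """Return mask where mask[i][j] is True if token i may attend to token j.

    segment_ids holds one entry per token: None marks a text token, any other
    value is a hypothetical image id shared by all tokens of that image.
    Assumed rule (not confirmed by the article): causal attention everywhere,
    plus full bidirectional attention among tokens of the same image.
    """
    n = len(segment_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i
            same_image = (
                segment_ids[i] is not None and segment_ids[i] == segment_ids[j]
            )
            mask[i][j] = causal or same_image
    return mask


# Two text tokens, a three-token image, then one more text token.
tokens = [None, None, "img0", "img0", "img0", None]
mask = generalized_causal_mask(tokens)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Under this assumption, the text rows form the usual lower triangle, which preserves autoregressive language modeling, while the three image rows see the whole image at once, which is what would allow global image modeling.
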
Evaluation Metrics
- HunyuanImage 3.0's performance is assessed using both automated metrics (SSAE) and human evaluations (GSB), demonstrating competitive results against leading models in the industry [40][46].
- The model achieved a 14.10% higher win rate compared to its predecessor, HunyuanImage 2.1, indicating significant improvements in performance [46].
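
The article does not spell out how the GSB win rate is computed. A common convention for Good/Same/Bad human evaluation is to take the share of "good" votes minus the share of "bad" votes, and the sketch below assumes that convention. The function name `gsb_relative_win_rate` and the vote counts are hypothetical and are not taken from the evaluation itself.

```python
def gsb_relative_win_rate(good, same, bad):
    """Relative win rate under an assumed GSB convention: (good - bad) / total."""
    total = good + same + bad
    if total == 0:
        raise ValueError("at least one comparison is required")
    return (good - bad) / total


# Hypothetical vote counts, for illustration only.
rate = gsb_relative_win_rate(good=400, same=350, bad=250)
print(f"{rate:.2%}")  # prints 15.00%
```

A positive value means raters preferred the new model more often than the baseline; ties count toward the total but not toward either side.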