The Young Generation Impresses: Kaiming He's Team Releases New Work, with a Tsinghua Yao Class Sophomore as Co-First Author
36Kr · 2025-12-04 02:21

Core Insights
- The article introduces Improved MeanFlow (iMF), an enhanced version of the original MeanFlow (MF) that addresses key issues of training stability, guidance flexibility, and architectural efficiency [1][4].

Model Performance
- iMF improves on MF by reformulating the training objective as a more stable instantaneous-velocity loss and by introducing flexible classifier-free guidance (CFG) [2][12].
- On the ImageNet 256x256 benchmark, the iMF-XL/2 model reaches an FID of 1.72 at 1-NFE (a single network function evaluation), roughly a 50% improvement over the original MF [2][18].

Model Configuration and Efficiency
- The reported configurations of the MF and iMF models show that iMF variants use fewer parameters while delivering better metrics [3][19].
- For instance, iMF-B/2 has 89 million parameters and an FID of 3.39, whereas MF-B/2 has 131 million parameters and an FID of 6.17 [3][19].

Training Methodology
- iMF's core improvement is a reconstructed prediction function that turns training into a standard regression problem, improving optimization stability [4][11].
- The training loss is now based on instantaneous velocity, yielding a more stable, standard regression training process [10][11].

Guidance Flexibility
- iMF introduces a flexible classifier-free guidance mechanism in which the guidance scale is learned as a condition, improving the model's adaptability at inference time [12][14].
- This flexibility lets the model learn average velocity fields under varying guidance strengths, unlocking CFG's full potential [12].

Contextual Conditioning
- The iMF architecture uses an efficient in-context conditioning mechanism, replacing the large adaLN-zero module with multiple learnable tokens for the various conditions, improving efficiency and reducing parameter count [15][17].
- This adjustment lets iMF handle multiple heterogeneous conditions more effectively, significantly reducing model size and increasing design flexibility [17].

Experimental Results
- iMF performs strongly on challenging benchmarks: the iMF-XL/2 model reaches an FID of 1.72 at 1-NFE, outperforming many pre-trained multi-step models [18][20].
- At 2-NFE, iMF further narrows the gap between single-step and multi-step diffusion models, reaching an FID of 1.54 [20].
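The "Training Methodology" and "Guidance Flexibility" points above can be sketched together. The following is a minimal illustration, not the team's actual method: it assumes a linear interpolation path between data and noise (as in standard flow matching), so the instantaneous-velocity regression target is simply `eps - x`, and it feeds a sampled guidance scale `omega` to the network as an extra condition. The function name `imf_style_loss` and the zero-predicting stand-in model are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def imf_style_loss(model, x, eps, t, omega):
    # Linear interpolation path (assumption): z_t = (1 - t) * x + t * eps
    z_t = (1.0 - t[:, None]) * x + t[:, None] * eps
    # Instantaneous velocity of this path: v = d z_t / d t = eps - x
    v_target = eps - x
    # The network also conditions on the guidance scale omega, so a
    # single model can cover a range of CFG strengths at inference
    v_pred = model(z_t, t, omega)
    # Plain MSE regression: a standard, stable training objective
    return float(np.mean((v_pred - v_target) ** 2))

# Hypothetical stand-in "model" that always predicts zero velocity
zero_model = lambda z, t, w: np.zeros_like(z)

x = rng.standard_normal((4, 8))        # clean data batch
eps = rng.standard_normal((4, 8))      # Gaussian noise batch
t = rng.uniform(size=4)                # random timesteps in [0, 1]
omega = rng.uniform(1.0, 4.0, size=4)  # sampled guidance scales

loss = imf_style_loss(zero_model, x, eps, t, omega)
```

Because the target is a fixed quantity computed from the data and noise, this is an ordinary supervised regression, which is the stability argument the article makes for iMF over the original MF objective.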
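The "Contextual Conditioning" point above can also be illustrated with a toy sketch. In-context conditioning generally means appending condition embeddings to the token sequence so self-attention mixes them with the content tokens, instead of injecting conditions through per-layer adaLN-zero modulation. All shapes and token names below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16          # transformer hidden width (toy value)
n_patches = 64  # number of image patch tokens (toy value)

# Hypothetical learnable condition tokens, one per condition type
# (class label, timestep, guidance scale); in a real model these
# would be trained parameters standing in for adaLN-zero modulation
class_tok = rng.standard_normal((1, d))
time_tok = rng.standard_normal((1, d))
cfg_tok = rng.standard_normal((1, d))

patches = rng.standard_normal((n_patches, d))  # image patch tokens

# In-context conditioning: prepend the condition tokens to the patch
# sequence, so self-attention handles conditioning with no extra
# per-layer modulation parameters
seq = np.concatenate([class_tok, time_tok, cfg_tok, patches], axis=0)
```

This design scales naturally to heterogeneous conditions (add one token per condition type) and removes the large per-layer modulation weights, which is consistent with the parameter reductions the article reports for iMF.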