MeanFlow
Kaiming He and a sophomore undergraduate upend diffusion-based image generation: drop multi-step sampling and the latent space, produce pixels directly in one step
量子位· 2026-02-02 05:58
Core Viewpoint
- The article introduces a new method called Pixel Mean Flow (pMF), which simplifies the diffusion-model pipeline by eliminating traditional components such as multi-step sampling and the latent space, generating images directly in pixel space [2][3][5].

Group 1: Methodology and Innovations
- pMF achieves strong results, with an FID of 2.22 at 256×256 resolution and 2.48 at 512×512, making it one of the best single-step, latent-free diffusion models to date [4][27].
- Eliminating multi-step sampling and the latent space reduces the complexity of the generation process and yields a more streamlined architecture [6][36].
- The core design of pMF has the network directly output a denoised image at the pixel level, while the training loss is computed through a velocity field; a minimal sketch of this setup follows this summary [13][25].

Group 2: Experimental Results
- In experiments, pMF outperformed the prior method EPG, whose FID was 8.82, a substantial improvement in image generation quality [27].
- Adding a perceptual loss during training reduced FID from 9.56 to 3.53, demonstrating the effectiveness of this term [26].
- pMF is also computationally efficient: pMF-H/16 requires only 271 GFLOPs per forward pass, compared with 1574 GFLOPs for GAN methods such as StyleGAN-XL [27].

Group 3: Challenges and Future Directions
- Combining single-step generation with pixel-space modeling compounds the difficulty of architecture design, calling for more careful solutions [10][12].
- The article argues that as model capability improves, the historical compromises of multi-step sampling and latent-space encoding become less necessary, encouraging further exploration of direct, end-to-end generative modeling [36].
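To make the core design concrete, here is a minimal, hypothetical sketch of such a training step. It assumes a rectified-flow interpolation z_t = (1 − t)·x + t·ε, so the ground-truth instantaneous velocity is ε − x; the names `net` and `lpips` are placeholders, and this illustrates the general "predict pixels, supervise in velocity space" idea rather than the paper's actual code.

```python
import torch

def pmf_training_step(net, x, lpips=None):
    """Hypothetical pMF-style training step (sketch, not the paper's code).

    Assumes the rectified-flow interpolation z_t = (1 - t) * x + t * e,
    whose ground-truth instantaneous velocity is v = e - x. The network
    predicts the clean image directly in pixel space; the loss is then
    computed on the velocity implied by that prediction.
    """
    b = x.shape[0]
    e = torch.randn_like(x)                      # noise sample
    t = torch.rand(b, 1, 1, 1, device=x.device)  # time in (0, 1)
    z_t = (1 - t) * x + t * e                    # noisy pixel-space input

    x_hat = net(z_t, t)                          # direct pixel prediction
    v_hat = (z_t - x_hat) / t.clamp(min=1e-3)    # implied velocity estimate
    v = e - x                                    # ground-truth velocity

    loss = (v_hat - v).pow(2).mean()             # velocity-space regression
    if lpips is not None:                        # optional perceptual term,
        loss = loss + lpips(x_hat, x).mean()     # cf. the FID 9.56 -> 3.53 result
    return loss
```

Note that (z_t − x)/t equals ε − x exactly under this interpolation, so supervising the implied velocity is equivalent to supervising the pixel prediction with a time-dependent weight; the clamp only guards the division near t = 0.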
Kaiming He's latest CVPR talk slides are online: toward end-to-end generative modeling
机器之心· 2025-06-19 09:30
Core Viewpoint
- The article traces the evolution of generative models, focusing on the transition from diffusion models to end-to-end generative modeling and arguing that generative models may replicate the historical trajectory of recognition models [6][36][41].

Group 1: Workshop Insights
- The CVPR workshop led by Kaiming He focused on the evolution of visual generative modeling beyond diffusion models [5][7].
- Diffusion models have become the dominant approach in visual generative modeling, but they suffer from slow generation and difficulty in modeling complex distributions [6][36].
- Kaiming He's talk emphasized the need for end-to-end generative modeling, contrasting it with the layer-wise training methods that prevailed before AlexNet [10][11][41].

Group 2: Recognition vs. Generation
- Recognition and generation can be seen as two sides of the same coin: recognition abstracts features from raw data, while generation concretizes abstract representations into detailed data [41][42].
- The article highlights a fundamental difference: recognition tasks have a clear mapping from data to labels, whereas generation tasks require complex, highly non-linear mappings from simple distributions to intricate data distributions [58].

Group 3: Flow Matching and MeanFlow
- Flow Matching is presented as a promising way to address these challenges by constructing ground-truth fields that are independent of any specific neural network architecture [81].
- The MeanFlow framework introduced by Kaiming He targets single-step generation by modeling average velocity rather than instantaneous velocity, which yields a principled training target; the defining identity is written out after this summary [83][84].
- Experimental results show that MeanFlow significantly outperforms previous single-step diffusion and flow models, reaching an FID of 3.43, more than a 50% relative improvement over the previous best [101][108].

Group 4: Future Directions
- The article closes with ongoing research directions, including Consistency Models, two-time-variable models, and a revisiting of Normalizing Flows, suggesting the field is still at a stage akin to the pre-AlexNet era of recognition models [110][113].
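The average-velocity idea at the heart of MeanFlow can be stated compactly. In the paper's notation, with z_t the state at time t and v the instantaneous velocity of the flow, the average velocity over an interval [r, t] and the identity obtained by differentiating its definition are:

```latex
% Average velocity over the interval [r, t]
u(z_t, r, t) \;\triangleq\; \frac{1}{t - r} \int_{r}^{t} v(z_\tau, \tau)\,\mathrm{d}\tau

% Multiplying by (t - r) and differentiating with respect to t yields the
% MeanFlow identity that serves as the training target:
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\,\frac{\mathrm{d}}{\mathrm{d}t}\, u(z_t, r, t)
```

Because the right-hand side involves only the instantaneous velocity and a derivative of the network's own output, it provides a self-contained regression target without simulating the full ODE trajectory.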
New work from Kaiming He's team: MeanFlow sets a single-step image generation SOTA, with improvements of up to 50%
机器之心· 2025-05-21 04:00
Core Viewpoint
- The article presents a new generative modeling framework called MeanFlow, which substantially improves on existing flow matching methods by introducing the notion of average velocity, achieving an FID of 3.43 on ImageNet 256×256 with no pre-training, distillation, or curriculum learning [3][5][7].

Methodology
- MeanFlow introduces a new ground-truth field representing average velocity in place of the instantaneous velocity commonly used in flow matching [3][8].
- Average velocity is defined as displacement divided by the length of the time interval, and the relationship between average and instantaneous velocity is derived to supply the network's training target; a training-loss sketch follows this summary [9][10].

Performance Results
- MeanFlow performs strongly in one-step generative modeling, achieving an FID of 3.43 with a single function evaluation (1-NFE), roughly a 50% relative improvement over the best previous methods [5][16].
- With 2-NFE generation, MeanFlow reaches an FID of 2.20, comparable to leading multi-step diffusion/flow models [18].

Comparative Analysis
- A comparison against earlier single-step diffusion/flow models shows MeanFlow well ahead, with an FID of 3.43 versus 7.77 for IMM [16][17].
- The results indicate that the method effectively narrows the gap between single-step and multi-step diffusion/flow models [18].
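Below is a minimal sketch of how such a training loss can be implemented. It assumes the rectified-flow path z_t = (1 − t)·x + t·ε and uses torch.func.jvp to obtain the total derivative d/dt u(z_t, r, t) = v·∂u/∂z + ∂u/∂t in one forward-mode call; `u_net` is a hypothetical network taking (z, r, t), and details such as loss weighting, time-pair sampling, and guidance are omitted.

```python
import torch
from torch.func import jvp

def meanflow_loss(u_net, x):
    """Sketch of a MeanFlow-style loss (illustrative; u_net is hypothetical).

    u_net(z, r, t) models the average velocity over [r, t]. The target comes
    from the MeanFlow identity  u = v - (t - r) * du/dt,  where the total
    derivative du/dt = v * du/dz + du/dt is computed with a single JVP.
    """
    b = x.shape[0]
    e = torch.randn_like(x)
    t = torch.rand(b, 1, 1, 1, device=x.device)
    r = torch.rand(b, 1, 1, 1, device=x.device) * t   # ensure r <= t
    z = (1 - t) * x + t * e                            # rectified-flow path
    v = e - x                                          # instantaneous velocity

    # JVP along the direction (dz, dr, dt) = (v, 0, 1) gives d/dt u(z_t, r, t).
    u, dudt = jvp(u_net, (z, r, t),
                  (v, torch.zeros_like(r), torch.ones_like(t)))
    target = (v - (t - r) * dudt).detach()             # stop-gradient target
    return (u - target).pow(2).mean()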