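Kaiming He and a second-year undergraduate upend diffusion image generation: drop multi-step sampling and the latent space, and output pixels directly in one step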
量子位 (QbitAI) · 2026-02-02 05:58
Core Viewpoint
- The article introduces Pixel Mean Flow (pMF), a method that simplifies diffusion-model architecture by removing two traditional components, multi-step sampling and the latent space, so that images are generated directly in pixel space [2][3][5].

Group 1: Methodology and Innovations
- pMF reaches an FID of 2.22 at 256×256 resolution and 2.48 at 512×512 (lower is better), placing it among the best single-step diffusion models that operate without a latent space [4][27].
- Removing multi-step sampling and the latent space reduces the complexity of the generation process and allows a more efficient architecture [6][36].
- In the core design of pMF, the network directly outputs a denoised image at the pixel level, while the training loss is computed on a velocity field [13][25].

Group 2: Experimental Results
- In experiments, pMF outperformed the previous method EPG, which had an FID of 8.82, a substantial improvement in image generation quality [27].
- Adding a perceptual loss during training reduced FID from 9.56 to 3.53, a drop of about 63% [26].
- pMF is also computationally cheaper than GAN methods such as StyleGAN-XL: StyleGAN-XL needs 1574 Gflops per forward pass, while pMF-H/16 needs only 271 Gflops, roughly 5.8 times less [27].

Group 3: Challenges and Future Directions
- Combining single-step generation with pixel-space modeling makes architecture design harder, because one network must handle the difficulties of both at once [10][12].
- The article argues that as model capabilities improve, the historical compromises of multi-step sampling and latent-space encoding are becoming less necessary, which encourages further exploration of direct, end-to-end generative modeling [36].
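
The core design noted under Group 1 (the network outputs a denoised image, but the loss is computed on a velocity field) can be illustrated with a small sketch. The article does not give the formulas, so everything here is an assumption made for illustration: a linear noising path z_t = (1 - t)·x + t·ε, a target velocity v = ε - x, and the conversion v_pred = (z_t - x_pred) / t. It also uses a simplified instantaneous-velocity loss rather than the full mean-flow objective, and all function names are hypothetical.

```python
import random

def noisy_sample(x, eps, t):
    """Assumed linear path between clean pixels x (t=0) and noise eps (t=1)."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]

def implied_velocity(z_t, x_pred, t):
    """Velocity implied by a predicted clean image: (z_t - x_pred) / t."""
    return [(zi - xi) / t for zi, xi in zip(z_t, x_pred)]

def velocity_loss(x_pred, x, eps, t):
    """Mean squared error between the implied velocity and the target eps - x."""
    z_t = noisy_sample(x, eps, t)
    v_pred = implied_velocity(z_t, x_pred, t)
    v_target = [ei - xi for xi, ei in zip(x, eps)]
    return sum((a - b) ** 2 for a, b in zip(v_pred, v_target)) / len(x)

rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # toy "image" of 16 pixels
eps = [rng.gauss(0.0, 1.0) for _ in range(16)]   # Gaussian noise sample
t = 0.7                                          # noise level in (0, 1]

# A stand-in for the network: here we simply plug in candidate outputs.
perfect = velocity_loss(x, x, eps, t)            # output equals the clean image
gray = velocity_loss([0.0] * 16, x, eps, t)      # poor output: an all-zero image
```

Under these assumptions, a perfect pixel prediction gives a loss of zero up to floating-point error, and an error of δ in the predicted image becomes an error of δ / t in velocity space, so mistakes at low noise levels are weighted more heavily.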