Flow Matching
ICCV 2025 | Reducing Spatio-Temporal Redundancy in Diffusion Models: SJTU's EEdit Achieves Training-Free Image Editing Acceleration
机器之心· 2025-07-05 02:46
Core Viewpoint
- The article discusses the latest research from Professor Zhang Linfeng's team at Shanghai Jiao Tong University, introducing EEdit, a novel framework that improves image-editing efficiency by addressing spatial and temporal redundancy in diffusion models, achieving a speedup of over 2.4x relative to the unaccelerated baseline [1][6][8].

Summary by Sections

Research Motivation
- The authors identified significant spatial and temporal redundancy in diffusion-model-based image editing, which incurs unnecessary computational overhead, particularly in non-edited regions [12][14].
- The inversion process accounts for most of the temporal redundancy, so pruning redundant time steps can substantially accelerate editing tasks [14].

Method Overview
- EEdit employs a training-free caching acceleration framework that reuses output features to compress the time steps of the inversion process and controls the update frequency of marked regions through region score rewards [15][17].
- The framework is designed to adapt to various input types for editing tasks, including reference images, prompt-based editing, and drag-region guidance [10][15].

Key Features of EEdit
- EEdit achieves over 2.4x faster inference than the unaccelerated version and up to 10x speedup compared to other image editing methods [8][9].
- The framework eliminates the computational waste caused by spatial and temporal redundancy, optimizing the editing process without compromising quality [9][10].
- EEdit supports multiple input guidance types, enhancing its versatility in image editing tasks [10].

Experimental Results
- The performance of EEdit was evaluated on several benchmarks, demonstrating superior efficiency and quality metrics compared to existing methods [26][27].
- EEdit outperformed other methods on PSNR, LPIPS, SSIM, and CLIP metrics, showcasing its competitive edge in both speed and quality [27][28].
- The spatial locality caching algorithm (SLoC) used in EEdit was found to be more effective than other caching methods, achieving better acceleration and foreground preservation [29].
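The feature-reuse idea behind training-free caching can be sketched in a few lines: store a block's output once, then recompute only the tokens whose region score marks them for update, reusing the cache everywhere else. This is a minimal toy sketch, not the actual EEdit/SLoC code; `expensive_block`, `cached_forward`, and the fixed threshold are all hypothetical.

```python
import numpy as np

def expensive_block(x):
    # Stand-in for a heavy transformer block (here: a fixed nonlinearity).
    return np.tanh(x) * 2.0

def cached_forward(x, cache, region_scores, threshold=0.5):
    """Toy feature caching: tokens with a low region score (far from the
    edited area) reuse the cached output; only high-score tokens are
    recomputed. Hypothetical helper, not the EEdit implementation."""
    if "feat" not in cache:              # first step: compute everything
        cache["feat"] = expensive_block(x)
        return cache["feat"]
    active = region_scores >= threshold  # tokens marked for update
    out = cache["feat"].copy()
    out[active] = expensive_block(x[active])  # recompute edited tokens only
    cache["feat"] = out
    return out
```

The saving comes from the second branch: `expensive_block` runs only on the `active` subset, so cost scales with the edited region rather than the full token grid.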
An In-Depth Look at Kaiming He's CVPR 2025 Talk: How Do Generative Models Move Toward End-to-End?
自动驾驶之心· 2025-06-28 13:34
Core Viewpoint
- The article discusses the evolution of generative models in deep learning, drawing parallels to the revolutionary changes brought by AlexNet in recognition models, and posits that generative models may be on the brink of a similar breakthrough with the introduction of MeanFlow, which simplifies the generation process from multiple steps to a single step [1][2][35].

Group 1: Evolution of Recognition Models
- Prior to AlexNet, layer-wise training was the dominant method for training recognition models, which involved optimizing each layer individually, leading to complex and cumbersome training processes [2][3].
- The introduction of AlexNet in 2012 marked a significant shift to end-to-end training, allowing the entire network to be trained simultaneously, greatly simplifying model design and improving performance [3][7].

Group 2: Current State of Generative Models
- Generative models today resemble the pre-AlexNet era of recognition models, relying on multi-step reasoning processes, such as diffusion models and autoregressive models, which raises the question of whether they are in a similar "pre-AlexNet" phase [7][9].
- The article emphasizes the need for generative models to transition from multi-step reasoning to end-to-end generation to achieve a revolutionary breakthrough [7][35].

Group 3: Relationship Between Recognition and Generation
- Recognition and generation can be viewed as two sides of the same coin, with recognition being an abstract process that extracts semantic information from data, while generation is a concrete process that transforms abstract representations into realistic data samples [13][15][16].
- The fundamental difference lies in the nature of the mapping: recognition has a deterministic mapping from data to labels, while generation involves a highly nonlinear mapping from noise to complex data distributions, presenting both opportunities and challenges [18][20].

Group 4: Flow Matching and Mean Flows
- Flow matching is a key exploration direction for addressing the challenges faced by generative models, aiming to construct a flow field of data distributions to facilitate generation [20][22].
- Mean Flows, a recent method introduced by Kaiming He, seeks to achieve one-step generation by replacing complex integral calculations with average velocity computations, significantly enhancing generation efficiency [24][27][29].
- In experiments, Mean Flows demonstrated impressive performance on ImageNet tasks, achieving a FID score of 3.43 with a single function evaluation, outperforming traditional multi-step models [31][32].

Group 5: Future Directions and Challenges
- The article outlines several future research directions, including consistency models, two-time-variable models, and revisiting normalizing flows, while questioning whether generative models are still in the "pre-AlexNet" era [33][34].
- Despite the advancements made by Mean Flows, the challenge remains to identify a truly effective formula for end-to-end generative modeling, which is an exciting and open research question [34][35].
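The flow-matching recipe summarized above has a compact training form that is standard in the literature: interpolate linearly between a noise sample and a data sample, then regress the network's velocity prediction onto the constant difference between them. A minimal sketch with illustrative function names (`fm_training_pair`, `fm_loss` are not from any specific codebase):

```python
import numpy as np

def fm_training_pair(x0, x1, t):
    """Conditional flow matching on the linear path z_t = (1-t)*x0 + t*x1.
    The regression target for the velocity network is the constant x1 - x0."""
    z_t = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    return z_t, target_v

def fm_loss(predict_v, x0, x1, t):
    # Mean-squared error between the network's velocity and the target.
    z_t, v = fm_training_pair(x0, x1, t)
    return float(np.mean((predict_v(z_t, t) - v) ** 2))
```

At sampling time, the learned velocity field is integrated from noise to data with an ODE solver, which is exactly the multi-step cost that MeanFlow's average-velocity formulation collapses into one evaluation.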
ICML 2025 Spotlight | A New Theoretical Framework Unlocks Guided Generation for Flow Matching Models
机器之心· 2025-06-28 02:54
Core Viewpoint
- The article introduces a novel energy guidance theoretical framework for flow matching models, addressing the gap in energy guidance algorithms within this context and proposing various practical algorithms suitable for different tasks [2][3][27].

Summary by Sections

Research Background
- Energy guidance is a crucial technique in the application of generative models, ideally altering the distribution of generated samples to align with a specific energy function while maintaining adherence to the training set distribution [7][9].
- Existing energy guidance algorithms primarily focus on diffusion models, which differ fundamentally from flow matching models, necessitating a general energy guidance theoretical framework for flow matching [9].

Method Overview
- The authors derive a general flow matching energy guidance vector field from the foundational definitions of flow matching models, leading to the formulation of three categories of practical, training-free energy guidance algorithms [11][12].
- The guidance vector field is designed to direct the original vector field towards regions of lower energy function values [12].

Experimental Results
- Experiments were conducted on synthetic data, offline reinforcement learning, and image linear inverse problems, demonstrating the effectiveness of the proposed algorithms [20][22].
- On synthetic datasets, the Monte Carlo sampling-based guidance algorithm achieved results closest to the ground truth distribution, validating the correctness of the flow matching guidance framework [21].
- In offline reinforcement learning tasks, the Monte Carlo sampling guidance exhibited the best performance due to the need for stable guidance samples across different time steps [23].
- For image inverse problems, the Gaussian approximation guidance and GDM showed optimal performance, while the Monte Carlo sampling struggled due to high dimensionality [25].

Conclusion
- The work fills a significant gap in energy guidance algorithms for flow matching models, providing a new theoretical framework and several practical algorithms, along with theoretical analysis and experimental comparisons to guide real-world applications [27].
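The role of a guidance vector field that steers samples toward lower energy can be illustrated with a toy gradient term added to a base velocity. Note this is only a generic energy-descent sketch, not the guidance field the paper derives from flow-matching definitions; the quadratic `energy`, the fixed `scale`, and the Euler sampler are invented for illustration.

```python
import numpy as np

def energy(x):
    # Toy quadratic energy with its minimum at x = 2 (hypothetical).
    return 0.5 * np.sum((x - 2.0) ** 2, axis=-1)

def grad_energy(x):
    return x - 2.0

def guided_velocity(v_base, x, scale=1.0):
    """Tilt the base flow velocity down the energy gradient, so that
    generated samples drift toward low-energy regions."""
    return v_base - scale * grad_energy(x)

def sample(x, steps=100, dt=0.01):
    # Euler integration of the guided ODE; a zero base velocity is used
    # here so the trajectory simply descends the energy landscape.
    for _ in range(steps):
        x = x + dt * guided_velocity(np.zeros_like(x), x)
    return x
```

The paper's contribution is replacing the ad-hoc gradient term above with a guidance field derived rigorously from flow matching, plus training-free estimators (Monte Carlo, Gaussian approximation) for it.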
Kaiming He's Latest CVPR Lecture Slides Released: Toward End-to-End Generative Modeling
机器之心· 2025-06-19 09:30
Core Viewpoint
- The article discusses the evolution of generative models, particularly focusing on the transition from diffusion models to end-to-end generative modeling, highlighting the potential for generative models to replicate the historical advancements seen in recognition models [6][36][41].

Group 1: Workshop Insights
- The workshop led by Kaiming He at CVPR focused on the evolution of visual generative modeling beyond diffusion models [5][7].
- Diffusion models have become the dominant method in visual generative modeling, but they face limitations such as slow generation speed and challenges in simulating complex distributions [6][36].
- Kaiming He's presentation emphasized the need for end-to-end generative modeling, contrasting it with the historical layer-wise training methods prevalent before AlexNet [10][11][41].

Group 2: Recognition vs. Generation
- Recognition and generation can be viewed as two sides of the same coin, where recognition abstracts features from raw data, while generation concretizes abstract representations into detailed data [41][42].
- The article highlights the fundamental differences between recognition tasks, which have a clear mapping from data to labels, and generation tasks, which involve complex, non-linear mappings from simple distributions to intricate data distributions [58].

Group 3: Flow Matching and MeanFlow
- Flow Matching is presented as a promising approach to address the challenges in generative modeling by constructing ground-truth fields that are independent of specific neural network architectures [81].
- The MeanFlow framework introduced by Kaiming He aims to achieve single-step generation tasks by modeling average velocity rather than instantaneous velocity, providing a theoretical basis for network training [83][84].
- Experimental results show that MeanFlow significantly outperforms previous single-step diffusion and flow models, achieving a FID score of 3.43, which is over 50% better than the previous best [101][108].

Group 4: Future Directions
- The article concludes with a discussion on the ongoing research efforts in the field, including Consistency Models, Two-time-variable Models, and revisiting Normalizing Flows, indicating that the field is still in its early stages akin to the pre-AlexNet era in recognition models [110][113].
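The average-velocity idea at the heart of MeanFlow can be stated compactly. Writing $v(z_t, t)$ for the instantaneous velocity, the average velocity over an interval $[r, t]$ and the identity it satisfies (as given in the MeanFlow paper) are:

```latex
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau,
\qquad
u(z_t, r, t) = v(z_t, t) - (t - r)\, \frac{d}{dt}\, u(z_t, r, t).
```

The second equation follows by differentiating $(t - r)\,u$ with respect to $t$; it lets the network regress $u$ directly, so a single evaluation of $u(z_1, 0, 1)$ replaces the multi-step ODE integration of $v$.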
New Work from Kaiming He's Team: MeanFlow Sets One-Step Image Generation SOTA with Up to 50% Improvement
机器之心· 2025-05-21 04:00
Core Viewpoint
- The article discusses a new generative modeling framework called MeanFlow, which significantly improves existing flow matching methods by introducing the concept of average velocity, achieving a FID score of 3.43 on the ImageNet 256×256 dataset without the need for pre-training, distillation, or curriculum learning [3][5][7].

Methodology
- MeanFlow introduces a new ground-truth field representing average velocity instead of the commonly used instantaneous velocity in flow matching [3][8].
- The average velocity is defined as the displacement over a time interval, and the relationship between average and instantaneous velocity is derived to guide network training [9][10].

Performance Results
- MeanFlow demonstrates strong performance in one-step generative modeling, achieving a FID score of 3.43 with only 1-NFE, which is a 50% improvement over the best previous methods [5][16].
- In 2-NFE generation, MeanFlow achieves a FID score of 2.20, comparable to leading multi-step diffusion/flow models [18].

Comparative Analysis
- The article provides a comparative analysis of MeanFlow against previous single-step diffusion/flow models, showing that MeanFlow outperforms them significantly, with a FID score of 3.43 compared to 7.77 for IMM [16][17].
- The results indicate that the proposed method effectively narrows the gap between single-step and multi-step diffusion/flow models [18].
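The derived relationship between average and instantaneous velocity yields a concrete regression target for the network. A dependency-free sketch, using a forward finite difference in place of the JVP-with-stop-gradient used in practice (`u_fn` and `meanflow_target` are illustrative names, not the paper's code):

```python
import numpy as np

def meanflow_target(u_fn, z, r, t, v, eps=1e-4):
    """Regression target from the MeanFlow identity
        u(z, r, t) = v(z, t) - (t - r) * d/dt u(z, r, t),
    where d/dt is the total derivative along the trajectory (dz/dt = v).
    The paper computes it as a Jacobian-vector product under
    stop-gradient; a finite difference along (z + eps*v, t + eps)
    keeps this sketch dependency-free."""
    du_dt = (u_fn(z + eps * v, r, t + eps) - u_fn(z, r, t)) / eps
    return v - (t - r) * du_dt
```

Training then minimizes the squared error between the network's predicted average velocity and this target, which is what allows a single forward pass to span the whole interval at sampling time.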