An In-Depth Reading of Kaiming He's CVPR 2025 Talk: How Can Generative Models Move Toward End-to-End?

Core Viewpoint
- The article traces the evolution of generative models in deep learning through a parallel with recognition models: just as AlexNet moved recognition to end-to-end training, generative models may be nearing a similar breakthrough with MeanFlow, which reduces generation from many steps to a single step [1][2][35].
Group 1: Evolution of Recognition Models
- Before AlexNet, layer-wise training was the dominant way to train recognition models: each layer was optimized individually, which made training complex and cumbersome [2][3].
- AlexNet, introduced in 2012, marked the shift to end-to-end training, in which the entire network is trained jointly; this greatly simplified model design and improved performance [3][7].
Group 2: Current State of Generative Models
- Today's generative models resemble recognition models before AlexNet: diffusion models and autoregressive models both rely on multi-step inference, which raises the question of whether generative modeling is still in its own "pre-AlexNet" phase [7][9].
- The article argues that a comparable breakthrough requires moving from multi-step inference to end-to-end generation [7][35].
Group 3: Relationship Between Recognition and Generation
- Recognition and generation can be viewed as two sides of the same coin: recognition is a process of abstraction that extracts semantic information from data, while generation is a process of concretization that turns abstract representations into realistic data samples [13][15][16].
- The fundamental difference lies in the nature of the mapping. Recognition maps data deterministically to labels, whereas generation must learn a highly nonlinear mapping from noise to a complex data distribution, which presents both opportunities and challenges [18][20].
Group 4: Flow Matching and MeanFlow
- Flow matching is a key direction for addressing the challenges faced by generative models: it constructs a flow field that transports a simple noise distribution to the data distribution, and generation follows that flow [20][22].
- MeanFlow (from the paper on Mean Flows), a recent method from Kaiming He and collaborators, targets one-step generation: instead of numerically integrating an instantaneous velocity field over many steps, the network models the average velocity over a time interval, which greatly improves generation efficiency [24][27][29].
- In experiments on ImageNet, MeanFlow achieved an FID of 3.43 with a single function evaluation, well ahead of earlier one-step diffusion and flow models and much closer to the quality of multi-step models [31][32].
Group 5: Future Directions and Challenges
- The article outlines several future research directions, including consistency models, two-time-variable models, and a revisit of normalizing flows, while asking whether generative models are still in the "pre-AlexNet" era [33][34].
- Despite the progress made by MeanFlow, finding a truly effective recipe for end-to-end generative modeling remains an open and exciting research question [34][35].
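The average-velocity idea from Group 4 can be made concrete with a small numerical sketch. This is not the MeanFlow training code and involves no neural network. It assumes a hypothetical one-dimensional velocity field v(z, t) = A * z, chosen only because its trajectories and its average velocity have closed forms. The names `A`, `v`, `u`, `sample_euler`, `sample_one_step`, and `identity_residual` are illustrative. Following the convention used for MeanFlow, as far as we understand it, time runs from t = 1 (noise) to t = 0 (data). The average velocity over [r, t] is u = (z_t - z_r) / (t - r), and it should satisfy the identity u = v - (t - r) * du/dt, where du/dt is the total derivative along the trajectory.

```python
import math

A = 1.5  # hypothetical rate constant of the toy velocity field (an assumption)


def v(z: float, t: float) -> float:
    """Instantaneous velocity of the toy ODE dz/dt = A * z (independent of t)."""
    return A * z


def u(z: float, r: float, t: float) -> float:
    """Average velocity over [r, t] along the trajectory passing through (z, t).

    For this toy field the trajectory is z_r = z * exp(-A * (t - r)), so
    u = (z_t - z_r) / (t - r) has the closed form below.
    """
    if t == r:
        return v(z, t)  # the average over an empty interval is the instantaneous value
    return z * (1.0 - math.exp(-A * (t - r))) / (t - r)


def sample_euler(z1: float, n_steps: int) -> float:
    """Multi-step sampling: Euler integration of the ODE from t = 1 down to t = 0."""
    z = z1
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - i * dt
        z = z - dt * v(z, t)  # one function evaluation per step
    return z


def sample_one_step(z1: float) -> float:
    """One-step sampling: z_0 = z_1 - (1 - 0) * u(z_1, 0, 1)."""
    return z1 - u(z1, 0.0, 1.0)


def identity_residual(z: float, r: float, t: float, h: float = 1e-5) -> float:
    """Residual of u = v - (t - r) * du/dt at the point (z, r, t).

    du/dt is the total derivative along the trajectory, estimated here by
    central differences using the exact trajectory points at t + h and t - h.
    """
    z_plus = z * math.exp(A * h)
    z_minus = z * math.exp(-A * h)
    du_dt = (u(z_plus, r, t + h) - u(z_minus, r, t - h)) / (2.0 * h)
    return u(z, r, t) - (v(z, t) - (t - r) * du_dt)


if __name__ == "__main__":
    z1 = 1.0
    exact = z1 * math.exp(-A)  # exact solution of the toy ODE at t = 0
    for n in (1, 4, 32, 1000):
        print(f"Euler, {n:4d} steps: error = {abs(sample_euler(z1, n) - exact):.2e}")
    print(f"one step with u   : error = {abs(sample_one_step(z1) - exact):.2e}")
    print(f"identity residual : {identity_residual(0.7, 0.2, 0.9):.2e}")
```

In this toy setting the one-step sampler is exact because `u` is known in closed form, while the Euler sampler needs many evaluations of `v` to reach comparable accuracy. In the actual method, the average velocity would have to be learned by a network, presumably using an identity of this kind as the training signal, so the learned one-step result would only be approximate.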