Stable Diffusion 3
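Differentiable rewards should be fine-tuned on directly! Using the HJB equation to overturn flow matching alignment | NeurIPS'25
量子位 (QbitAI) · 2026-03-09 06:05
Contributed by the VGG-Flow team. 量子位 | WeChat official account QbitAI

Is there a better way to fine-tune diffusion models than reinforcement learning? The VGG-Flow team, from The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Microsoft Research, and other institutions, offers a new idea: since the reward function is itself differentiable, why take a detour through PPO or GRPO?

Alignment of large-scale generative models usually relies on reinforcement learning: the model is fine-tuned against a reward function to bring it closer to human preferences. In practice, though, most reward models are themselves neural networks trained on preference datasets. Since the reward is differentiable, can that differentiability be used directly to fine-tune flow matching models efficiently and stably?

Mainstream approaches follow two paths. One treats the model as a black box: as in Flow-GRPO, the originally deterministic ODE sampling process is forcibly converted into a stochastic SDE so that it fits the classical reinforcement learning framework and high-variance policy gradient methods (such as PPO and GRPO) can be applied.

The other path is more direct: methods such as ReFL use approximations to optimize the reward at certain sampling steps. This lacks rigorous theoretical support at the level of the objective and tends to cause overfitting and mode collapse. Is there a new route?

The VGG-Flow team goes back to first principles and reformulates reward fine-tuning as a continuous-time optimal control problem. Through the Hamilton–Jacobi–Bellman (HJB) equation ...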
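40x inference speedup! Fudan & Microsoft: fitting complex trajectories with "non-linear flow", 2-step generation that rivals the original images
量子位 (QbitAI) · 2026-02-15 03:45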
Core Insights
- The article introduces ArcFlow, an image generation acceleration framework developed by Fudan University and Microsoft Research Asia. It addresses the long inference time and high computational cost of diffusion models by using a non-linear flow mechanism instead of the traditional linear simplification strategies [2][9].

Group 1: ArcFlow Innovations
- ArcFlow needs only 2 steps (2 NFE) while keeping image quality comparable to the teacher model, giving approximately 40 times faster inference and 4 times faster training convergence [3][14].
- The method fine-tunes less than 5% of the parameters, making it resource-efficient and quick to converge [3][15].

Group 2: Challenges in Existing Methods
- Existing distillation methods assume a linear shortcut between noise and the final image. Because teacher models follow complex, curved trajectories, this causes geometric mismatch and poor image quality [5][6].
- Traditional methods often require 40 to 100 denoising steps, which makes real-time applications difficult, and quality degrades when the number of steps is reduced [5][6].

Group 3: ArcFlow's Mechanisms
- ArcFlow introduces momentum parameterization to capture the continuity of the velocity, eliminating sampling redundancy by modeling the velocity field as a mixture of continuous momentum processes [11].
- The framework derives a closed-form analytical solution from the momentum equations, allowing precise trajectory integration and high-accuracy flow matching [12].
- ArcFlow's trajectory distillation strategy preserves the non-linear characteristics of the teacher model, aligning instantaneous velocities without disrupting the pre-trained weight distribution, which improves training efficiency [13].

Group 4: Experimental Results
- ArcFlow has been validated on large-scale models such as Qwen-Image-20B and FLUX.1-dev, showing better image quality and semantic consistency in benchmark tests than existing state-of-the-art methods [15][19].
- The results indicate that ArcFlow generates clearer images with rich detail and diversity, avoiding the background blurriness and structural distortion seen in linear distillation methods [19].

Group 5: Conclusion
- ArcFlow is a significant advance in knowledge distillation for image generation: it makes effective use of the prior knowledge in pre-trained teacher models while converging faster and producing higher-quality outputs [22].
An intuitive understanding of the Flow Matching generative algorithm
自动驾驶之心 · 2025-12-17 00:03
Core Viewpoint
- The article explains the Flow Matching algorithm, a generative model that produces samples resembling a target dataset, and does so without relying on complex mathematical concepts or derivations [3][4][12].

Algorithm Principle
- Flow Matching is a generative model that aims to generate samples close to a given target set without requiring any conditioning input [3][4].
- The algorithm learns a direction of movement from a source point to a target point, which guides the generation process [14][16].

Training and Inference
- During training, the model samples points along the line from source to target and averages the slopes of the many connections passing through a point to determine the direction of movement there [17].
- In inference, the model starts from a noise point and iteratively moves towards the target, collapsing into a specific state as it approaches the target [17][18].

Code Implementation
- The code provided demonstrates a simple implementation of the Flow Matching algorithm, including the generation of random input points and the prediction of slopes using a neural network [18][19].
- The model uses a vector field to predict the direction and speed of movement towards the target distribution [19][20].

Advanced Applications
- The article describes adapting Flow Matching to conditional generation tasks, so that samples can be generated from specific prompts or conditions [24][30].
- As an example, it generates handwritten digits from the MNIST dataset with Flow Matching, showing that the approach carries over to different generative tasks [30][32].

Conclusion
- Flow Matching is presented as a more efficient alternative to diffusion models in generative tasks, with applications in fields including image generation and automated driving [12][43].
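The training and inference steps summarized above can be sketched in a few lines. This is a toy illustration, not the article's code: it works in one dimension, replaces the neural network with a lookup table over (t, x) cells, and uses a hypothetical source N(0, 1) and target N(3, 0.3). All names (`train`, `generate`, `x_bin`, the bin counts and ranges) are assumptions made for the example.

```python
import math
import random

# Hypothetical 1-D toy setup (not from the article): source N(0, 1), target N(3, 0.3).
T_BINS, X_BINS = 20, 48
X_MIN, X_MAX = -5.0, 7.0


def x_bin(x):
    """Map a position to a bin index, clamping values outside [X_MIN, X_MAX)."""
    idx = math.floor((x - X_MIN) / (X_MAX - X_MIN) * X_BINS)
    return min(max(idx, 0), X_BINS - 1)


def train(n_pairs, rng):
    """Estimate the velocity field as the average slope (x1 - x0) per (t, x) cell."""
    sums = [[0.0] * X_BINS for _ in range(T_BINS)]
    counts = [[0] * X_BINS for _ in range(T_BINS)]
    for _ in range(n_pairs):
        x0 = rng.gauss(0.0, 1.0)        # source (noise) point
        x1 = rng.gauss(3.0, 0.3)        # target point
        t = rng.random()                # position along the connecting line
        xt = (1.0 - t) * x0 + t * x1    # sampled point on that line
        ti = min(int(t * T_BINS), T_BINS - 1)
        xi = x_bin(xt)
        sums[ti][xi] += x1 - x0         # slope of this particular connection
        counts[ti][xi] += 1
    # Cells never visited during training keep velocity 0.0.
    return [[s / c if c else 0.0 for s, c in zip(srow, crow)]
            for srow, crow in zip(sums, counts)]


def generate(table, n_samples, rng):
    """Start from noise and take one Euler step per time bin along the learned field."""
    dt = 1.0 / T_BINS
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for k in range(T_BINS):
            x += dt * table[k][x_bin(x)]
        samples.append(x)
    return samples


rng = random.Random(0)
table = train(50_000, rng)
samples = generate(table, 2_000, rng)
mean = sum(samples) / len(samples)
print(f"mean of generated samples: {mean:.2f} (target mean is 3.0)")
```

Averaging the slopes per cell is the least-squares fit for a piecewise-constant model, which is why no optimizer appears here; a neural network trained with a squared-error loss plays the same role in higher dimensions.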
An intuitive understanding of the Flow Matching generative algorithm
自动驾驶之心 · 2025-11-28 00:49
Algorithm Overview
- Flow Matching is a generative model that aims to generate samples similar to a given target set without any conditioning input [3][4].
- The model learns a direction of movement from a source point to a target point, and generates new samples by iteratively moving a point towards the target [14][17].

Training and Inference
- During training, the model samples points along the line connecting source and target and learns the average slope over the many connections [16][17].
- In inference, the model starts from a noise point and moves towards the target, gradually collapsing to a specific state as it approaches the target [17][18].

Code Implementation
- The implementation generates random inputs, predicts the slope with a neural network, and runs an optimizer to minimize the loss between predicted and target slopes [18][19].
- The code exposes hyperparameters for dimensions, sample sizes, and training epochs, and shows a straightforward way to implement the Flow Matching algorithm [19][25].

Advanced Applications
- The model can be adapted to generate samples from prompts, which gives more controlled generation by segmenting the target distribution [24][29].
- A more complex example generates handwritten digits from the MNIST dataset, showing that the model can handle different types of data [30][32].

Model Architecture
- The architecture uses a UNet backbone to predict the velocity field, which improves performance through multi-scale feature fusion [32][34].
- The model takes conditional inputs to refine the generation process, so that the output matches the specified conditions [34][35].

Training Process
- The training loop generates dynamic noise, calculates the loss based on the difference between predicted and actual images, and updates the model parameters accordingly [40][41].
- The model is set up to visualize generated samples periodically, providing insight into its performance and output quality [40][41].
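The training and inference steps above can be written compactly, assuming the linear-interpolation form of flow matching that the summary describes (points sampled on the line connecting source and target, and a network that predicts the slope). The notation is introduced here for illustration and is not taken from the article: x_0 is a noise sample, x_1 a target sample, c an optional condition such as a digit label, and v_\theta the learned velocity field.

```latex
% Assumed notation: x_0 noise sample, x_1 target sample, c condition, v_\theta velocity network.
\begin{align}
x_t &= (1 - t)\,x_0 + t\,x_1, \qquad t \sim \mathcal{U}[0, 1] \\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\,x_1,\,c,\,t}\,
    \bigl\| v_\theta(x_t, t, c) - (x_1 - x_0) \bigr\|^2 \\
x_{t + \Delta t} &= x_t + \Delta t \cdot v_\theta(x_t, t, c)
\end{align}
```

The first line is the training-time sample, the second the regression loss on the slope, and the third a single Euler step of inference, repeated from t = 0 (noise) to t = 1. Dropping c recovers the unconditional case.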
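Technical University of Munich and collaborators develop an SD3-based satellite image generation method and build the largest remote sensing dataset to date
36Kr · 2025-06-30 07:47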
Core Insights
- Teams from the Technical University of Munich and ETH Zurich have proposed a new method for generating satellite imagery from geographic and climate prompts using Stable Diffusion 3 (SD3), and have built EcoMapper, the largest and most comprehensive remote sensing dataset [1][2][4].

Dataset Overview
- EcoMapper consists of over 2.9 million RGB satellite images collected from 104,424 global locations, covering 15 land cover types and the corresponding climate records [2][5].
- The dataset includes a training set with 98,930 geographic points, each observed over a 24-month period, and a test set with 5,494 geographic points observed over 96 months [5][6].

Methodology
- The research developed a text-to-image generation model based on fine-tuned SD3, which uses climate and land cover details to generate realistic synthetic images [4][8].
- A multi-condition model framework using ControlNet was also developed to map climate data or generate time series, simulating landscape evolution [4][12].

Model Performance
- The study evaluated SD3 and DiffusionSat models on generating climate-aware satellite images, with metrics indicating significant improvements over baseline models [14][19].
- The SD3-FT-HR model achieved the lowest Fréchet Inception Distance (FID) score, 49.48, indicating high realism in the generated images [15][16].

Climate Sensitivity Analysis
- The generated vegetation density was found to be significantly correlated with climate changes, with performance varying under extreme weather conditions [16][18].

Applications and Future Directions
- EcoMapper provides a framework for simulating satellite images from climate variables, offering new ways to visualize climate change impacts and to integrate satellite and climate data for downstream models [22][26].
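The dataset figures quoted above can be cross-checked with a little arithmetic. The sketch below assumes one image per location per month, which the summary does not state explicitly; under that assumption the train and test splits add up to the reported totals.

```python
# Figures quoted in the summary above.
train_points, train_months = 98_930, 24
test_points, test_months = 5_494, 96

total_points = train_points + test_points

# Assumption (not stated in the summary): one image per location per month.
total_images = train_points * train_months + test_points * test_months

print(total_points)   # 104424
print(total_images)   # 2901744
```

The point count matches the 104,424 locations exactly, and the image count is consistent with "over 2.9 million".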