Diffusion Models: A Complete Overview in One Article!
自动驾驶之心· 2025-09-13 16:04
Core Viewpoint
- The article works through the mathematical principles behind diffusion models, emphasizing that noise is an essential component of the sampling process rather than a side effect: diffusion models use Langevin sampling to transition from one probability distribution to another, and the injected noise is what produces diverse, realistic images [10][11][26].

Summary by Sections

Section 1: Basic Concepts of Diffusion Models
- The article introduces the foundational concepts behind diffusion models, focusing on velocity vector fields that define ordinary differential equations (ODEs) and the mathematical representation of these fields through their trajectories [4].

Section 2: Langevin Sampling
- Langevin sampling is highlighted as the key method for approximating transitions between distributions. Adding noise at each sampling step allows exploration of the probability space and prevents the chain from collapsing onto local maxima (a minimal sketch appears after this summary) [10][11][14][26].

Section 3: Role of Noise
- Noise is a necessary component of the diffusion process: it lets the model generate diverse samples rather than converging to peak values. Without noise, the sampling process would only climb to local maxima, limiting the diversity of generated outputs [26][28][31].

Section 4: Comparison with GANs
- The article contrasts diffusion models with Generative Adversarial Networks (GANs), noting that diffusion models delegate the task of diversity to noise, which alleviates issues such as mode collapse that can occur in GANs [37].

Section 5: Training and Implementation
- Training uses score matching and kernel density estimation (KDE) to learn the underlying data distribution. The article outlines the steps: generating noisy samples and computing the gradients (scores) used for optimization (a sketch follows below) [64][65].

Section 6: Flow Matching Techniques
- Flow matching is introduced as a way to optimize the sampling process by minimizing the distance between the learned velocity field and the target velocity field induced by the data distribution (a sketch follows below). The article also discusses the equivalence between flow matching and optimal-transport strategies [76][86].

Section 7: Mean Flow and Rectified Flow
- Mean flow and rectified flow are presented as advanced techniques within the flow matching framework that improve sampling efficiency and stability during generation [100][106].
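To make the Langevin update of Sections 2 and 3 concrete, here is a minimal NumPy sketch on a toy one-dimensional two-mode density. The mixture, step size, and chain count are illustrative choices, not from the article; the point is that the noise term lets chains cover both modes, while dropping it reduces the update to gradient ascent that piles up on the local maxima.

```python
import numpy as np

def score(x):
    # Analytic score (gradient of log-density) of a toy equal-weight
    # two-mode Gaussian mixture centered at -2 and +2. In a diffusion
    # model this gradient would come from a learned score network.
    p1 = np.exp(-0.5 * (x - 2.0) ** 2)
    p2 = np.exp(-0.5 * (x + 2.0) ** 2)
    return (-(x - 2.0) * p1 - (x + 2.0) * p2) / (p1 + p2)

def langevin_sample(n_steps=1000, eps=0.01, n_chains=5000, noise=True, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 4.0, size=n_chains)  # start spread far from both modes
    for _ in range(n_steps):
        # Langevin update: x <- x + (eps/2) * grad log p(x) + sqrt(eps) * z
        x = x + 0.5 * eps * score(x)
        if noise:
            # The injected Gaussian noise is what keeps the chains exploring
            # the whole density instead of converging to a single peak.
            x = x + np.sqrt(eps) * rng.normal(size=n_chains)
    return x

samples_noisy = langevin_sample(noise=True)   # samples spread over both modes
samples_clean = langevin_sample(noise=False)  # chains collapse onto the two maxima
```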
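Section 5's training recipe can be sketched as denoising score matching: corrupt clean samples with Gaussian noise, then regress a network onto the score of the corruption kernel. The network shape, noise level, and placeholder data below are assumptions for illustration; a full diffusion recipe sweeps a schedule of noise levels rather than the single fixed one used here.

```python
import torch
import torch.nn as nn

sigma = 0.5  # single fixed noise level, for illustration only
score_net = nn.Sequential(nn.Linear(2, 128), nn.SiLU(), nn.Linear(128, 2))
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)

def dsm_step(x0):
    # Corrupt the clean batch, then match the score of the Gaussian kernel:
    # grad_x log q(x | x0) = -(x - x0) / sigma^2 = -noise / sigma.
    noise = torch.randn_like(x0)
    x = x0 + sigma * noise
    target = -noise / sigma
    loss = ((score_net(x) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x0 = torch.randn(256, 2)  # stand-in for a batch of training data
for _ in range(100):
    dsm_step(x0)
```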
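For Section 6, a minimal conditional flow matching step under the straight-line (optimal-transport) path x_t = (1 - t) x0 + t x1, whose conditional target velocity is simply x1 - x0. The architecture, sizes, and learning rate are placeholders, not the article's settings.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    # Small MLP taking the current point and time; shape choices are illustrative.
    def __init__(self, dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

v_net = VelocityNet()
opt = torch.optim.Adam(v_net.parameters(), lr=1e-3)

def cfm_step(x1):
    # x1: batch of data samples; x0: noise endpoint of the straight-line path.
    x0 = torch.randn_like(x1)
    t = torch.rand(x1.shape[0], 1)            # random time in [0, 1]
    xt = (1 - t) * x0 + t * x1                # point on the straight-line path
    loss = ((v_net(xt, t) - (x1 - x0)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

At sampling time, the learned field is integrated from noise to data, e.g. with a few Euler steps x <- x + dt * v_net(x, t).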
MeanFlow scores another win: Peking University proposes MP1, a new paradigm for robot learning that achieves dual SOTA in speed and success rate
机器之心· 2025-07-24 09:33
Core Viewpoint
- The article introduces MP1, a new robotic learning framework that significantly improves the efficiency and effectiveness of robotic manipulation by addressing two challenges: slow action generation and weak few-shot generalization [4][11].

Group 1: Action Generation Models
- In current VLA models, the action generation model determines both the quality and the speed of action generation, and it faces a fundamental trade-off between inference speed and task success rate [2].
- Diffusion models generate high-quality action sequences through multi-step iteration but are slow, while flow-based models offer faster inference at the cost of added complexity and potential performance limits [2][3].

Group 2: MP1 Framework
- MP1 brings the MeanFlow paradigm from image generation into robotic learning, achieving millisecond-level inference speed and laying a foundation for VLA action generation models [4].
- The core innovation is a shift in generation paradigm: the model learns an interval-averaged velocity field instead of an instantaneous one, which removes the need for time-consuming iterative solving (see the sketch after this summary) [8].

Group 3: Few-Shot Generalization
- MP1 counters feature collapse in robotic learning by introducing a Dispersive Loss that regularizes the policy network's internal representation space, improving its ability to distinguish between different states (a sketch also follows below) [11][12].
- The loss operates only during training, so it adds no inference cost, and it markedly improves learning from a minimal number of demonstrations, which is crucial when data collection is expensive [12].

Group 4: Performance Testing
- MP1 was validated in simulation across 37 complex tasks, demonstrating higher task success rates and better stability than existing models [15][16].
- Its average success rate reached 78.9%, outperforming FlowPolicy and DP3 by 7.3 and 10.2 percentage points, respectively, with the largest gains on the more difficult tasks [17].

Group 5: Inference Efficiency
- MP1 averaged only 6.8 ms of inference time on an NVIDIA RTX 4090 GPU, nearly twice as fast as FlowPolicy and about 19 times faster than DP3, comfortably meeting real-time control-frequency requirements in robotics [18][19].

Group 6: Few-Shot Learning Validation
- Experiments confirmed that MP1 consistently outperformed FlowPolicy across all data scales, especially in extreme few-shot settings with only 2-5 demonstrations, validating the effectiveness of the Dispersive Loss for generalization [21].

Group 7: Real-World Validation
- In real-world tests across five tasks, MP1 achieved the highest success rates and shortest task completion times, including a 90% success rate on the "Hummer" task, significantly surpassing FlowPolicy and DP3 [23].
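Group 2's interval-averaged velocity field can be sketched from the published MeanFlow formulation; this is an assumption about how the paradigm is instantiated, not MP1's actual code, and all shapes and hyperparameters are illustrative. The network u(z, r, t) is trained against the MeanFlow identity u(z_t, r, t) = v - (t - r) * du/dt, where du/dt is the total derivative along the trajectory, computed with a Jacobian-vector product; sampling then takes a single step z_0 = z_1 - u(z_1, 0, 1) instead of an iterative ODE solve.

```python
import torch
import torch.nn as nn

class MeanVelocityNet(nn.Module):
    # Predicts the average velocity over the interval [r, t]; illustrative MLP.
    def __init__(self, dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 2, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, z, r, t):
        return self.net(torch.cat([z, r, t], dim=-1))

u_net = MeanVelocityNet()

def meanflow_loss(x):
    # Convention as in the MeanFlow paper: z_t = (1 - t) * x + t * e,
    # so t = 1 is pure noise and the instantaneous velocity is v = e - x.
    e = torch.randn_like(x)
    t = torch.rand(x.shape[0], 1)
    r = torch.rand(x.shape[0], 1) * t          # ensure r <= t
    zt = (1 - t) * x + t * e
    v = e - x
    # Total derivative du/dt along the trajectory (dz/dt = v, dr/dt = 0, dt/dt = 1)
    # via a forward-mode Jacobian-vector product (PyTorch 2's torch.func.jvp).
    u, dudt = torch.func.jvp(
        u_net, (zt, r, t), (v, torch.zeros_like(r), torch.ones_like(t))
    )
    target = (v - (t - r) * dudt).detach()     # stop-gradient on the target
    return ((u - target) ** 2).mean()

@torch.no_grad()
def one_step_sample(n, dim=8):
    # One-step generation: z_0 = z_1 - (1 - 0) * u(z_1, 0, 1), z_1 ~ N(0, I).
    z1 = torch.randn(n, dim)
    return z1 - u_net(z1, torch.zeros(n, 1), torch.ones(n, 1))
```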
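Group 3's Dispersive Loss can be sketched as an InfoNCE-style, repulsion-only regularizer on intermediate features of the policy network: it pushes representations of different samples in a batch apart, with no positive pairs needed. The temperature and feature handling below are assumptions for illustration; in training it would be added to the main imitation objective with a small weight and dropped entirely at inference.

```python
import torch

def dispersive_loss(h, tau=0.5):
    # h: intermediate features of shape (batch, ...); tau is an illustrative
    # temperature. Returns log mean_{i != j} exp(-||h_i - h_j||^2 / tau),
    # which is minimized when representations spread apart.
    h = h.flatten(1)
    diff = h.unsqueeze(1) - h.unsqueeze(0)        # (B, B, D) pairwise differences
    d2 = diff.pow(2).sum(-1)                      # pairwise squared distances
    n = h.shape[0]
    off_diag = ~torch.eye(n, dtype=torch.bool, device=h.device)
    log_mean_exp = torch.logsumexp(-d2[off_diag] / tau, dim=0) - torch.log(
        torch.tensor(float(n * (n - 1)), device=h.device)
    )
    return log_mean_exp

# Usage sketch: total = imitation_loss + 0.5 * dispersive_loss(features)
```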