Flow Matching
Diffusion Models: A Complete Overview in One Article!
自动驾驶之心· 2025-09-13 16:04
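Author | 论文推土机 Editor | 自动驾驶之心 Original link: https://zhuanlan.zhihu.com/p/1948137034842611877
Diffusion Model Notes. This article works through the mathematical principles behind diffusion. It uses only a handful of formulas from ordinary differential equations, stochastic differential equations, and probability, so a basic mathematical background is enough to follow it in full. If parts remain hard to follow, focus on the conclusions in the highlighted blocks. The quick way to read it is to read only the highlighted blocks and accept their conclusions; the even quicker way is to read only Chapter 1, on Langevin sampling, to build an intuition for diffusion. In short: relativity is time passing quickly in the company of a beautiful woman and slowly when working overtime; diffusion is using a network to learn how to solve ordinary or stochastic differential equations. The article has five parts. First, we collect the basic concepts related to diffusion models; this part consists of mathematical definitions, drawn mainly from the following links: [An Introduction to Flow Matching and Diffusion ...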
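The excerpt points to Langevin sampling as the quickest route to an intuition for diffusion. A minimal sketch of the idea follows, assuming a 1-D Gaussian target whose score (the gradient of the log-density) is known in closed form; in a diffusion model a trained network would stand in for `gaussian_score`. The function names, step size, and chain counts are illustrative choices, not taken from the article.

```python
import math
import random


def gaussian_score(x, mean, std):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mean, std^2)."""
    return -(x - mean) / (std ** 2)


def langevin_sample(score_fn, n_chains=1000, n_steps=200, step_size=0.05, seed=0):
    """Run unadjusted Langevin dynamics; return the final state of each chain."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step_size)
    samples = []
    for _ in range(n_chains):
        x = rng.uniform(-5.0, 5.0)  # arbitrary starting point
        for _ in range(n_steps):
            # Drift along the score, then add Gaussian noise.
            x = x + step_size * score_fn(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


if __name__ == "__main__":
    draws = langevin_sample(lambda x: gaussian_score(x, mean=2.0, std=1.0))
    print(f"sample mean: {sum(draws) / len(draws):.2f}")
```

Each chain starts far from the target and is pulled toward high-density regions by the score while the injected noise keeps the samples spread out, so the final states approximate draws from the target distribution (with a small bias from the finite step size).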
Reinforcement Learning, VLA, Flow Matching, and Robot Control Algorithms, Viewed Through Method Paradigms and Application Scenarios
具身智能之心· 2025-08-19 01:54
Core Viewpoint - The article discusses recent advancements in reinforcement learning (RL) and its applications in robotics, focusing on VLA (Vision-Language-Action) models and diffusion policies and highlighting their potential to handle complex tasks that traditional RL struggles with [2][4][35].
Method Paradigms
- Traditional RL and imitation learning, combined with Sim2Real techniques, are the foundational approaches in robotics [3].
- VLA models differ fundamentally from traditional RL: they use the training data distribution to describe task processes and goals, which allows them to execute more complex tasks [4][35].
- Diffusion Policy is a novel approach that uses diffusion models to generate continuous action sequences, and it handles complex task execution better than traditional RL methods [4][5].
Application Scenarios
- The article groups applications into two main types: basic motion control for humanoid and quadruped robots, and complex or long-horizon manipulation tasks [22][23].
- Basic motion control relies primarily on RL and Sim2Real; current implementations still struggle to achieve motion as fluid as that of humans or animals [22].
- For complex tasks, architectures typically combine a pre-trained Vision Transformer (ViT) encoder with a large language model (LLM) and use diffusion or flow matching to output actions [23][25].
Challenges and Future Directions
- Key challenges include the need for better simulation environments, effective domain randomization, and the integration of external goal conditions [35].
- The article emphasizes the role of human intention in task definition and the limits of current models in learning complex tasks without extensive human demonstration data [35][40].
- Future advancements may involve multi-modal input predictions for task goals and the potential integration of brain-machine interfaces to enhance human-robot interaction [35].
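The summary mentions diffusion or flow matching as the action output stage. As a rough sketch of what flow-matching sampling looks like, the code below integrates a velocity field from Gaussian noise at t = 0 to an action at t = 1 with Euler steps. Everything named here is hypothetical: `toy_velocity` is a closed-form stand-in for a trained network (a real policy would also condition on ViT/LLM features of the observation), and the single fixed `target` action exists only to make the example checkable.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a trained velocity network.

    For the straight-line path x_t = (1 - t) * x_0 + t * target, this is the
    exact velocity field that transports any x_0 to a single target action.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action(target, n_steps=10, seed=0):
    """Euler-integrate dx/dt = v(x, t) from noise (t = 0) to an action (t = 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t never reaches 1, so the division above stays finite
        v = toy_velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    print(sample_action([0.3, -0.7, 1.2]))
```

With a learned network in place of `toy_velocity`, the same loop would produce a different action for each noise draw, which is how such policies represent multi-modal action distributions.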
AI Image Generation Shake-Up! Flow Matching Architecture Upends Convention, with One Model Accepting Both Text and Image Input
量子位· 2025-05-30 05:01
Core Viewpoint - The article discusses the breakthrough of the new AI model FLUX.1 Kontext, which utilizes flow matching architecture to accept both text and image inputs, enabling advanced context generation and editing capabilities [2][3].
Group 1: Model Features
- FLUX.1 Kontext offers two versions: the professional version for rapid iteration and the high-end version that improves adherence to prompts and consistency [7].
- The model has four key features: character consistency across scenes, localized editing, style reference for new scene generation, and minimal latency for interaction [11].
Group 2: Performance Comparison
- Third-party platform Replicate conducted tests showing FLUX.1 Kontext outperforms OpenAI's 4o model in quality and cost-effectiveness, with better color accuracy [12].
Group 3: Editing Techniques
- For image editing, maintaining character identity is crucial regardless of the size of changes made [15].
- Complex changes, such as adding characters or altering backgrounds, should be described in multiple steps for optimal results [18].
- Style transfer tasks benefit from specific art styles or artist references to achieve better outcomes [19].
Group 4: Text Editing Capabilities
- The model supports adding, deleting, and modifying text on images, with specific guidelines for maintaining readability and layout [22][25].
- Clear instructions on which elements to retain are essential for effective text editing [25].
Group 5: User Guidance
- Detailed and specific descriptions yield better results in editing tasks, emphasizing the importance of clarity in instructions [20][37].
- The article provides a summary of effective prompt techniques for using FLUX.1 Kontext, highlighting the need for precise language and structured editing steps [34][37].
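Z Tech | A Conversation with the Research Team Behind CV Luminary He Kaiming's New Work, Three MIT Undergraduates Born After 2005: Does Diffusion Really Need Noise Conditioning?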
Z Potentials· 2025-02-27 04:09
Core Viewpoint - Recent research led by renowned scholar He Kaiming and three MIT freshmen challenges the traditional understanding of noise conditioning in denoising models, suggesting that it may not be essential for model performance [1][3].
Group 1: Research Findings
- The study demonstrates that removing noise conditioning from many mainstream denoising models results in only a modest degradation in performance [4].
- The newly designed noise-unconditional model, uEDM, achieves a near-state-of-the-art FID score of 2.23 on the CIFAR-10 benchmark, only slightly behind the top noise-conditioned model, EDM, which has an FID score of 1.97 [2][6].
- The research provides a theoretical framework and experimental results showing that mainstream denoising models remain stable when noise conditioning is removed, which indicates that traditional noise conditioning is not strictly necessary in practical applications [3][5].
Group 2: Implications and Future Directions
- The findings open avenues for reducing model computational complexity and inspire new model designs that do not rely on noise conditioning [3].
- The upcoming live lecture will feature discussions on generative models and potential development directions, including a Q&A session with the authors [2].
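One plausible intuition for why dropping the noise-level input costs so little, offered here as an assumption since the summary does not state it, is that a high-dimensional noisy sample already reveals its own noise level. The toy sketch below is not the method from the research: it assumes clean data with entries in {-1, +1} (so the per-coordinate second moment is known to be 1) and recovers sigma from the noisy vector alone. All function names are hypothetical.

```python
import math
import random


def make_noisy_sample(dim, sigma, rng):
    """Return z = x + sigma * eps, where the clean x has entries in {-1, +1}."""
    clean = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return [x + sigma * rng.gauss(0.0, 1.0) for x in clean]


def estimate_sigma(noisy, data_second_moment=1.0):
    """Estimate the noise level from the noisy vector alone.

    Uses E[z_i^2] = E[x_i^2] + sigma^2 for independent zero-mean noise;
    data_second_moment is an assumed-known property of the toy data.
    """
    mean_sq = sum(z * z for z in noisy) / len(noisy)
    return math.sqrt(max(mean_sq - data_second_moment, 0.0))


if __name__ == "__main__":
    rng = random.Random(0)
    for sigma in (0.5, 2.0):
        z = make_noisy_sample(4096, sigma, rng)
        print(f"true sigma {sigma}, estimated {estimate_sigma(z):.3f}")
```

The estimate tightens as the dimension grows, which is consistent with the idea that a network seeing only the noisy input can infer roughly how much noise to remove.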