Flow Matching
Reinforcement Learning versus VLA, Flow Matching, and Robot Control Algorithms, Viewed Through Method Paradigms and Application Scenarios
具身智能之心 · 2025-08-19 01:54
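Author: Jeungtao
I previously worked on reinforcement learning for two years. After seeing the recent breakthroughs in VLA, I studied the area in my spare time out of personal interest and took some notes, which I am sharing here. They mainly cover the past two years of VLA and Diffusion Policy work, viewed in the context of RL, and are best suited to readers who have some RL background and want to catch up on recent progress. Please point out any omissions. The discussion is organized along two dimensions: method paradigms and application scenarios.
I. Method paradigms
1. Traditional reinforcement learning (RL) / imitation learning + Sim2Real
2. Diffusion Policy, Flow Matching, and VLA models
In my view, a fundamental difference between the VLA family and traditional RL is that RL specifies the task objective through a reward function, which makes it hard to describe the process and goal of a complex task. For example, how should clothes be folded to match human preferences, where should they be placed after folding to earn a higher "reward", and how should a desk be tidied to meet the human definition of "neat"? These criteria are all fairly vague. Going further, long-horizon tasks such as clearing the table and washing the dishes after a meal, or washing, drying, and hanging up laundry, are even harder to describe as explicit rules through reward shaping.
Editor: 具身智能之心
Original link: https://zhuanlan.zhihu.com/p/1940101671704327220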
AI Image Generation Shake-Up! Flow Matching Architecture Upends Convention, with One Model Accepting Both Text and Image Inputs
量子位 · 2025-05-30 05:01
Core Viewpoint
- The article discusses the breakthrough of the new AI model FLUX.1 Kontext, which utilizes flow matching architecture to accept both text and image inputs, enabling advanced context generation and editing capabilities [2][3].
Group 1: Model Features
- FLUX.1 Kontext offers two versions: the professional version for rapid iteration and the high-end version that improves adherence to prompts and consistency [7].
- The model has four key features: character consistency across scenes, localized editing, style reference for new scene generation, and minimal latency for interaction [11].
Group 2: Performance Comparison
- Third-party platform Replicate conducted tests showing FLUX.1 Kontext outperforms OpenAI's 4o model in quality and cost-effectiveness, with better color accuracy [12].
Group 3: Editing Techniques
- For image editing, maintaining character identity is crucial regardless of the size of changes made [15].
- Complex changes, such as adding characters or altering backgrounds, should be described in multiple steps for optimal results [18].
- Style transfer tasks benefit from specific art styles or artist references to achieve better outcomes [19].
Group 4: Text Editing Capabilities
- The model supports adding, deleting, and modifying text on images, with specific guidelines for maintaining readability and layout [22][25].
- Clear instructions on which elements to retain are essential for effective text editing [25].
Group 5: User Guidance
- Detailed and specific descriptions yield better results in editing tasks, emphasizing the importance of clarity in instructions [20][37].
- The article provides a summary of effective prompt techniques for using FLUX.1 Kontext, highlighting the need for precise language and structured editing steps [34][37].
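Z Tech | A Conversation with the Research Team Behind CV Luminary He Kaiming's New Work, Three MIT Undergraduates Born After 2005: Does Diffusion Really Need Noise Conditioning?
Z Potentials · 2025-02-27 04:09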
Core Viewpoint
- Recent research led by renowned scholar He Kaiming and three MIT freshmen challenges the traditional understanding of noise conditioning in denoising models, suggesting that it may not be essential for model performance [1][3].
Group 1: Research Findings
- The study demonstrates that removing noise conditioning from many mainstream denoising models results in only a modest degradation in performance [4].
- The newly designed noise-unconditional model, uEDM, achieves a near-state-of-the-art FID score of 2.23 on the CIFAR-10 benchmark, only slightly behind the top noise-conditioned model, EDM, which has an FID score of 1.97 [2][6].
- The research provides a theoretical framework and experimental results showing that mainstream denoising models remain stable when noise conditioning is removed, indicating that traditional noise conditioning is not strictly necessary in practical applications [3][5].
Group 2: Implications and Future Directions
- The findings open avenues for reducing model computational complexity and inspire new model designs that do not rely on noise conditioning [3].
- The upcoming live lecture will feature discussions on generative models and potential development directions, including a Q&A session with the authors [2].