Flow Matching
Zhiyuan Releases GenieReasoner, an Integrated Embodied Cerebrum-Cerebellum System
Renmin Caixun, January 1 — Zhiyuan's Embodied Intelligence Research Center has launched GenieReasoner, its second-generation integrated embodied cerebrum-cerebellum system. To address the modality-alignment challenge between semantic reasoning and action control in VLA models, the center proposes a model architecture that supports unified discretized pretraining, and uses flow matching to alleviate the action-precision bottleneck of conventional discrete tokenizers. ...
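For background, flow matching trains a network to predict the velocity that carries noise to data along a simple interpolation path. The snippet below is a minimal sketch of the standard (rectified-flow-style) objective only, not Zhiyuan's implementation; the `VelocityNet` module and tensor shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Placeholder velocity network v_theta(x_t, t); any architecture would do."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))

def flow_matching_loss(model, x1):
    """Standard conditional flow matching with a linear noise-to-data path."""
    x0 = torch.randn_like(x1)          # noise sample
    t = torch.rand(x1.shape[0], 1)     # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1        # point on the straight path
    target_v = x1 - x0                 # ground-truth velocity of that path
    return ((model(x_t, t) - target_v) ** 2).mean()

# usage sketch
model = VelocityNet(dim=32)
loss = flow_matching_loss(model, torch.randn(64, 32))
loss.backward()
```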
56x Faster Generative Policies: EfficientFlow, Toward Efficient Embodied Intelligence
具身智能之心· 2025-12-17 00:05
The co-first authors are Jianlei Chang (master's student) and Ruofeng Mei (PhD student) at Xi'an Jiaotong University; Wei Ke is an associate professor at Xi'an Jiaotong University. The corresponding author is Professor Xiangyu Xu of Xi'an Jiaotong University, whose research spans 3D vision, generative AI, and embodied intelligence (homepage: https://xuxy09.github.io/). Generative models are becoming an important paradigm in robotics and embodied intelligence: they can generate complex, flexible action policies directly from high-dimensional visual observations and perform strongly on manipulation and grasping tasks. In real systems, however, these methods still face two major weaknesses: training depends heavily on large-scale demonstration data, and inference requires many iterations, making action generation too slow for real-time control. Targeting this core bottleneck, a research team at Xi'an Jiaotong University proposes EfficientFlow, a new generative policy learning method. By deeply coupling equivariant modeling with efficient flow matching, it substantially improves data efficiency while sharply reducing the number of inference iterations, reaching SOTA performance on multiple robot manipulation benchmarks and speeding up inference by more than an order of magnitude. ...
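The speed claim ultimately comes down to how many ODE integration steps the policy needs at inference time. Below is a rough sketch of few-step Euler sampling for a generic flow-based policy; the `policy_velocity` interface and the step count are assumptions for illustration, not EfficientFlow's actual sampler.

```python
import torch

@torch.no_grad()
def sample_action(policy_velocity, obs, action_dim, num_steps=2):
    """Integrate a learned velocity field from noise to an action.

    Fewer steps means fewer network evaluations and a faster control loop;
    few-step flow policies aim to keep num_steps very small.
    """
    a = torch.randn(1, action_dim)               # start from Gaussian noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((1, 1), k * dt)
        a = a + dt * policy_velocity(obs, a, t)  # one Euler step along the flow
    return a
```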
NeurIPS 2025 | CMU, Tsinghua, and UT Austin Open-Source ReinFlow: Online RL Fine-Tuning of Robot Flow Matching Policies
机器之心· 2025-10-20 09:15
Core Insights
- ReinFlow is an online reinforcement learning framework for fine-tuning flow matching policies; the work has been accepted at NeurIPS 2025 and is open-sourced with comprehensive documentation [2][5][27].

Group 1: ReinFlow Overview
- ReinFlow is a general framework applicable to any policy defined by an ordinary differential equation, such as Rectified Flow and Shortcut Models, and supports inference with very few steps [12].
- The framework reduces training time by over 60% compared to DPPO while maintaining similar performance levels [14][16].

Group 2: Algorithm Characteristics
- ReinFlow uses policy gradient theory to convert the deterministic flow into a discrete-time Markov process, optimizing the entire flow matching chain [5][7].
- The algorithm injects a small amount of learnable noise into the deterministic path of the flow policy, yielding a stochastic diffusion process that enhances exploration while keeping the deviation from the pre-trained policy under control (sketched below) [8][10].

Group 3: Performance Metrics
- On D4RL locomotion tasks, ReinFlow-fine-tuned Rectified Flow policies achieved an average net performance increase of 135.36% while reducing fine-tuning wall-clock time by 82.63% [16].
- On long-horizon manipulation tasks, ReinFlow-fine-tuned Shortcut Model policies improved success rates by an average of 40.34% with fewer diffusion steps, saving an average of 23.20% of training time [18].

Group 4: Experimental Validation
- Ablation studies examined how various factors affect training outcomes, showing that reinforcement learning fine-tuning improves performance beyond mere data augmentation [24].
- The framework has been validated across multiple benchmark tasks, showing significant improvements over the pre-trained models [14].

Group 5: Open Source and Future Directions
- ReinFlow's GitHub project is fully open-sourced and actively maintained, providing the complete codebase, model checkpoints, and detailed documentation for community use [27].
- Future updates will add support for more flow models, classic RL environments, and comprehensive installation and usage guides [29].
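To make a deterministic flow amenable to policy-gradient training, the core idea described above is to add a small learned noise term at each integration step so that every step has a tractable Gaussian log-probability. The sketch below only illustrates that idea; the `velocity_net`/`noise_net` interfaces and shapes are assumptions, not ReinFlow's actual code.

```python
import torch

def stochastic_flow_step(velocity_net, noise_net, obs, a_t, t, dt):
    """One noisy integration step of a flow policy.

    The deterministic Euler update is perturbed by learnable Gaussian noise,
    so each step defines a Gaussian policy whose log-prob can feed a
    policy-gradient objective (the gist of ReinFlow-style fine-tuning).
    """
    mean = a_t + dt * velocity_net(obs, a_t, t)                         # deterministic drift
    std = torch.nn.functional.softplus(noise_net(obs, a_t, t)) + 1e-4   # learned, kept small
    dist = torch.distributions.Normal(mean, std)
    a_next = dist.rsample()                        # stochastic action update
    log_prob = dist.log_prob(a_next).sum(dim=-1)   # used by the RL objective
    return a_next, log_prob
```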
ICCV 2025 | Reducing Spatio-Temporal Redundancy in Diffusion Models: SJTU's EEdit Achieves Training-Free Accelerated Image Editing
机器之心· 2025-07-05 02:46
Core Viewpoint
- The article presents the latest research from Professor Linfeng Zhang's team at Shanghai Jiao Tong University: EEdit, a framework that improves image-editing efficiency by reducing spatial and temporal redundancy in diffusion models, achieving a speedup of over 2.4x compared to previous methods [1][6][8].

Research Motivation
- The authors identified significant spatial and temporal redundancy in diffusion-based image editing, leading to unnecessary computation, particularly in non-edited regions [12][14].
- The inversion process carries especially high temporal redundancy, so pruning redundant time steps can substantially accelerate editing [14].

Method Overview
- EEdit is a training-free caching acceleration framework that reuses output features to compress the inversion time steps and controls the update frequency of region markings through region score rewards (a simplified sketch follows this list) [15][17].
- The framework adapts to various editing inputs, including reference images, prompt-based editing, and drag-region guidance [10][15].

Key Features of EEdit
- EEdit achieves over 2.4x faster inference than the unaccelerated baseline and up to 10x speedup over other image editing methods [8][9].
- It removes the computational waste caused by spatial and temporal redundancy, accelerating editing without compromising quality [9][10].
- EEdit supports multiple types of input guidance, enhancing its versatility in image editing tasks [10].

Experimental Results
- EEdit was evaluated on several benchmarks, demonstrating superior efficiency and quality metrics compared to existing methods [26][27].
- It outperformed other methods on PSNR, LPIPS, SSIM, and CLIP metrics, showing a competitive edge in both speed and quality [27][28].
- The spatial locality caching algorithm (SLoC) used in EEdit proved more effective than other caching methods, delivering better acceleration and foreground preservation [29].
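A simplified illustration of the cache-and-reuse idea, not EEdit's actual SLoC algorithm: block outputs for tokens outside the edit region are cached from an earlier step and reused, so only edited-region tokens are recomputed. The function and variable names are hypothetical, and the sketch assumes a per-token block.

```python
import torch

@torch.no_grad()
def cached_block_forward(block, tokens, edit_mask, cache, step, refresh_every=4):
    """Reuse cached outputs for tokens outside the edited region.

    Assumes `block` acts token-wise (e.g. an MLP); attention layers would
    still need the full token set. tokens: (N, D); edit_mask: (N,) bool.
    """
    if step % refresh_every == 0 or "out" not in cache:
        out = block(tokens)                        # periodic full recomputation
        cache["out"] = out.detach()
        return out
    out = cache["out"].clone()
    if edit_mask.any():
        out[edit_mask] = block(tokens[edit_mask])  # recompute only edited-region tokens
    return out
```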
A Deep Dive into Kaiming He's CVPR 2025 Talk: How Can Generative Models Go End-to-End?
自动驾驶之心· 2025-06-28 13:34
Core Viewpoint
- The article traces the evolution of generative models in deep learning, drawing a parallel to the revolution AlexNet brought to recognition models, and argues that generative models may be on the brink of a similar breakthrough with MeanFlow, which compresses generation from many steps to a single step [1][2][35].

Group 1: Evolution of Recognition Models
- Before AlexNet, layer-wise training dominated recognition models: each layer was optimized individually, making training complex and cumbersome [2][3].
- AlexNet's arrival in 2012 marked the shift to end-to-end training, allowing the whole network to be trained jointly, greatly simplifying model design and improving performance [3][7].

Group 2: Current State of Generative Models
- Today's generative models resemble the pre-AlexNet era of recognition: they rely on multi-step inference procedures, such as diffusion and autoregressive models, raising the question of whether they are in a similar "pre-AlexNet" phase [7][9].
- The article argues that generative models must move from multi-step inference to end-to-end generation to achieve a comparable breakthrough [7][35].

Group 3: Relationship Between Recognition and Generation
- Recognition and generation can be viewed as two sides of the same coin: recognition abstracts semantic information out of data, while generation turns abstract representations back into realistic data samples [13][15][16].
- The fundamental difference lies in the mapping: recognition has a deterministic mapping from data to labels, whereas generation requires a highly nonlinear mapping from noise to complex data distributions, presenting both opportunities and challenges [18][20].

Group 4: Flow Matching and Mean Flows
- Flow matching is a key exploration direction for these challenges; it constructs a flow field over data distributions to drive generation [20][22].
- MeanFlow, a recent method from Kaiming He's group, targets one-step generation by replacing costly integral computations with average-velocity computations, greatly improving generation efficiency (see the note after this list) [24][27][29].
- In experiments on ImageNet, MeanFlow achieved an FID of 3.43 with a single function evaluation, outperforming traditional multi-step models [31][32].

Group 5: Future Directions and Challenges
- Future research directions include consistency models, two-time-variable models, and revisiting normalizing flows, while asking whether generative models are still in their "pre-AlexNet" era [33][34].
- Despite MeanFlow's advances, finding a truly effective recipe for end-to-end generative modeling remains an open and exciting research question [34][35].
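For concreteness, the "average velocity" MeanFlow models is simply displacement over elapsed time along the flow trajectory; the notation below follows the common flow matching convention (data at t = 0, noise at t = 1) and should be read as a paraphrase rather than the paper's exact notation:

$$
u(z_t, r, t) \;=\; \frac{z_t - z_r}{t - r}
\qquad\Longrightarrow\qquad
z_0 \;=\; z_1 - u_\theta(z_1,\, r{=}0,\, t{=}1),
$$

so a single evaluation of the learned average velocity carries a noise sample all the way to a data sample — the 1-NFE generation referred to above.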
ICML 2025 Spotlight | A New Theoretical Framework Unlocks Guided Generation for Flow Matching Models
机器之心· 2025-06-28 02:54
Core Viewpoint
- The article introduces a novel energy guidance theoretical framework for flow matching models, filling the gap in energy guidance algorithms for this model family and proposing several practical algorithms suited to different tasks [2][3][27].

Research Background
- Energy guidance is a crucial technique for applying generative models: ideally it shifts the distribution of generated samples toward a given energy function while staying faithful to the training-set distribution [7][9].
- Existing energy guidance algorithms focus mainly on diffusion models, which differ fundamentally from flow matching models, motivating a general energy guidance framework for flow matching [9].

Method Overview
- Starting from the basic definitions of flow matching, the authors derive a general flow matching energy guidance vector field and obtain three families of practical, training-free energy guidance algorithms [11][12].
- The guidance vector field steers the original vector field toward regions of lower energy (a schematic form is given below) [12].

Experimental Results
- Experiments cover synthetic data, offline reinforcement learning, and linear inverse problems on images, demonstrating the effectiveness of the proposed algorithms [20][22].
- On synthetic datasets, the Monte Carlo sampling-based guidance matched the ground-truth distribution most closely, validating the correctness of the flow matching guidance framework [21].
- In offline reinforcement learning, Monte Carlo sampling guidance performed best, since stable guidance samples are needed across different time steps [23].
- On image inverse problems, Gaussian-approximation guidance and GDM performed best, while Monte Carlo sampling struggled due to the high dimensionality [25].

Conclusion
- The work fills a significant gap in energy guidance for flow matching models, providing a new theoretical framework and several practical algorithms, along with theoretical analysis and experimental comparisons to guide real-world applications [27].
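As a reminder of what energy guidance targets, the standard formulation in the energy guidance literature (stated here schematically, not as the paper's specific derivation) reweights the learned data distribution by the energy and adds a guidance term to the sampler's vector field:

$$
q(x) \;\propto\; p(x)\, e^{-\beta E(x)},
\qquad
\tilde v_t(x) \;=\; v_t(x) + g_t(x),
$$

where $p(x)$ is the distribution captured by the flow model, $\beta$ controls guidance strength, and $g_t$ is the guidance vector field; broadly, the three training-free algorithm families mentioned above differ in how they approximate $g_t$.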
Kaiming He's Latest CVPR Lecture Slides Are Online: Toward End-to-End Generative Modeling
机器之心· 2025-06-19 09:30
Core Viewpoint
- The article follows the evolution of generative models, particularly the move from diffusion models toward end-to-end generative modeling, and highlights the potential for generative models to replay the historical leap seen in recognition models [6][36][41].

Group 1: Workshop Insights
- Kaiming He's CVPR workshop focused on how visual generative modeling can evolve beyond diffusion models [5][7].
- Diffusion models have become the dominant approach to visual generation, but they suffer from slow sampling and difficulty modeling complex distributions [6][36].
- He's presentation emphasized the need for end-to-end generative modeling, contrasting it with the layer-wise training methods that were standard before AlexNet [10][11][41].

Group 2: Recognition vs. Generation
- Recognition and generation can be viewed as two sides of the same coin: recognition abstracts features from raw data, while generation concretizes abstract representations into detailed data [41][42].
- The fundamental difference is that recognition has a clear mapping from data to labels, whereas generation must realize complex, nonlinear mappings from simple distributions to intricate data distributions [58].

Group 3: Flow Matching and MeanFlow
- Flow matching is presented as a promising route because it constructs ground-truth fields that are independent of any specific neural network architecture [81].
- The MeanFlow framework aims at single-step generation by modeling average velocity rather than instantaneous velocity, providing the theoretical basis for network training [83][84].
- Experimentally, MeanFlow significantly outperforms previous single-step diffusion and flow models, reaching an FID of 3.43, over 50% better than the previous best (a sampling sketch follows below) [101][108].

Group 4: Future Directions
- Ongoing research includes consistency models, two-time-variable models, and revisiting normalizing flows, suggesting the field is still in an early stage akin to the pre-AlexNet era of recognition models [110][113].
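The practical difference between a conventional multi-step flow sampler and a MeanFlow-style one-step sampler fits in a few lines. The sketch below is illustrative only, with assumed network interfaces (`velocity_net(z, t)` for instantaneous velocity, `mean_velocity_net(z, r, t)` for average velocity) and the convention that t = 1 is noise and t = 0 is data.

```python
import torch

@torch.no_grad()
def sample_multistep(velocity_net, shape, num_steps=50):
    """Conventional flow sampling: many small Euler steps, many network calls."""
    z = torch.randn(shape)                       # t = 1: pure noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = torch.full((shape[0], 1), 1.0 - k * dt)
        z = z - dt * velocity_net(z, t)          # step backward in time along v
    return z                                     # t = 0: data sample

@torch.no_grad()
def sample_meanflow_1nfe(mean_velocity_net, shape):
    """MeanFlow-style sampling: one evaluation of the average velocity (1-NFE)."""
    z1 = torch.randn(shape)
    r = torch.zeros(shape[0], 1)                 # target time (data)
    t = torch.ones(shape[0], 1)                  # start time (noise)
    return z1 - (t - r) * mean_velocity_net(z1, r, t)
```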
Another New Work from Kaiming He's Team: MeanFlow Sets the One-Step Image Generation SOTA with Gains of up to 50%
机器之心· 2025-05-21 04:00
Core Viewpoint
- The article introduces MeanFlow, a new generative modeling framework that significantly improves existing flow matching methods by building on the notion of average velocity, achieving an FID of 3.43 on ImageNet 256x256 without pre-training, distillation, or curriculum learning [3][5][7].

Methodology
- MeanFlow introduces a new ground-truth field representing average velocity, in place of the instantaneous velocity commonly used in flow matching [3][8].
- Average velocity is defined as displacement over a time interval, and the relationship between average and instantaneous velocity is derived to supervise network training (written out below) [9][10].

Performance Results
- MeanFlow performs strongly in one-step generative modeling, achieving an FID of 3.43 with only 1-NFE, a 50% improvement over the best previous methods [5][16].
- With 2-NFE generation, MeanFlow reaches an FID of 2.20, comparable to leading multi-step diffusion/flow models [18].

Comparative Analysis
- Compared with previous single-step diffusion/flow models, MeanFlow is markedly better, with an FID of 3.43 versus 7.77 for IMM [16][17].
- The results indicate that the method substantially narrows the gap between single-step and multi-step diffusion/flow models [18].
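The definition and the average/instantaneous relationship mentioned above can be stated compactly; the formulas follow the published MeanFlow formulation as best I understand it, so treat the exact notation as a paraphrase:

$$
u(z_t, r, t) \;=\; \frac{1}{t-r}\int_{r}^{t} v(z_\tau, \tau)\, d\tau
\quad\Longrightarrow\quad
u(z_t, r, t) \;=\; v(z_t, t) \;-\; (t - r)\,\frac{d}{dt}\, u(z_t, r, t),
$$

where the total derivative expands as $\frac{d}{dt}u = v\,\partial_z u + \partial_t u$. Training regresses the network $u_\theta$ onto the right-hand side, which involves only local quantities and so never requires evaluating the integral explicitly.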