Generative Modeling
Led by three undergraduates from Kaiming He's group! Continuing the focus on Flow models, breaking the efficiency bottleneck of normalizing-flow generation
量子位· 2025-12-15 04:04
Yuyang, reporting from Aofeisi | QbitAI (WeChat official account: QbitAI)

New work from Kaiming He's team continues the focus on Flow models. The paper proposes a new framework named Bidirectional Normalizing Flow (BiFlow), which decouples the forward process (mapping data to noise) from the reverse process (turning noise back into images), and in doing so breaks through the poor generation efficiency of traditional normalizing-flow models.

Notably, the paper's three co-first authors are undergraduates from Tsinghua's Yao Class and MIT.

BiFlow: the reverse process need not be the exact inverse of the forward process

Normalizing flows (NFs) have become a principled framework for generative modeling. A standard normalizing flow consists of a forward process and a reverse process: the forward process maps data to noise, and the reverse process generates samples by inverting the forward process. Unlike MeanFlow, which refined flow matching, this work mainly targets the limitations of normalizing flows as generative models.

Traditional NF models impose a hard constraint: the reverse process must be the exact inverse of the forward process, matching it like a key fits a lock. This leads to two problems, the first being that model design is constrained: to guarantee invertibility, many powerful general-purpose architectures (such as vision Transformers) cannot be used, and ...

The core innovation of BiFlow lies in breaking the rule that the reverse process must be the exact inverse of the forward process. The design idea is as follows: BiFlow decouples the design of the forward process from that of the reverse process.
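To make the contrast concrete, below is a minimal PyTorch sketch, an illustration under stated assumptions rather than the paper's method. Recall that a standard NF trains an invertible map f with the change-of-variables objective log p_X(x) = log p_Z(f(x)) + log|det ∂f(x)/∂x|, which is what forces the reverse pass to be the exact inverse of the forward pass. The sketch contrasts such a coupling layer with a decoupled forward/reverse pair that is only trained to invert approximately; the class and variable names (AffineCoupling, forward_net, reverse_net) and the reconstruction objective are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch, not the paper's implementation: contrasts (1) a standard
# normalizing-flow coupling layer, whose reverse pass is the analytic inverse
# of its forward pass, with (2) a decoupled forward/reverse pair in the spirit
# of the decoupling described above. Names and the loss are illustrative.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """Standard NF block: reverse is the exact analytic inverse of forward."""
    def __init__(self, dim):
        super().__init__()
        # small conditioner that predicts per-dimension scale and shift
        self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=-1)
        s, t = self.net(x1).chunk(2, dim=-1)
        z2 = x2 * torch.exp(s) + t                      # invertible affine transform
        return torch.cat([x1, z2], dim=-1), s.sum(-1)   # output and log|det Jacobian|

    def inverse(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        s, t = self.net(z1).chunk(2, dim=-1)
        x2 = (z2 - t) * torch.exp(-s)                   # undo the forward map exactly
        return torch.cat([z1, x2], dim=-1)

dim = 8
x = torch.randn(32, dim)                                # toy "data"

# (1) Standard NF: forward and reverse are locked together like a key and a lock.
coupling = AffineCoupling(dim)
z_nf, logdet = coupling(x)
print(torch.allclose(coupling.inverse(z_nf), x, atol=1e-5))  # True: exact inversion

# (2) Decoupled sketch: forward and reverse are independent, unconstrained networks,
# so the reverse model could use any architecture (e.g. a vision Transformer);
# plain MLPs stand in for both here.
forward_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
reverse_net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))

z = forward_net(x)                                      # forward: data -> noise-like code
x_rec = reverse_net(z.detach())                         # reverse: trained to approximately undo it
recon_loss = (x_rec - x).pow(2).mean()                  # assumed surrogate objective for the sketch
recon_loss.backward()
print(float(recon_loss))
```

In this toy setup the coupling layer guarantees exact inversion by construction, which is precisely the constraint the article says limits architecture choice; the decoupled pair trades that guarantee for architectural freedom and is only trained to invert approximately. How BiFlow actually trains and couples the two processes is specified in the paper, not in this sketch.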
A review of Fei-Fei Li's team's 2025 research: a panoramic map from visual understanding to embodied intelligence
自动驾驶之心· 2025-11-07 00:05
Core Insights
- The research team led by Professor Fei-Fei Li at Stanford University has made significant advancements in artificial intelligence, focusing on human-centered AI and its applications in various domains [2][3][19].
- The team's work emphasizes a holistic approach to AI, integrating perception, modeling, reasoning, and decision-making to create intelligent systems that can understand and reconstruct the world [3][19].

Research Achievements in 2025
- The team has achieved notable results in generative modeling, developing a framework that enhances the transfer of knowledge from 2D to 3D environments, showcasing improved generalization and scalability [3][19].
- In the area of embodied intelligence, the team has successfully integrated affordance learning and action constraints to enable robots to generalize across different tasks and environments [3][19].
- The research on semantic reasoning and human-machine understanding has strengthened model consistency in dynamic environments, enhancing the alignment between visual and language inputs [3][19].
- The team has actively contributed to AI governance and social responsibility, advocating for policy assessments and safety frameworks in cutting-edge AI technologies [3][19].

Specific Research Contributions
- The MOMAGEN framework addresses the challenge of efficiently generating demonstration data for multi-step robotic tasks, significantly improving data diversity and generalization capabilities with minimal real data [5][7].
- The Spatial Mental Modeling study introduces a new benchmark, MINDCUBE, to evaluate visual language models' ability to construct spatial mental models from limited views, revealing the importance of internal spatial structure representation [9][10].
- The UAD framework allows for unsupervised extraction of affordance knowledge from large-scale models, enhancing robotic manipulation capabilities in open environments without manual labeling [10][12].
- The Grafting method enables efficient exploration of diffusion transformer designs without the need for retraining, achieving high-quality generation with minimal computational resources [12][14].
- The NeuHMR framework improves 3D human motion reconstruction by utilizing neural rendering, enhancing robustness and accuracy in complex scenarios [14][16].
- The BEHAVIOR ROBOT SUITE provides a comprehensive platform for real-world robotic manipulation tasks, demonstrating capabilities in dual-arm coordination and precise navigation [16][18].
- The MOMA-QA dataset and SGVLM model advance video question answering by emphasizing fine-grained temporal and spatial reasoning, significantly outperforming existing methods [18][19].
- The Gaussian Atlas framework facilitates the transfer of knowledge from 2D diffusion models to 3D generation tasks, bridging the gap between these two domains [18][19].

Keywords for 2025
- Cognition, Generation, Embodiment, Transfer, Explainability [20]