扩散模型(Diffusion Models)
Search documents
多模态推理新范式!DiffThinker:用扩散模型「画」出推理和答案
机器之心· 2026-01-07 07:10
在多模态大模型(MLLMs)领域,思维链(CoT)一直被视为提升推理能力的核心技术。然而,面对复杂的长程、视觉中心任务,这种基于文本生成的推 理方式正面临瓶颈: 文本难以精确追踪视觉信息的变化。 形象地说,模型不知道自己想到哪一步了,对应图像是什么状态。 | 9 | 1 8 5 3 7 4 2 6 | 5 | 3 2 8 4 1 7 9 | ୧ | | | | | | | | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | 7 2 9 1 6 5 3 8 | 4 | 3 7 4 9 1 8 6 5 | 2 | | | | | | | | | | 8 6 4 3 2 5 | न | 9 | 7 | 5 | 7 6 8 2 4 | ਜ | l d | | | | | | | | | | | | | | | | | | | 2 | 3 6 8 | 7 | ਹ ਦ | 4 | d | 2 3 | 1 | 4 6 8 | 7 | 5 | 9 | | 3 8 1 6 4 9 7 5 2 | | | | | | | | ...
近500页史上最全扩散模型修炼宝典,一书覆盖三大主流视角
具身智能之心· 2025-10-30 00:03
Core Insights - The article discusses the comprehensive guide on diffusion models, which have significantly reshaped the landscape of generative AI across various domains such as images, audio, video, and 3D environments [3][5][6] - It emphasizes the need for a structured understanding of diffusion models, as researchers often struggle to piece together concepts from numerous papers [4][10] Summary by Sections Introduction to Diffusion Models - Diffusion models are framed as a gradual transformation process over time, contrasting with traditional generative models that directly learn mappings from noise to data [12] - The development of diffusion models is explored through three main perspectives: variational methods, score-based methods, and flow-based methods, which provide complementary frameworks for understanding and implementing diffusion modeling [12][13] Fundamental Principles of Diffusion Models - The origins of diffusion models are traced back, linking them to foundational perspectives such as Variational Autoencoders (VAE), score-based methods, and normalizing flows [14][15] - The chapter illustrates how these methods can be unified under a continuous time framework, highlighting their mathematical equivalence [17] Core Perspectives on Diffusion Models - The article outlines the core perspectives on diffusion models, including the forward process of adding noise and the reverse process of denoising [22] - Each perspective is detailed: - Variational view focuses on learning denoising processes through variational objectives [23] - Score-based view emphasizes learning score functions to guide denoising [23] - Flow-based view describes the generation process as a continuous transformation from a simple prior distribution to the data distribution [23][24] Sampling from Diffusion Models - The sampling process in diffusion models is characterized by a unique refinement from coarse to fine details, which presents a trade-off between performance and efficiency [27][28] - Techniques for improving sampling efficiency and quality are discussed, including classifier guidance and numerical solvers [29] Learning Fast Generative Models - The article explores methods for directly learning fast generative models that approximate the diffusion process, enhancing speed and scalability [30] - Distillation-based methods are highlighted, where a student model mimics a slower teacher model to achieve faster sampling [30][31] Conclusion - The book aims to establish a lasting theoretical framework for diffusion models, focusing on continuous time dynamical systems that connect simple prior distributions to data distributions [33] - It emphasizes the importance of understanding the underlying principles and connections between different methods to design and improve next-generation generative models [36]