Diffusion Models - filings, earnings calls, financial reports, news - Reportify

Diffusion Models

A New "Drifting" Paradigm for Models: He Kaiming's Latest Work Frees Generative Models from Iterative Inference

Synced (机器之心) · 2026-02-08 10:37

Core Viewpoint
- The article introduces the "Drifting Model," a novel generative modeling paradigm that eliminates the need for iterative inference processes, thereby enhancing efficiency in generating high-quality outputs [3][7][26].

Group 1: Generative Modeling Techniques
- Traditional generative models, such as diffusion models, rely on iterative processes and differential equations to map distributions, making them time-consuming and resource-intensive [1][2].
- Variational Autoencoders (VAEs) and Normalizing Flows (NFs) are also discussed as methods that attempt to streamline the generation process, but they still face challenges related to iterative training [2][3].

Group 2: Drifting Model Characteristics
- The Drifting Model utilizes a pushforward mapping that evolves during training, allowing for single-step inference without the iterative nature of previous models [7][8].
- A drifting field is introduced to control the movement of samples, ensuring that the generated distribution aligns with the target data distribution [8][10].

Group 3: Experimental Results
- The Drifting Model achieved a state-of-the-art (SOTA) FID score of 1.54 on ImageNet 256×256 under standard latent space generation protocols, demonstrating competitive performance even against multi-step diffusion models [14][24].
- In challenging pixel space generation protocols, the model reached an FID of 1.61, significantly outperforming previous pixel space methods [14][26].

Group 4: Robustness and Efficiency
- The model exhibits robustness against mode collapse, maintaining the ability to approximate multi-modal target distributions effectively [16][17].
- The research highlights the importance of robust feature representations in generative modeling, indicating that advancements in self-supervised learning can directly benefit this paradigm [26].

Group 5: Implications for Future Research
- The findings suggest that the principles of distribution evolution through drifting fields could be broadly applicable across various generative tasks, opening new avenues for efficient generative modeling research [26].
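
For readers comparing the FID figures quoted in this summary, the standard definition of the Fréchet Inception Distance is written out below. The symbols are notation chosen here, not taken from the article: \mu_r, \Sigma_r are the mean and covariance of Inception features of real images, and \mu_g, \Sigma_g are the same statistics for generated images. Lower is better, and identical feature statistics give a value of 0.

```latex
\mathrm{FID} \;=\; \lVert \mu_r - \mu_g \rVert_2^2
\;+\; \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```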

Discriminative models

Self-supervised learning

Drifting Model

Variational Autoencoder (VAE)

New Work from He Kaiming's Team: Diffusion Models May Have Been Used the Wrong Way

36Kr · 2025-11-19 11:22

Core Insights
- The latest paper challenges the mainstream approach of diffusion models by suggesting that instead of predicting noise, models should directly generate clean images [1][2].
- The research emphasizes a return to the fundamental purpose of diffusion models, which is denoising, rather than complicating the architecture with additional components [2][3].

Summary by Sections

Diffusion Models Misuse
- Current mainstream diffusion models often predict noise or a mixture of images and noise, rather than focusing on generating clean images [3][5].
- This approach creates a significant challenge, as predicting noise requires a large model capacity to capture the high-dimensional noise, leading to potential training failures [5][6].

Proposed Solution: JiT Architecture
- The paper introduces a simplified architecture called JiT (Just image Transformers), which directly predicts clean images without relying on complex components like VAE or tokenizers [7][8].
- JiT operates purely from pixel data, treating the task as a denoising problem, which aligns better with the original design of neural networks [6][8].

Experimental Results
- Experimental results indicate that while traditional noise prediction models struggle in high-dimensional spaces, JiT maintains robustness and achieves superior performance [10].
- JiT demonstrates excellent scalability, maintaining high-quality generation even with larger input dimensions without increasing network width [11][13].
- The architecture achieved state-of-the-art FID scores of 1.82 and 1.78 on ImageNet datasets of 256x256 and 512x512, respectively [13][14].

Author Background
- The lead author, Li Tianhong, is a notable researcher in representation learning and generative models, having previously collaborated with renowned researcher He Kaiming [15][17].
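
The difference between predicting noise and predicting the clean image can be made concrete with a small sketch. It assumes a linear mixing rule z_t = t * x + (1 - t) * eps, which is an assumption made here for illustration and is not stated in the summary; the function names are likewise hypothetical. Under that rule the three common targets (clean image, noise, velocity) are algebraically interchangeable, so the choice discussed in the article is about what the network is asked to output, not about what information is available.

```python
import numpy as np

def mix(x, eps, t):
    """Noisy sample under the assumed linear rule z_t = t * x + (1 - t) * eps."""
    return t * x + (1.0 - t) * eps

def eps_from_x(z_t, x_pred, t):
    """Noise implied by a clean-image prediction."""
    return (z_t - t * x_pred) / (1.0 - t)

def x_from_eps(z_t, eps_pred, t):
    """Clean image implied by a noise prediction."""
    return (z_t - (1.0 - t) * eps_pred) / t

def v_from_x(z_t, x_pred, t):
    """Velocity v = x - eps implied by a clean-image prediction."""
    return (x_pred - z_t) / (1.0 - t)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(3, 8, 8))   # stand-in for a clean image
eps = rng.standard_normal(x.shape)            # Gaussian noise
t = 0.3                                       # illustrative time in (0, 1)
z_t = mix(x, eps, t)

# With a perfect prediction of one target, the other targets follow exactly.
print(np.allclose(eps_from_x(z_t, x, t), eps))
print(np.allclose(x_from_eps(z_t, eps, t), x))
print(np.allclose(v_from_x(z_t, x, t), x - eps))
```

Note that the conversions divide by t or (1 - t), so they become numerically delicate near the ends of the time range.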

Artificial Intelligence

JiT (Just image Transformers)

New Work from He Kaiming's Team: Diffusion Models May Have Been Used the Wrong Way

QbitAI (量子位) · 2025-11-19 09:01

Core Viewpoint
- The article discusses a new paper by He Kaiming that challenges the mainstream approach to diffusion models by advocating for a return to the original purpose of denoising, suggesting that models should directly predict clean images instead of noise [2][5][6].

Summary by Sections

Diffusion Models
- Diffusion models have become increasingly complex over the years, often focusing on predicting noise rather than the clean images they were originally designed to denoise [4][6].
- The new paper emphasizes that since diffusion models are fundamentally denoising models, they should directly perform denoising [5][6].

Manifold Hypothesis
- The article explains the manifold hypothesis, stating that natural images exist on a low-dimensional manifold within a high-dimensional pixel space, while noise is uniformly distributed across the entire space [7][9].
- This distinction leads to challenges when neural networks attempt to fit high-dimensional noise, requiring significant model capacity and often resulting in training failures [9].

JiT Architecture
- The proposed architecture, JiT (Just image Transformers), is a simplified model that processes images directly without relying on complex components like VAE or tokenizers [10][11].
- JiT operates by taking raw pixel data, dividing it into large patches, and setting the output target to predict clean image blocks [12].

Experimental Results
- Experimental results indicate that while predicting noise and predicting original images perform similarly in low-dimensional spaces, traditional noise prediction models fail in high-dimensional spaces, while JiT remains robust [14].
- JiT demonstrates excellent scalability, maintaining high-quality generation even when input dimensions are significantly increased [15][17].
- The JiT architecture achieved state-of-the-art FID scores of 1.82 and 1.78 on ImageNet datasets of 256x256 and 512x512, respectively, without relying on complex components or pre-training [18][19].

Research Focus
- The primary research direction of He Kaiming includes representation learning, generative models, and their synergistic effects, aiming to build intelligent visual systems that understand the world beyond human perception [21].
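
The "divide raw pixels into large patches" step described under JiT Architecture can be sketched as follows. This is a generic patchify routine, not code from the paper; the patch size of 16 and the helper names `patchify` and `unpatchify` are assumptions chosen for illustration. It shows why large patches make each token high-dimensional: a 16x16 RGB patch is already a 768-dimensional vector.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    tokens = image.reshape(gh, patch, gw, patch, c)
    tokens = tokens.transpose(0, 2, 1, 3, 4)      # (gh, gw, patch, patch, c)
    return tokens.reshape(gh * gw, patch * patch * c)

def unpatchify(tokens, patch, h, w, c):
    """Inverse of patchify: rebuild the (H, W, C) image from patch tokens."""
    gh, gw = h // patch, w // patch
    image = tokens.reshape(gh, gw, patch, patch, c).transpose(0, 2, 1, 3, 4)
    return image.reshape(h, w, c)

# Illustrative 256x256 RGB input; the values are arbitrary.
image = np.arange(256 * 256 * 3, dtype=np.float32).reshape(256, 256, 3)
tokens = patchify(image, patch=16)
print(tokens.shape)  # (256, 768)
```

In a setup where the network predicts clean image blocks, the output tokens would be mapped back to pixels with the inverse operation, here `unpatchify(tokens, 16, 256, 256, 3)`.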

JiT (Just image Transformers)

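OpenAI's Yang Song Poached by Meta: A Key Figure in the Rise of Diffusion Models Joins MSL, Reuniting with Fellow Tsinghua Alumnus Shengjia Zhao

36Kr · 2025-09-26 03:19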

Core Insights
- Meta has successfully recruited Yang Song, a prominent researcher from OpenAI, which has caused significant surprise within the industry [1][6][8].

Group 1: Yang Song's Background and Achievements
- Yang Song is recognized as a key contributor to the rise of diffusion models and has made significant advancements in addressing their limitations [9][13].
- He entered Tsinghua University at the age of 16 and later earned his PhD from Stanford University, where he was mentored by a notable professor [20][22].
- During his time at OpenAI, he led the Strategic Explorations Team and was instrumental in developing Consistency Models, which outperform diffusion models in speed and performance [10][11][13].

Group 2: Impact of Recruitment on Meta
- The recruitment of Yang Song is part of Meta's broader strategy to attract top talent from leading AI research organizations, indicating a focus on enhancing its capabilities in AI and machine learning [6][8].
- Industry insiders believe that the motivations for such moves are not solely financial, as many of the recruited individuals have already achieved significant wealth [8].
- Yang Song's transition to Meta is seen as a strategic advantage for the company, potentially positioning it to lead in the development of next-generation AI models [6][24].

Meta Platforms(US:META)

Artificial Intelligence

Consistency models

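OpenAI's Yang Song Poached by Meta! A Key Figure in the Rise of Diffusion Models Joins MSL, Reuniting with Fellow Tsinghua Alumnus Shengjia Zhao

QbitAI (量子位) · 2025-09-25 13:00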

Core Viewpoint
- Meta has successfully recruited Yang Song, a prominent researcher from OpenAI, which has raised significant interest in the AI research community due to his notable contributions to diffusion models and generative modeling [1][6][7].

Group 1: Yang Song's Background and Achievements
- Yang Song is recognized as a key contributor to the rise of diffusion models and has been a leading figure in OpenAI's Strategic Explorations Team [10][11].
- He entered Tsinghua University at the age of 16 and later earned his PhD from Stanford University, where he worked under the guidance of a notable professor [20][36].
- His most famous work includes the development of Consistency Models, which outperform diffusion models in speed and performance, generating images significantly faster [12][14][17].

Group 2: Impact of Yang Song's Work
- The Consistency Models developed by Yang Song can generate 64 images of 256×256 pixels in approximately 3.5 seconds, showcasing a substantial improvement over existing models [12][14].
- His research has led to the creation of Continuous-Time Consistency Models, which address stability and scalability issues in earlier models, achieving a training scale of 1.5 billion parameters [15][18].
- The advancements made by Yang Song and his team are considered potential game-changers in the generative modeling field, with discussions suggesting they could "end" the dominance of diffusion models [18][19].

Group 3: Meta's Strategic Recruitment
- Meta's recruitment of Yang Song is part of a broader strategy to enhance its AI capabilities by attracting top talent from leading organizations like OpenAI [9][10].
- The move is seen as a significant loss for OpenAI, with many colleagues expressing surprise at his departure [7][6].
- The motivations behind such moves are speculated to extend beyond financial incentives, as many researchers prioritize impactful work and collaboration opportunities [9].
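
The speed advantage attributed to Consistency Models in this summary follows from the property they are trained to satisfy, which can be written compactly. The notation below is chosen here for illustration: f_\theta is the learned model, x_t is a point at noise level t on one probability-flow ODE trajectory, \epsilon is a small minimum time, and T is the maximum noise level. Because every point on a trajectory maps to the same clean endpoint, a sample can in principle be produced with a single network evaluation instead of a long denoising loop.

```latex
% Self-consistency along one trajectory, plus the boundary condition
f_\theta(x_t, t) = f_\theta(x_{t'}, t') \quad \text{for all } t, t' \in [\epsilon, T],
\qquad f_\theta(x_\epsilon, \epsilon) = x_\epsilon

% One-step sampling
\hat{x} = f_\theta(x_T, T), \qquad x_T \sim \mathcal{N}(0, T^2 I)
```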

Meta Platforms(US:META)

Artificial Intelligence

Consistency models

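Everyone Is Talking About Trajectory Prediction, but How Does It Actually Fit into Autonomous Driving?

Autonomous Driving Heart (自动驾驶之心) · 2025-08-16 00:03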

Core Viewpoint
- The article emphasizes the significant role of diffusion models in enhancing the capabilities of autonomous driving systems, particularly in data diversity, perception robustness, and decision-making under uncertainty [2][3].

Group 1: Applications of Diffusion Models
- Diffusion models improve 3D occupancy prediction, outperforming traditional methods, especially in occluded or low-visibility areas, thus aiding downstream planning tasks [5].
- Conditional diffusion models are utilized for precise image translation in driving scenarios, enhancing system understanding of various road environments [5].
- Stable diffusion models efficiently predict vehicle trajectories, significantly boosting the predictive capabilities of autonomous driving systems [5].
- The DiffusionDrive framework innovatively applies diffusion models to multimodal action distribution, addressing uncertainties in driving decisions [5].

Group 2: Data Generation and Quality
- Diffusion models effectively tackle the challenges of insufficient diversity and authenticity in natural driving datasets, providing high-quality synthetic data for autonomous driving validation [5].
- Future explorations will include video generation to further enhance data quality, particularly in 3D data annotation [5].

Group 3: Recent Research Developments
- The dual-conditioned temporal diffusion model (DcTDM) generates realistic long-duration driving videos, outperforming existing models by over 25% in consistency and frame quality [7].
- LD-Scene integrates large language models with latent diffusion models for user-controllable adversarial scenario generation, achieving state-of-the-art performance in generating high adversariality and diversity [11].
- DualDiff enhances multi-view driving scene generation through a dual-branch conditional diffusion model, achieving state-of-the-art performance in various downstream tasks [14][34].

Group 4: Traffic Simulation and Scenario Generation
- DriveGen introduces a novel traffic simulation framework that generates diverse traffic scenarios, supporting customized designs and improving downstream algorithm performance [26].
- Scenario Dreamer utilizes a vectorized latent diffusion model for generating driving simulation environments, demonstrating superior performance in realism and efficiency [28][31].
- AdvDiffuser generates adversarial safety-critical driving scenarios, enhancing transferability across different systems while maintaining high realism and diversity [68].

Group 5: Safety and Robustness
- AVD2 enhances understanding of accident scenarios through the generation of accident videos aligned with natural language descriptions, significantly advancing accident analysis and prevention [39].
- Causal Composition Diffusion Model (CCDiff) improves the generation of closed-loop traffic scenarios by incorporating causal structures, demonstrating enhanced realism and user preference alignment [44].
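
To give a feel for how a diffusion model treats a vehicle trajectory, the sketch below applies the standard DDPM forward (noising) step to a short list of 2D waypoints. It is a minimal illustration and not the method of any system named in this summary: the linear beta schedule, the 1000-step horizon, the toy trajectory, and the helper names are all assumptions. A trained model would learn to estimate the noise `eps` from `x_t`; here the true noise is reused only to show that the clean trajectory is recoverable from a correct noise estimate.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal fraction for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def predict_x0(x_t, eps_hat, t, alpha_bar):
    """Invert the forward step given an estimate of the noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
steps = np.linspace(0.0, 1.0, 12)
# Toy trajectory: 12 (x, y) waypoints of a gentle curve, in arbitrary units.
trajectory = np.stack([10.0 * steps, 2.0 * steps**2], axis=1)

alpha_bar = make_alpha_bar(1000)
x_t, eps = q_sample(trajectory, t=500, alpha_bar=alpha_bar, rng=rng)

# Stand-in for a trained network: use the true noise as the "prediction".
recovered = predict_x0(x_t, eps, 500, alpha_bar)
print(np.allclose(recovered, trajectory))
```

Sampling several different noise draws and denoising each one is what lets this family of models represent multimodal futures, for example both a lane change and a lane keep.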

Autonomous driving systems

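All in One Article! A Roundup of Diffusion Model Applications in Autonomous Driving Foundation Models, with 30+ Works Covered

Autonomous Driving Heart (自动驾驶之心) · 2025-07-31 23:33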

Core Insights
- The article discusses the significant role of diffusion models in the development of autonomous driving technologies, highlighting their ability to enhance data diversity, improve perception system robustness, and assist decision-making under uncertainty [2][3].

Group 1: Diffusion Models in Autonomous Driving
- Diffusion models have shown promising applications in autonomous driving, particularly in generating diverse and physically constrained results from complex data distributions [2].
- The introduction of the Dual-Conditioned Temporal Diffusion Model (DcTDM) allows for the generation of realistic long-duration driving videos, addressing challenges such as limited data quality and high costs [3][4].
- The performance of DcTDM has been evaluated, demonstrating over 25% improvement in consistency and frame quality compared to other video diffusion models [3].

Group 2: Applications in Perception and Decision-Making
- In perception, diffusion models significantly outperform traditional methods in 3D occupancy prediction, especially in occluded or low-visibility areas, thereby supporting downstream planning tasks [4].
- The Stable Diffusion Model effectively predicts vehicle trajectories, enhancing the predictive capabilities of autonomous driving systems [4].
- The DiffusionDrive framework utilizes diffusion models to model multimodal action distributions, innovating end-to-end autonomous driving applications by addressing uncertainties in driving decisions [4].

Group 3: Data Generation and Quality Improvement
- Diffusion models are crucial for generating high-quality synthetic data, addressing the challenges of insufficient diversity and authenticity in natural driving datasets [4].
- The introduction of controllable generation techniques is particularly important for overcoming 3D data annotation challenges, with future explorations into video generation aimed at further enhancing data quality [4].

Group 4: Advanced Frameworks and Innovations
- LD-Scene combines large language models with latent diffusion models to generate adversarial driving scenarios, enhancing the controllability and robustness of generated scenes [9].
- DualDiff introduces a dual-branch diffusion model designed to improve multi-view driving scene generation, utilizing occupancy ray sampling for rich semantic information [30].
- DiVE employs a diffusion transformer framework to generate high-fidelity, temporally coherent multi-view videos, achieving state-of-the-art performance in multi-view video generation [19][20].

Group 5: Safety and Critical Scenario Generation
- AVD2 enhances understanding of accident scenarios by generating videos aligned with detailed natural language descriptions, contributing to accident analysis and prevention [36].
- AdvDiffuser generates adversarial safety-critical driving scenarios, improving transferability across different systems while maintaining authenticity and diversity [68][69].
- The introduction of Causal Composition Diffusion Model (CCDiff) enhances controllability and realism in generating closed-loop traffic scenarios, significantly outperforming existing methods [41].

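My First Year of Graduate School Is Over, and I Still Don't Really Understand Anything...

Autonomous Driving Heart (自动驾驶之心) · 2025-07-24 06:46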

Core Viewpoint
- The article emphasizes the evolving landscape of the autonomous driving industry, highlighting the need for professionals to adapt their skill sets to align with current industry demands, particularly in areas like end-to-end VLA (Vision-Language-Action) models and traditional control systems [4][6].

Summary by Sections

Industry Trends
- The demand for talent in autonomous driving is shifting towards candidates with strong backgrounds and skills in cutting-edge technologies, such as end-to-end VLA models, while traditional control systems still have job opportunities [2][4].
- The article notes that the technology stack in autonomous driving is becoming more standardized, reducing the diversity of recruitment directions compared to previous years [3][4].

Skill Development
- Professionals are encouraged to upgrade their technical skills to meet the evolving demands of the industry, with a focus on continuous learning and adaptation [4][6].
- The article suggests that anxiety about job prospects can be mitigated by actively seeking out learning resources and engaging with communities that focus on the latest advancements in autonomous driving technology [4][6].

Learning Resources
- The article mentions various learning modules available in the "Autonomous Driving Heart Knowledge Planet," which includes cutting-edge topics such as world models, trajectory prediction, and large models [5][11].
- It highlights the availability of videos and materials for beginners and advanced learners, aimed at helping individuals navigate the complexities of the autonomous driving field [4][5].

Community Engagement
- The "Autonomous Driving Heart Knowledge Planet" is described as a significant community for knowledge sharing, featuring nearly 4,000 members and over 100 industry experts, providing a platform for discussion and problem-solving [8][11].
- The community focuses on various subfields within autonomous driving, including perception, mapping, planning, and control, offering a comprehensive approach to learning and professional development [11][13].

Autonomous driving technology

End-to-end autonomous driving

Visual large language models (VLM)

ASICs to the Rescue!

Semiconductor Industry Observation (半导体行业观察) · 2025-07-20 04:06

Group 1
- The article highlights a growing "computational crisis" driven by the increasing demand for artificial intelligence (AI), characterized by unsustainable energy consumption, high training costs, and limitations of traditional semiconductor technologies [1][2][3].
- The energy consumption of data centers supporting AI operations is projected to rise from approximately 200 terawatt-hours (TWh) in 2023 to 260 TWh by 2026, accounting for about 6% of total electricity demand in the U.S. [3][5].
- The costs associated with training cutting-edge AI models are expected to exceed $1 billion by 2027, indicating a significant supply-demand gap in computational resources [3][5].

Group 2
- The article introduces "physics-based application-specific integrated circuits (ASICs)" as a transformative approach that leverages inherent physical dynamics for computation, aiming to improve energy efficiency and computational throughput [1][6].
- Traditional ASIC designs impose constraints such as statelessness, unidirectionality, determinism, and synchronization, which limit their efficiency. In contrast, physics-based ASICs are designed to utilize or tolerate statefulness, bidirectionality, non-determinism, and asynchrony [9][12][14].
- The performance advantages of physics-based ASICs stem from their ability to relax traditional design constraints, potentially leading to significant energy savings and enhanced computational capabilities [20][21].

Group 3
- The design of physics-based ASICs involves a principled strategy that intersects top-down and bottom-up approaches, focusing on maximizing the overlap between algorithms suitable for specific applications and those that can efficiently run on particular physical structures [22][24].
- Performance metrics for evaluating the efficiency of algorithms on hardware include runtime and energy consumption, with specific ratios defined to assess the effectiveness of algorithms on physics-based ASICs compared to state-of-the-art digital hardware [26][27][28].
- The article discusses the importance of algorithm co-design, emphasizing that algorithms should be tailored to leverage the unique characteristics of the hardware, thereby enhancing performance and efficiency [30][31].

Group 4
- The potential applications of physics-based ASICs span various fields, including scientific simulations, data analysis, and AI, with specific algorithms inspired by physical processes showing promise for enhanced performance [36][39].
- Notable examples of physics-inspired applications include artificial neural networks, diffusion models, sampling methods, and optimization techniques, all of which can benefit from the unique capabilities of physics-based ASICs [40][42][44].
- The article outlines a roadmap for the adoption of physics-based ASICs, emphasizing the need for scalability, integration into heterogeneous systems, and the development of user-friendly software abstractions to facilitate widespread use [48][56][57].
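
The summary mentions runtime and energy ratios for comparing physics-based ASICs with digital hardware but does not give their exact definitions. The sketch below is one plausible reading, stated as an assumption: each ratio divides the digital baseline's cost by the physics-based device's cost for the same task, so values above 1 favor the physics-based device. The `Measurement` type and the benchmark numbers are placeholders invented for illustration, not results from the article. The last lines also work out the relative growth implied by the 200 TWh and 260 TWh figures quoted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    """Cost of running one task on one device (hypothetical container)."""
    runtime_s: float
    energy_j: float

def runtime_ratio(digital: Measurement, physical: Measurement) -> float:
    """Assumed definition: digital runtime divided by physics-based runtime."""
    return digital.runtime_s / physical.runtime_s

def energy_ratio(digital: Measurement, physical: Measurement) -> float:
    """Assumed definition: digital energy divided by physics-based energy."""
    return digital.energy_j / physical.energy_j

# Placeholder numbers only; they are not measurements of any real hardware.
digital = Measurement(runtime_s=2.0, energy_j=500.0)
physical = Measurement(runtime_s=0.5, energy_j=25.0)

print(runtime_ratio(digital, physical))  # 4.0
print(energy_ratio(digital, physical))   # 20.0

# Relative growth implied by the quoted projection (200 TWh to 260 TWh).
demand_growth = (260.0 - 200.0) / 200.0
print(f"{demand_growth:.0%}")  # 30%
```

Reporting both ratios side by side matters because a device can win on energy while losing on runtime, or the reverse.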

Physics-based application-specific integrated circuits (ASICs)

Physics-based ASICs

Artificial neural networks

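Autonomous Driving Roundtable | What Happened in Autonomous Driving in the First Half of the Year?

Autonomous Driving Heart (自动驾驶之心) · 2025-07-14 11:30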

Core Viewpoint
- The article discusses the current state and future directions of autonomous driving technology, highlighting the maturity of certain technologies, the challenges that remain, and the emerging trends in the industry.

Group 1: Current Technology Maturity
- BEV (Bird's Eye View) and OCC (Occupancy) perception methods have matured, with no major players claiming that BEV is unusable [2][13].
- The main challenge remains corner cases: 99% of scenarios are manageable, but complex situations like rural roads and large intersections still pose difficulties [13].
- E2E (End-to-End) models have not yet demonstrated clear advantages over two-stage models in practical applications, despite their theoretical appeal [4][5].

Group 2: Emerging Technologies
- VLA (Vision-Language-Action) is gaining attention as it simplifies tasks and potentially addresses corner cases more effectively than traditional methods [5][6].
- The efficiency of models is a critical issue, with discussions around using smaller models to achieve performance close to larger ones [6][30].
- Reinforcement learning has not yet proven to be significantly impactful in autonomous driving, with a need for better simulation environments to validate its effectiveness [7][51].

Group 3: Future Directions
- There is a consensus that VLA and VLM (Vision-Language Model) will be key areas for future development, focusing on enhancing reasoning capabilities and safety [45][48].
- The industry is moving towards a more data-driven approach, where the efficiency of data collection, cleaning, and training will determine competitive advantage [28][40].
- The integration of world models and closed-loop simulations is seen as essential for advancing autonomous driving technologies [47][50].

Group 4: Industry Perspectives
- The shift towards VLA/VLM is viewed as a necessary evolution, with the potential to improve user experience and safety in autonomous vehicles [28][45].
- The debate between deepening expertise in autonomous driving and transitioning to embodied intelligence reflects the industry's evolving landscape and personal career choices [22][27].
- The current focus on safety and robustness in L4 (Level 4) autonomous driving indicates a divergence in technical approaches between L2+ and L4 players [25][36].