Diffusion Generative Models
MLSys 2026 | StreamDiffusionV2: Taking Video Generation from "Offline Generation" to "Real-Time Interaction", Delivering a Truly Usable Generative Live-Streaming System
机器之心 · 2026-03-13 10:41
Core Insights
- The article discusses advancements in real-time live video content creation with diffusion models, focusing on the StreamDiffusionV2 system, which addresses the latency and quality challenges of interactive live generation [2][29]

Group 1: System Overview
- StreamDiffusionV2 is an open-source, interactive live video generation system that runs stably on a range of GPU types, achieving low latency and high-quality output [3][29]
- The system sustains 16 FPS real-time inference on a machine with dual RTX 4090 cards, and first-frame latency under 0.5 seconds on H100 GPUs [3][29]

Group 2: Challenges in Existing Models
- Current video diffusion models are optimized primarily for offline generation, leading to high first-frame latency and difficulty meeting the strict service level objectives (SLOs) of live streaming [11]
- Temporal drift during long-horizon generation, motion blur, and tearing during fast motion are prevalent in existing models, necessitating a redesign around real-time constraints [11][7]

Group 3: Performance Bottlenecks
- Existing systems are limited by memory bandwidth rather than compute, particularly in autoregressive video generation [13][14]
- Communication overhead from sequence parallelism in multi-GPU setups further exacerbates the problem, making real-time interactive generation hard to achieve [13][14]

Group 4: Proposed Solutions
- The research team introduces a dual optimization approach that addresses both algorithmic and system-level challenges to enable real-time video generation [15][16]
- Key innovations include a pipeline-based batch denoising strategy and a dynamic noise adjustment mechanism driven by motion intensity, which together improve generation quality and temporal consistency [17][18]
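The summary mentions a dynamic noise adjustment mechanism based on motion intensity but gives no formula. As a minimal sketch of one plausible scheme (not the paper's actual method): estimate motion from the mean absolute frame difference, then scale the injected noise level up for fast motion (stronger re-synthesis, less tearing) and down for static scenes (better detail preservation). All function names and constants here are illustrative assumptions.

```python
import numpy as np

def motion_intensity(prev_frame: np.ndarray, cur_frame: np.ndarray) -> float:
    # Mean absolute pixel difference between consecutive frames,
    # used as a cheap proxy for motion magnitude.
    diff = cur_frame.astype(np.float32) - prev_frame.astype(np.float32)
    return float(np.mean(np.abs(diff)))

def adaptive_noise_scale(motion: float, base_sigma: float = 0.5,
                         gain: float = 0.02, max_sigma: float = 1.0) -> float:
    # Linearly raise the denoising noise level with motion, clipped
    # to [base_sigma, max_sigma]. base_sigma/gain are tuning knobs.
    return float(min(max_sigma, base_sigma + gain * motion))
```

In a streaming loop, the scale returned here would pick the starting noise level (and thus the number of denoising steps) for the next incoming frame.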
Group 5: Experimental Results
- StreamDiffusionV2 balances low latency with high throughput, achieving significant performance improvements over previous models [22][26]
- The system exhibits a tight latency distribution with low jitter, meeting sub-second real-time requirements while maintaining high-quality generation [26][27]

Group 6: Future Implications
- StreamDiffusionV2 bridges the gap between offline video diffusion and real-time live streaming, marking a significant step toward feasible high-quality generative live broadcasts [29][34]
- Its design philosophy of prioritizing memory access and real-time constraints is expected to shape future generative services in the industry [33][35]
Kaiming He Improves Saining Xie's REPA: Greatly Simplified, Yet Performance Remains Strong
机器之心 · 2025-06-12 09:57
Core Viewpoint
- The article discusses the significance of representation learning in generative models, introducing a new method called Dispersive Loss that integrates self-supervised learning into diffusion-based generative models without additional pre-training or external data sources [6][9][43]

Group 1: Diffusion Models and Representation Learning
- Diffusion models excel at modeling complex data distributions but have remained largely disconnected from the field of representation learning [2]
- The training objectives of diffusion models typically focus on reconstruction tasks such as denoising and lack explicit regularization of the learned representations [3]
- Representation learning, particularly self-supervised learning, is crucial for learning general representations applicable to a variety of downstream tasks [4]

Group 2: Introduction of Dispersive Loss
- Dispersive Loss is a flexible, general plug-in regularizer that brings self-supervised learning into diffusion-based generative models [9]
- Its core idea is to add a regularization objective on the model's internal representations, encouraging them to spread out in the latent space [10][13]
- The method requires no additional layers or parameters, making it a simple, self-contained approach [15][16]

Group 3: Comparison with Existing Methods
- Dispersive Loss needs no pre-training, external data, or additional model parameters, unlike the REPA method, which relies on pre-trained models [7][41][43]
- The new method demonstrates that representation learning can benefit generative modeling without external information sources [13][43]
- In practice, introducing Dispersive Loss requires only minimal changes, such as specifying which intermediate layers to regularize [29]
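The summary says Dispersive Loss regularizes internal representations to spread out in latent space with no extra layers or parameters. A minimal NumPy sketch of one InfoNCE-style variant (the exact formulation used in the paper may differ): take a batch of intermediate activations and penalize pairwise similarity via the log of the mean Gaussian kernel over all pairs, so more dispersed batches yield a lower loss. The function name and temperature value are illustrative.

```python
import numpy as np

def dispersive_loss(z: np.ndarray, tau: float = 0.5) -> float:
    # z: batch of intermediate activations, shape [B, D] (higher-rank
    # tensors are flattened per sample). No positive pairs, no learned
    # parameters: purely a repulsion term on the batch.
    z = z.reshape(len(z), -1).astype(np.float64)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)  # [B, B]
    # log mean_{i,j} exp(-||z_i - z_j||^2 / tau): smaller when spread out.
    return float(np.log(np.exp(-sq_dists / tau).mean()))
```

In training, this would be added to the ordinary denoising objective as `total = diffusion_loss + lam * dispersive_loss(h)`, where `h` is the chosen intermediate layer's activations and `lam` is a small weight.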
Group 4: Performance Evaluation
- Experimental results show that Dispersive Loss consistently outperforms the corresponding contrastive losses while avoiding the complexity of two-view sampling [33]
- The method has been tested across models including DiT and SiT, showing improvements in all scenarios, particularly in larger models where effective regularization is crucial [36][37]
- Dispersive Loss also generalizes to one-step diffusion-based generative models, indicating its versatility [44]