Nearly 500 pages: the most comprehensive diffusion-model guide yet, as Yang Song and co-authors' book covers the three mainstream perspectives
机器之心· 2025-10-29 07:23
Core Viewpoint
- The article discusses a comprehensive guide to diffusion models, highlighting their transformative impact on generative AI across domains such as images, audio, video, and 3D environments [2][4].

Summary by Sections

Introduction to Diffusion Models
- Diffusion models are presented as a method that views the generation process as a gradual transformation over time, in contrast with traditional generative models that directly learn mappings from noise to data [11].
- The article emphasizes the need for a systematic understanding of diffusion models, which the book aims to provide, making it a valuable resource for both researchers and beginners [6][9].

Core Principles of Diffusion Models
- The book outlines the foundational principles of diffusion models, connecting three key perspectives: variational methods, score-based methods, and flow-based methods, which together form a unified theoretical framework [11][13].
- It discusses how these models achieve efficient sample generation and enhanced controllability during the generation process [12].

Detailed Exploration of Perspectives
- The variational view corresponds to denoising diffusion probabilistic models (DDPMs), providing a basis for probabilistic inference and optimization [23].
- The score-based view focuses on learning score functions to guide the denoising process, linking diffusion modeling with classical differential-equation theory [23][24].
- The flow-based view describes the generation process as a continuous flow transformation, allowing for broader applications beyond simple generation tasks [24].

Sampling Techniques and Efficiency
- The article highlights the distinctive feature of diffusion models, which refine samples from coarse to fine through noise removal, and discusses the trade-off between performance and efficiency [27][28].
- It introduces methods for improving sampling performance without retraining, such as classifier guidance and advanced numerical solvers, to enhance generation quality and speed [29][30].

Learning Fast Generative Models
- The book explores strategies for directly learning fast generative models that approximate the diffusion process, aiming to reduce reliance on multi-step inference [30][31].
- Distillation-based methods are discussed, in which a student model mimics a slower teacher model to achieve faster sampling while maintaining quality [30].

Comprehensive Coverage of Diffusion Models
- The book aims to establish a lasting theoretical framework for diffusion models, centered on continuous-time dynamical systems that connect simple prior distributions to data distributions [33].
- It emphasizes the importance of understanding the underlying principles and connections between different methods in order to design and improve next-generation generative models [36].
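The variational (DDPM) view described above treats generation as the reversal of a gradual noising process. As a minimal sketch, the forward corruption q(x_t | x_0) can be sampled in closed form in a single shot; the linear beta schedule, step count, and toy data below are illustrative choices, not taken from the book.

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    where alpha_bar_t is the cumulative product of (1 - beta)."""
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

# Illustrative linear schedule over 1000 steps.
betas = np.linspace(1e-4, 0.02, 1000)
rng = np.random.default_rng(0)
x0 = np.ones(4)                          # a toy "data" point
xT = forward_noise(x0, 999, betas, rng)  # at t = 999, almost pure noise
```

Sampling runs this process in reverse, removing a little noise per step, which is where the coarse-to-fine refinement described above comes from.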
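The distillation idea can be illustrated with a toy example: a teacher whose per-step update is a simple linear contraction can be matched exactly by a one-step student. Everything here is a stand-in for a trained network, purely to show the structure of the training target.

```python
import numpy as np

def teacher_step(x, dt=0.1):
    """Toy teacher update: one Euler step of a contraction toward 0,
    standing in for one denoising step of a trained diffusion model."""
    return x - dt * x

def distillation_target(x):
    """Progressive-distillation-style target: the result of two teacher
    steps, which the student must reproduce in a single step."""
    return teacher_step(teacher_step(x))

# For this linear toy teacher, two steps scale x by (1 - 0.1)**2 = 0.81,
# so a one-step student f(x) = 0.81 * x matches the target exactly,
# halving the number of inference steps at no loss in quality.
student_scale = (1 - 0.1) ** 2
x = np.array([1.0, -2.0, 0.5])
assert np.allclose(student_scale * x, distillation_target(x))
```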
DeepSeek's "bulletproof vest" has arrived: a model-intrinsic safety hardening scheme that rejects the "slay a thousand enemies, lose eight hundred of your own" trade-off | Shanghai AI Lab
量子位· 2025-03-13 03:28
Core Viewpoint
- The article discusses a hidden danger of the DeepSeek-R1 model: despite its strong reasoning capabilities, it may leak harmful content in its thought process even when it refuses to answer. Existing defense technologies face a dilemma: they either fail to prevent attacks or overly restrict the model's responses, causing normal questions to be rejected as well [1][2].

Summary by Sections

Section 1: Introduction of X-Boundary
- Shanghai Jiao Tong University and Shanghai AI Lab have jointly developed a security defense solution called X-Boundary, which aims to resolve this dilemma by separating harmful representations and eliminating them without compromising the model's general performance [2][3].

Section 2: Performance Analysis
- X-Boundary shows significant improvements on the DeepSeek-R1-Distill-Llama-8B model, effectively blocking information leakage by removing harmful features, akin to implanting a "cognitive purification chip" [3][4].

Section 3: Defense Methods and Challenges
- The article highlights a critical imbalance between safety and intelligence in mainstream defense methods (SFT/DPO/GA/CB). While these methods reduce the attack success rate (ASR), they also significantly impair the model's reasoning capabilities, with a reported 10% drop in mathematical ability and over 50% of benign questions unjustly rejected [5][6].

Section 4: Multi-Round Defense Training
- Introducing multi-round defense data into models such as Qwen2.5-7B-Chat led to a 30% increase in misclassification rates, indicating a strong correlation between defense strength and usability loss. Existing methods struggle to clearly distinguish harmful from benign queries, leading to excessive safety measures [6][7].
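The "representation separation" idea underlying X-Boundary can be sketched as a hinge loss that penalizes the model whenever the centroids of harmful and safe hidden states sit closer than a margin. This is a hypothetical illustration of the concept, not the published X-Boundary objective; all names and the margin value are invented.

```python
import numpy as np

def separation_loss(h_safe, h_harm, margin=1.0):
    """Hinge loss on the distance between the centroids of safe and
    harmful hidden representations: zero once they are at least
    `margin` apart, positive when the two clusters are entangled."""
    dist = np.linalg.norm(h_safe.mean(axis=0) - h_harm.mean(axis=0))
    return max(0.0, margin - dist)

# Entangled representations are penalized ...
mixed = np.random.default_rng(1).standard_normal((16, 8)) * 0.01
entangled = separation_loss(mixed, mixed)        # near the full margin
# ... while well-separated ones cost nothing.
separated = separation_loss(mixed, mixed + 5.0)  # 0.0
```

A boundary of this kind is what lets a defense target only the harmful cluster instead of suppressing every borderline query, which is the failure mode attributed to the SFT/DPO/GA/CB baselines above.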
Section 5: X-Boundary Framework
- The X-Boundary defense framework aims to build an "internal safety system" for large models, allowing precise interception of dangerous content while ensuring safe information passes through without friction [7][8].

Section 6: Dynamic Protection Network
- The framework consists of three steps:
  1. Boundary drawing: optimizing representation separation to prevent confusion between harmful and safe requests [8].
  2. Threat dissolution: applying irreversible perturbations to harmful representations [8].
  3. Intelligent preservation: maintaining the integrity of safe representations during training [8].

Section 7: Theoretical and Practical Validation
- X-Boundary is supported by optimal transport theory, which explains how tighter clustering of safe representations leads to faster convergence during training. Experiments show 27% and 18% improvements in convergence speed for the Llama-3-8B and Qwen2.5-7B models, respectively [9][10].

Section 8: Balancing Safety and Intelligence
- X-Boundary successfully establishes a clear boundary between harmful and safe representations within the model, addressing the entanglement that traditional methods fail to resolve [10][11].

Section 9: Robust Multi-Round Defense
- With clearly separated representations, X-Boundary achieves a balance between safety and usability, maintaining over 99% of the model's original performance while minimizing misclassification rates [13][14].

Section 10: Scalability
- Applied to larger models, such as the 14-billion-parameter Qwen2.5-14B-Chat, X-Boundary continues to provide effective, imperceptible defense, demonstrating its robustness across model scales [15].
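The three steps in Section 6 can be read as three competing loss terms: push the safe and harmful centroids apart, collapse harmful representations toward an uninformative point, and anchor safe representations to their pre-defense reference. The composite below is a hypothetical sketch with invented names and weights, shown only to make that structure concrete; it is not the paper's actual objective.

```python
import numpy as np

def three_term_loss(h_safe, h_harm, h_safe_ref, alpha=1.0, beta=1.0):
    """Illustrative composite of the three steps (hypothetical):
    boundary drawing, threat dissolution, and intelligent
    preservation, combined as three weighted terms to minimize."""
    boundary = -np.linalg.norm(h_safe.mean(0) - h_harm.mean(0))  # separate centroids
    dissolve = np.linalg.norm(h_harm)                            # collapse harmful features
    preserve = np.linalg.norm(h_safe - h_safe_ref)               # keep safe features intact
    return boundary + alpha * dissolve + beta * preserve
```

The preservation term is what makes the ">99% of original performance" claim plausible in principle: safe-query behavior is explicitly anchored while only harmful representations are moved.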