Semantically Constrained Adversarial Examples
One Instruction Misleads AI Models! Beihang University and Partners Pioneer a 3D Semantic Attack Framework, Success Rate Soars 119%
量子位 (QbitAI) · 2025-10-23 03:52
Core Viewpoint
- The article discusses security alignment problems in artificial intelligence models, focusing on the newly proposed InSUR framework, which generates adversarial examples that do not depend on any specific task or model [1][2].

Group 1: InSUR Framework Overview
- InSUR is built on the idea of instruction uncertainty reduction: a single instruction is enough to generate adversarial examples that mislead both known and unknown models [2][4].
- By integrating a 3D generation pipeline, the framework is the first to generate natural 3D adversarial objects from a single instruction, which also validates the effectiveness of its sampling technique, ResAdv-DDIM [6][8].

Group 2: Challenges in Semantic Adversarial Example Generation
- Existing methods for generating semantic adversarial examples face three main challenges: referring diversity, description incompleteness, and boundary ambiguity [14][21].
- InSUR addresses them by combining a residual-driven, stable attack direction, rule encoding of the generation process, and a semantic hierarchical abstraction evaluation method [8][12].

Group 3: Sampling Method and Task Modeling
- ResAdv-DDIM stabilizes the attack direction by predicting a rough outline of the final target during denoising, which improves the robustness and transferability of the adversarial examples [12][16].
- Task modeling relies on task-goal embedding strategies, enabling effective generation of both 2D and 3D semantic adversarial examples [22][27].

Group 4: Evaluation and Results
- Across various models and tasks, InSUR substantially improves the attack success rate (ASR): average ASR increases by a factor of at least 1.19 and minimum ASR by a factor of 1.08, while perceptual loss (LPIPS) stays low [40][41].
- The framework's design decouples it from specific models and tasks, demonstrating scalability and effectiveness in generating high-fidelity adversarial test scenarios for safety-critical systems [45][46].
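ResAdv-DDIM is described only at a high level here, so the following is a minimal sketch of the general idea it builds on, not the authors' algorithm: a deterministic DDIM sampler in which the adversarial update is applied to the model's predicted clean sample x0_hat (the "rough outline of the final target") instead of to the noisy latent. The residual-based stabilization is not modeled. Everything concrete is a hypothetical assumption: the 2-D Gaussian toy data (`MU`, `S`), the exact posterior-mean predictor standing in for a trained diffusion model, the linear surrogate classifier `W`, the schedule, and all function names.

```python
import numpy as np

# Hypothetical toy data distribution: x0 ~ N(MU, S**2 * I) in 2-D.
MU = np.array([2.0, 1.0])
S = 0.5

# Hypothetical linear surrogate classifier: logit(x) = W @ x.
# The attack tries to push the final sample toward a lower logit.
W = np.array([1.0, -0.5])


def surrogate_logit(x):
    return float(W @ x)


def x0_predictor(x_t, a_t):
    """Exact posterior mean E[x0 | x_t] for the Gaussian toy data.

    Stands in for a trained diffusion model's estimate of the clean sample,
    i.e. the "rough outline of the final target" at noise level a_t.
    """
    gain = np.sqrt(a_t) * S**2 / (a_t * S**2 + 1.0 - a_t)
    return MU + gain * (x_t - np.sqrt(a_t) * MU)


def ddim_sample(x_T, alphas_bar, adv_step=0.0):
    """Deterministic DDIM sampling (eta = 0) over an increasing alpha-bar schedule.

    With adv_step > 0, each step moves the predicted clean sample x0_hat a fixed
    distance against the surrogate's gradient before re-noising, so the attack
    acts on x0_hat rather than on the noisy latent x_t.
    """
    x = np.asarray(x_T, dtype=float)
    for i in range(len(alphas_bar) - 1):
        a_t, a_prev = alphas_bar[i], alphas_bar[i + 1]
        x0_hat = x0_predictor(x, a_t)
        # Noise estimate implied by x_t and x0_hat.
        eps_hat = (x - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        if adv_step > 0.0:
            # Gradient of the linear surrogate; a nonlinear model would be
            # differentiated at x0_hat.
            grad = W
            x0_hat = x0_hat - adv_step * grad / np.linalg.norm(grad)
        # DDIM update toward the next (less noisy) level.
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_T = rng.standard_normal(2)  # approximate starting noise for alpha-bar = 0.02
    schedule = np.linspace(0.02, 1.0, 50)  # final entry 1.0 means "fully denoised"
    clean = ddim_sample(x_T, schedule)
    adv = ddim_sample(x_T, schedule, adv_step=0.05)
    print("clean logit:", surrogate_logit(clean))
    print("adversarial logit:", surrogate_logit(adv))
```

Steering x0_hat rather than x_t means each step's attack is computed on an estimate of the finished sample, which is one plausible reading of why such a direction stays stable across noise levels. With a real model, `x0_predictor` would be the network's x0 estimate and `grad` would come from automatic differentiation of a surrogate classifier evaluated at x0_hat.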
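The ASR results in Group 4 are ratios between attack success rates. The sketch below shows one plausible way to compute such figures: untargeted ASR per victim model, then the ratio of average ASR and of minimum (worst-case model) ASR between a method and a baseline. Reading "minimum ASR increase" as a ratio of per-model minima is an assumption; the paper may define it differently. The function names, model names, and labels are hypothetical toy inputs, not results from the paper.

```python
from statistics import mean


def attack_success_rate(true_labels, adv_predictions):
    """Untargeted ASR: fraction of adversarial inputs the victim model misclassifies."""
    if not true_labels or len(true_labels) != len(adv_predictions):
        raise ValueError("expected two non-empty sequences of equal length")
    fooled = sum(1 for y, p in zip(true_labels, adv_predictions) if p != y)
    return fooled / len(true_labels)


def asr_improvement(method_asr, baseline_asr):
    """Return (average-ASR ratio, minimum-ASR ratio) of a method over a baseline.

    Both arguments map a victim-model name to an ASR; the minimum is taken over
    victim models (the worst case), and baseline values must be non-zero.
    """
    models = list(method_asr)
    avg_ratio = mean(method_asr[m] for m in models) / mean(baseline_asr[m] for m in models)
    min_ratio = min(method_asr[m] for m in models) / min(baseline_asr[m] for m in models)
    return avg_ratio, min_ratio


if __name__ == "__main__":
    labels = [0, 1, 2, 3]  # toy ground-truth labels
    method = {
        "model_a": attack_success_rate(labels, [1, 1, 0, 0]),  # 3 of 4 fooled
        "model_b": attack_success_rate(labels, [1, 0, 0, 0]),  # 4 of 4 fooled
    }
    baseline = {
        "model_a": attack_success_rate(labels, [0, 1, 0, 3]),  # 1 of 4 fooled
        "model_b": attack_success_rate(labels, [1, 1, 2, 0]),  # 2 of 4 fooled
    }
    avg_ratio, min_ratio = asr_improvement(method, baseline)
    print(f"average ASR ratio: {avg_ratio:.2f}x, minimum ASR ratio: {min_ratio:.2f}x")
```

Tracking the minimum alongside the average matters because a large average gain can hide a victim model on which the attack barely transfers; the worst-case ratio exposes that.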