Adversarial Attacks
NeurIPS 2025 | Breaking Closed-Source Multimodal Large Models: A Novel Adversarial Attack Method Based on Feature Optimal Alignment
机器之心· 2025-10-17 04:09
Core Insights
- The article discusses the advancements and security vulnerabilities of Multimodal Large Language Models (MLLMs), particularly their susceptibility to adversarial attacks [2][8]
- It introduces a novel attack framework called FOA-Attack, which enhances the transferability of adversarial samples across different models by optimizing feature alignment at both the global and local levels [3][11]

Group 1: Background and Motivation
- MLLMs like GPT-4 and Claude-3 exhibit exceptional performance in tasks such as image understanding and visual question answering, but they inherit vulnerabilities from their visual encoders, making them prone to adversarial attacks [8][10]
- Adversarial attacks can be categorized as non-targeted (aiming to produce any incorrect output) or targeted (aiming to elicit a specific output); the latter is particularly challenging in black-box scenarios where model internals are inaccessible [10][11]

Group 2: FOA-Attack Framework
- FOA-Attack employs a dual-dimensional alignment strategy, aligning both global features (a cosine similarity loss on [CLS] tokens) and local features (clustering plus optimal transport on patch tokens) to improve transferability; a minimal sketch of this loss follows this summary [6][11]
- The framework also includes a dynamic weighting strategy that adapts the influence of each surrogate model during attack generation, improving the overall effectiveness of the attack [6][11]

Group 3: Experimental Results
- FOA-Attack significantly outperforms existing state-of-the-art methods on both open-source and closed-source MLLMs, achieving high success rates, particularly against commercial closed-source models such as GPT-4 [4][19]
- In experiments, FOA-Attack achieved an attack success rate (ASR) of 75.1% against GPT-4, demonstrating its effectiveness in real-world settings [19][24]

Group 4: Conclusion and Future Directions
- The findings highlight the vulnerability of current MLLMs at the visual encoding stage and suggest new defensive directions, particularly strengthening the robustness of local features [24][25]
- The authors have made the paper and code publicly available for further exploration and discussion, reflecting a commitment to advancing research in this area [25][27]
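To make the two alignment terms concrete, here is a minimal PyTorch sketch of an FOA-style loss: a cosine term on the global [CLS] embeddings plus an entropic optimal-transport (Sinkhorn) term on the patch tokens. This is an illustration under assumptions, not the authors' released code: the `encoder` interface is hypothetical, the paper's clustering of patch tokens into centroids is skipped (all patches are matched directly), and the dynamic multi-model weighting is omitted.

```python
import torch
import torch.nn.functional as F

def sinkhorn(cost, eps=0.1, n_iters=20):
    """Entropic-OT transport plan with uniform marginals (standard Sinkhorn)."""
    K = torch.exp(-cost / eps)                                  # Gibbs kernel
    a = torch.full((cost.size(0),), 1.0 / cost.size(0), device=cost.device)
    b = torch.full((cost.size(1),), 1.0 / cost.size(1), device=cost.device)
    u, v = torch.ones_like(a), torch.ones_like(b)
    for _ in range(n_iters):                                    # alternating scaling
        u = a / (K @ v)
        v = b / (K.t() @ u)
    return u.unsqueeze(1) * K * v.unsqueeze(0)                  # diag(u) K diag(v)

def foa_style_loss(encoder, x_adv, x_target):
    """Global + local feature alignment in the spirit of FOA-Attack.
    `encoder` is assumed to return ([CLS] embedding, patch tokens)."""
    cls_a, patch_a = encoder(x_adv)                 # (B, D), (B, N, D)
    with torch.no_grad():                           # target features stay fixed
        cls_t, patch_t = encoder(x_target)
    # Global alignment: cosine similarity between [CLS] embeddings.
    loss_global = 1.0 - F.cosine_similarity(cls_a, cls_t, dim=-1).mean()
    # Local alignment: optimal transport between patch tokens (no clustering here).
    loss_local = 0.0
    for pa, pt in zip(patch_a, patch_t):            # per-sample OT matching
        cost = torch.cdist(pa, pt)                  # (N, N) pairwise distances
        plan = sinkhorn(cost)
        loss_local = loss_local + (plan * cost).sum()
    return loss_global + loss_local / patch_a.size(0)
```

A PGD-style attack would then iteratively perturb `x_adv` within a small norm budget to minimize this loss against one or more surrogate encoders, with FOA-Attack additionally re-weighting the surrogates dynamically during optimization.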
Embodied Agents Actively Confront Adversarial Attacks: Tsinghua Team Proposes an Active Defense Framework
36Kr · 2025-08-12 11:30
Core Insights
- The article discusses the introduction of the REIN-EAD framework, which enables embodied intelligent agents to actively defend against adversarial attacks by enhancing their perception and decision-making capabilities [1][3][8]

Group 1: Adversarial Attacks and Current Defenses
- Adversarial attacks pose significant threats to the safety and reliability of visual perception systems, particularly in critical areas such as facial recognition and autonomous driving [2]
- Existing defense methods rely primarily on passive strategies, such as adversarial training and input purification, which can fail against unknown or adaptive attacks [2][3]

Group 2: REIN-EAD Framework
- The REIN-EAD framework integrates perception and policy modules to mimic human motion-vision mechanisms, allowing continuous observation and exploration in dynamic environments [3][8]
- It employs a cumulative-information-exploration reinforcement learning method to optimize active strategies, enhancing the system's ability to identify and adapt to potential threats [4][11]

Group 3: Offline Adversarial Patch Approximation (OAPA)
- The OAPA technique addresses the computational cost of adversarial training in 3D environments by building a universal defense that does not rely on information about the attacker; a hedged sketch of this idea follows this summary [5][6][18]
- This method significantly reduces training cost while maintaining robust defenses against unknown or adaptive attacks [6][18]

Group 4: Performance and Generalization
- REIN-EAD demonstrates superior performance across multiple tasks and environments, outperforming existing passive defenses against a variety of unknown and adaptive attacks [7][19]
- The framework's strong generalization and adaptability to complex real-world scenarios point to applications in safety-critical systems [7][19][31]
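As a way to picture the offline approximation, here is a hedged PyTorch sketch of the general idea: optimize a diverse library of adversarial patches once against a surrogate model, so that later defense training can sample from this library instead of running a costly inner attack loop. The function name, the fixed top-left patch placement, and all hyperparameters are illustrative assumptions; the actual OAPA method operates in 3D environments and approximates the patch manifold more carefully.

```python
import torch
import torch.nn.functional as F

def build_patch_library(model, images, labels, n_patches=32, steps=100,
                        patch_size=50, lr=0.01):
    """Illustrative offline patch approximation, loosely following the OAPA
    idea: precompute a set of adversarial patches against a surrogate model.
    Diversity here comes only from random initialization (a simplification)."""
    patches = []
    for _ in range(n_patches):
        patch = torch.rand(1, 3, patch_size, patch_size, requires_grad=True)
        opt = torch.optim.Adam([patch], lr=lr)
        for _ in range(steps):
            x = images.clone()
            # Paste the patch (2D top-left placement; the paper works in 3D).
            x[:, :, :patch_size, :patch_size] = patch.clamp(0, 1)
            # Untargeted objective: maximize classification error.
            loss = -F.cross_entropy(model(x), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
        patches.append(patch.detach().clamp(0, 1))
    return torch.cat(patches)   # sample from this set during defense training
```

Defense training would then repeatedly paste random patches from this library onto training views. This is where the reported cost savings come from: the expensive attack optimization happens once, offline, rather than inside every training step.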
Embodied Agents Actively Confront Adversarial Attacks: Tsinghua Team Proposes an Active Defense Framework
量子位· 2025-08-12 09:35
Core Viewpoint
- The article discusses the REIN-EAD framework, which enables embodied intelligent agents to actively defend against adversarial attacks by learning to perceive and interact with their environment, inspired by the human visual system [1][2][3]

Group 1: Framework Overview
- REIN-EAD is designed to improve the robustness of perception in adversarial settings by allowing agents to "look twice," improving their ability to handle adversarial attacks [2]
- The framework integrates perception and policy modules to simulate motion-vision mechanisms, enabling continuous observation and exploration of dynamic environments; a minimal sketch of this perceive-act loop follows this summary [5]
- It employs a cumulative information exploration method to optimize active strategies, enhancing the agent's ability to identify high-risk areas and adjust its behavior dynamically [6]

Group 2: Technical Contributions
- The introduction of Offline Adversarial Patch Approximation (OAPA) significantly reduces training cost while providing robust defenses against unknown or adaptive attacks in 3D environments [7]
- The framework demonstrates superior performance across multiple tasks and environments, showing stronger generalization and adaptability than existing passive defense methods [8]

Group 3: Experimental Results
- Experimental validation indicates that REIN-EAD significantly lowers attack success rates while maintaining standard model accuracy, even against unknown and adaptive attacks [4][31]
- Across tasks such as face recognition, 3D object classification, and object detection, REIN-EAD outperforms baseline defenses such as SAC, PZ, and DOA [31][43]
- The framework's ability to accumulate evidence over multi-step interactions improves its robustness and generalization, making it suitable for complex real-world tasks [49]
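To sketch what "looking twice" means in code, the following is a minimal PyTorch outline of an active-perception agent: a recurrent module accumulates evidence across successive views while a policy head picks the next viewpoint. The `env` interface, module sizes, and greedy action choice are illustrative assumptions, not the authors' architecture; in REIN-EAD the policy would be trained with reinforcement learning using a cumulative-information-style reward rather than acting greedily.

```python
import torch
import torch.nn as nn

class ActiveDefenseAgent(nn.Module):
    """Minimal sketch of an active perceive-act loop in the spirit of REIN-EAD:
    fuse multi-step observations recurrently, choose where to look next."""
    def __init__(self, feat_dim=512, hidden_dim=256, n_actions=8, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.rnn = nn.GRUCell(feat_dim, hidden_dim)   # accumulates multi-step evidence
        self.policy = nn.Linear(hidden_dim, n_actions)  # next-viewpoint head
        self.classifier = nn.Linear(hidden_dim, n_classes)

    def forward(self, env, steps=4):
        # `env` is a hypothetical interface: reset() -> image tensor,
        # step(action) -> next image tensor from the chosen viewpoint.
        obs = env.reset()                              # initial, possibly attacked view
        h = None
        for _ in range(steps):
            h = self.rnn(self.backbone(obs), h)        # fuse the new observation
            action = self.policy(h).argmax(dim=-1)     # greedy choice, for illustration
            obs = env.step(action)                     # move and observe again
        return self.classifier(h)                      # decide only after exploring
```

The design point this illustrates is the one the article emphasizes: because the decision is made from state accumulated over several views, a patch that fools any single static view has to survive every subsequent observation as well.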