AI运行安全
Search documents
南洋理工揭露AI「运行安全」的全线崩溃,简单伪装即可骗过所有模型
3 6 Ke· 2025-10-17 07:16
Core Insights - The article emphasizes the critical issue of AI operational safety, highlighting that when AI exceeds its designated responsibilities, it poses significant risks, regardless of the content it generates [3][12][16] - The concept of "Operational Safety" is introduced as a necessary condition for AI safety, shifting the focus from mere content filtering to the AI's adherence to its defined roles [3][5][16] Summary by Sections Operational Safety - The term "Operational Safety" is proposed to reshape the understanding of AI safety boundaries in specific contexts, indicating that an AI's failure to maintain its role is a fundamental safety concern [3][5][12] Evaluation Framework - The OffTopicEval benchmark was developed to assess operational safety, focusing on whether models can appropriately refuse to answer out-of-domain questions rather than their overall knowledge or capabilities [5][12] - The evaluation involved 21 different scenarios with over 210,000 out-of-domain questions and 3,000 in-domain questions across three languages: English, Chinese, and Hindi [5][10] Model Performance - Testing revealed that nearly all major models, including GPT and Qwen, failed to meet operational safety standards, with significant drops in refusal rates for out-of-domain questions [7][10] - For instance, models like Gemma-3 and Qwen-3 experienced refusal rate declines exceeding 70% when faced with deceptively disguised out-of-domain queries [10][11] Solutions and Improvements - The research team proposed practical solutions to enhance AI's adherence to its roles, including lightweight prompt-based steering methods that significantly improved operational safety scores for various models [12][15] - The P-ground method, for example, increased the operational safety score of Llama-3.3 by 41%, demonstrating that simple adjustments can lead to substantial improvements [12][13] Industry Implications - The findings call for a reevaluation of AI safety standards within the industry, urging developers to prioritize operational safety as a prerequisite for deploying AI in serious applications [14][16] - The paper serves as a declaration for the community to redefine AI safety, ensuring that AI systems are not only powerful but also trustworthy and responsible [14][16]