OffTopicEval - filings, earnings calls, financial reports, news

OffTopicEval

Search documents

3 6 Ke· 2025-10-17 07:16

Core Insights - The article emphasizes the critical issue of AI operational safety, highlighting that when AI exceeds its designated responsibilities, it poses significant risks, regardless of the content it generates [3][12][16] - The concept of "Operational Safety" is introduced as a necessary condition for AI safety, shifting the focus from mere content filtering to the AI's adherence to its defined roles [3][5][16] Summary by Sections Operational Safety - The term "Operational Safety" is proposed to reshape the understanding of AI safety boundaries in specific contexts, indicating that an AI's failure to maintain its role is a fundamental safety concern [3][5][12] Evaluation Framework - The OffTopicEval benchmark was developed to assess operational safety, focusing on whether models can appropriately refuse to answer out-of-domain questions rather than their overall knowledge or capabilities [5][12] - The evaluation involved 21 different scenarios with over 210,000 out-of-domain questions and 3,000 in-domain questions across three languages: English, Chinese, and Hindi [5][10] Model Performance - Testing revealed that nearly all major models, including GPT and Qwen, failed to meet operational safety standards, with significant drops in refusal rates for out-of-domain questions [7][10] - For instance, models like Gemma-3 and Qwen-3 experienced refusal rate declines exceeding 70% when faced with deceptively disguised out-of-domain queries [10][11] Solutions and Improvements - The research team proposed practical solutions to enhance AI's adherence to its roles, including lightweight prompt-based steering methods that significantly improved operational safety scores for various models [12][15] - The P-ground method, for example, increased the operational safety score of Llama-3.3 by 41%, demonstrating that simple adjustments can lead to substantial improvements [12][13] Industry Implications - The findings call for a reevaluation of AI safety standards within the industry, urging developers to prioritize operational safety as a prerequisite for deploying AI in serious applications [14][16] - The paper serves as a declaration for the community to redefine AI safety, ensuring that AI systems are not only powerful but also trustworthy and responsible [14][16]

AI运行安全

prompt-based steering

prompt-based steering

南洋理工揭露AI「运行安全」的全线崩溃，简单伪装即可骗过所有模型

机器之心· 2025-10-17 04:09

Core Viewpoint - The article emphasizes that when AI exceeds its predefined boundaries, its behavior itself constitutes a form of insecurity, introducing the concept of Operational Safety as a new dimension in AI safety discussions [7][9]. Summary by Sections Introduction to Operational Safety - The research introduces the concept of Operational Safety, aiming to reshape the understanding of AI safety boundaries in specific scenarios [4][9]. Evaluation of AI Models - The team developed the OffTopicEval benchmark to quantify risks associated with Operational Safety, focusing on whether models can appropriately refuse to answer out-of-domain questions [12][24]. - The evaluation involved 21 different scenarios with over 210,000 out-of-domain data points and 3,000 in-domain data points across English, Chinese, and Hindi languages [12]. Test Results and Findings - Testing revealed that nearly all major models, including GPT and Qwen, failed to meet Operational Safety standards, with significant drops in refusal rates for out-of-domain questions [14][16]. - For instance, models like Gemma-3 and Qwen-3 experienced refusal rate declines exceeding 70% when faced with deceptively disguised out-of-domain questions [16]. Proposed Solutions - The research suggests practical solutions to enhance models' adherence to their operational boundaries, including prompt-based steering methods that do not require retraining [20][21]. - Two lightweight prompting methods, P-ground and Q-ground, were shown to significantly improve models' operational safety scores, with P-ground increasing Llama-3.3's score by 41% [21][22]. Conclusion and Industry Implications - The paper calls for a reevaluation of AI safety, highlighting that AI must not only be powerful but also trustworthy and duty-bound [24][25]. - It stresses that operational safety is a prerequisite for deploying AI in serious applications, urging the establishment of new evaluation paradigms that reward models capable of recognizing their limitations [25].

运行安全（Operational Safety）

运行安全（Operational Safety）