AI安全护栏
Search documents
AI出海如何合规?港中文(深圳)吴保元:设个性化安全护栏
Nan Fang Du Shi Bao· 2026-01-07 11:37
在吴保元看来,"人工智能安全"可划分为AI助力安全、AI内生安全以及AI衍生安全三个层次。 具体而言,AI在身份安全、信息安全、网络安全等传统安全领域具备显著应用价值,能够提供切实有 效的保障助力,例如检测电信诈骗风险、防范恶意软件入侵等场景。但与此同时,AI也面临着隐私 性、精确性、鲁棒性的安全"不可能三角"困境——功能足够强大、输出足够精准的AI模型,往往容易出 现隐私泄露和鲁棒性不足等问题。多项研究成果及案例已表明,以ChatGPT为代表的大语言模型会"记 忆"海量训练数据,而当前流行的视觉生成模型也能轻易生成如现实人物肖像等原始训练数据,这类现 象不仅存在明显的隐私泄露隐患,更直接构成了AI内生安全风险。 AI衍生安全风险同样不容忽视。吴保元指出,AI技术在军事领域的武器化应用、在传播领域的虚假信 息生成与扩散、对现有职场岗位的替代效应,以及其可能诱发的"信息茧房"加剧、歧视偏见放大等问 题,都可能对现实社会秩序和公共利益产生负面影响,形成不容忽视的衍生安全风险。 针对上述安全风险,吴保元认为,有必要对AI模型开展价值对齐训练,确保AI的行为逻辑与人类的意 图和价值观保持一致,符合人类社会的法律法规、 ...
AI生成内容需“表明身份”,虚假信息将套上紧箍咒
3 6 Ke· 2025-09-02 11:35
Core Viewpoint - The rapid advancement of generative artificial intelligence (AIGC) has made it increasingly difficult for internet users to distinguish between true and false content, leading to a proliferation of AI-generated misinformation [1][3]. Regulation and Responsibility - The National Internet Information Office and three other departments have introduced the "Artificial Intelligence Generated Synthetic Content Identification Measures," effective from September 1, requiring explicit and implicit labeling for all AI-generated content [3][10]. - The new regulation places the primary responsibility for AI-generated content on the content creators, marking a significant shift from previous content management systems established by platforms like WeChat and Douyin [3][14]. AI Misuse and Challenges - AI has become a major tool for misinformation, with examples of scams and fraudulent activities utilizing AI-generated content [5][6]. - The emergence of user-friendly AI technologies has made it easier for malicious actors to create deceptive content, as seen with the rise of deepfake technology [6][7]. Safety Measures and Limitations - Major tech companies are developing "AI Guardrails" to prevent harmful content generation, but these measures face inherent limitations due to the need for AI models to maintain a degree of autonomy [9][10]. - The balance between safety and functionality is challenging, as overly strict safety measures could render AI models ineffective [10]. Watermarking and Content Authenticity - Companies like Microsoft, Adobe, and OpenAI have formed the C2PA alliance to implement watermarking techniques to distinguish AI-generated content from human-created works, but these watermarks can be easily removed [12]. - Current operational strategies by internet platforms to require creators to disclose AI-generated content have not been effective, as many creators fear that such disclosures will limit their content's reach [12][14].
直播中喵喵叫,提示词攻击成为数字人的阿喀琉斯之踵
3 6 Ke· 2025-06-17 12:27
Core Viewpoint - Digital human live streaming is a hot concept in the current live e-commerce industry, with brands increasingly opting for cost-effective digital humans over real hosts, but there are significant vulnerabilities such as prompt injection attacks that can disrupt the process [1][3][14]. Group 1: Digital Human Live Streaming - Digital human hosts are being used by brands for live streaming sales due to their cost-effectiveness, operating 24/7 without the need for physical resources [14]. - The recent incident of a digital human host executing unrelated commands due to a prompt injection attack highlights the risks associated with this technology [3][17]. - The technology behind digital humans is often not well understood by the merchants using them, leading to potential security vulnerabilities [14][15]. Group 2: Prompt Injection Attacks - Prompt injection is a method where users can manipulate AI responses by issuing specific commands, as demonstrated when a digital human mistakenly responded to a non-relevant prompt [3][7]. - The inability of AI systems to distinguish between trusted developer commands and untrusted user inputs raises concerns about security and reliability [10]. - Previous incidents, such as attacks on ChatGPT and Microsoft Copilot, illustrate that prompt injection is a widespread issue affecting various AI applications [7][12]. Group 3: AI Security Measures - AI guardrails are necessary to ensure that AI systems operate within human expectations and do not generate harmful content or leak sensitive information [10][12]. - Current AI security measures are not fully equipped to handle the unique risks posed by AI models, particularly in the context of prompt injection attacks [10][12]. - Developers face the challenge of balancing AI performance and security, as overly stringent guardrails can hinder the AI's ability to generate high-quality responses [12][14].