Generative AI Security
The Joint Laboratory Is the "Glue" of the Bay Area AI Ecosystem
Nan Fang Du Shi Bao· 2025-09-15 23:09
Southern Metropolis Daily (hereinafter "Nandu"): In your view, what is the current state of technological innovation and industrial layout in the Greater Bay Area's generative AI security field?

Chen Junlong: Technological innovation in generative AI security in the Greater Bay Area is very active. Leveraging its locational advantages and innovation climate, the Bay Area has attracted large amounts of talent and capital, giving rise to a series of cutting-edge results and to notable progress on model safety and trustworthiness and on quantitative security rating.

Chen Junlong, Dean of the School of Computer Science and Engineering at South China University of Technology, Member of the Academia Europaea, and expert of the Joint Laboratory

"Technological innovation in generative AI security in the Greater Bay Area is thriving!" In the view of Chen Junlong, Dean of the School of Computer Science and Engineering at South China University of Technology, Member of the Academia Europaea, and expert of the Joint Laboratory, the Guangdong-Hong Kong-Macao Greater Bay Area Joint Laboratory for the Safe Development of Generative Artificial Intelligence (the "Joint Laboratory") plays a dual role in the Bay Area's AI industrial ecosystem, serving as both its glue and its testing ground: "Generative AI will empower every industry in the Greater Bay Area at a deeper level and across a broader scope."

On advantages: generative AI security innovation is active in the Greater Bay Area

Nandu: AI safety development involves government, industry, academia, research, and end users. How can the resources of all these parties be coordinated for collaborative governance?

Chen Junlong: In AI governance, each party needs to build a collaborative system grounded in its own role and focused on core security needs. For example, the government can draw on the institutional-innovation advantages of Shenzhen's Qianhai and Hetao zones by setting clear baseline AI safety requirements and risk-grading standards, opening regulatory-technology interfaces, and running sandbox pilots. At the same time, establishing a negative-list system ...
The Biggest Risks of Enterprise GenAI and Lessons Learned from Early Adopters
36Ke · 2025-08-11 00:20
Overview
- Generative AI is on corporate roadmaps, but companies should not ship unsafe products. LLMs have changed the threat model: untrusted natural language becomes an attack surface, and model outputs can be weaponized. Models should run in sandboxed, monitored, and strictly authorized environments [1][2]

Security Challenges
- Prompt injection attacks, including indirect injections hidden in files and web pages, are now a top risk for LLMs. Attackers can compromise inputs without breaching backend systems, leading to data theft or unsafe operations [4][5]
- Abuse of agents/tools and excessive agency create new permission boundaries. Overly permissive agents can be lured into executing powerful operations, so strict RBAC and human approval are needed for sensitive actions (a minimal gating sketch follows this digest) [4][5]
- RAG (Retrieval-Augmented Generation) introduces new attack surfaces: a poisoned index can steer the model toward adversarial outputs. Defensive measures are still maturing [4][5]
- Privacy leaks and IP spillage remain active research areas, since large models sometimes memorize sensitive training data. Vendor-side mitigations are improving [4][5]
- The AI supply chain is vulnerable to backdoored models and deceptive alignment; organizations need robust provenance and behavioral review (see the registry-check sketch below) [4][5]
- Unsafe output handling can lead to downstream security issues such as XSS and SSRF. Strict output validation and execution policies are essential (see the output-handling sketch below) [4][5]
- DoS attacks and cost abuse can arise from malicious workloads, so rate limits and alerting are needed (see the budget sketch below) [4][5]
- Observability and compliance are challenging: structured logging and change control must coexist with privacy law [4][5]
- Governance drift and model/version risk arise from frequent updates, which calls for continuous security testing and version control [4][5]
- Content authenticity and downstream misuse remain concerns; organizations are encouraged to track output provenance [4][5]

Action Plan for the Next 90 Days
- Conduct a GenAI security and privacy audit to identify where sensitive data enters the system and deploy immediate controls [6][7]
- Pilot high-value, low-risk use cases to demonstrate value while minimizing customer risk [6][7]
- Put evaluation tooling, human review, and key metrics in place before any broad rollout [6][7]

Case Studies
- JPMorgan Chase implemented strict prompts and a code-snippet checker to prevent sensitive data leaking through its AI coding assistant, reporting zero code-leak incidents through 2024 [16]
- Microsoft hardened Bing Chat by limiting session length and improving prompt isolation, significantly reducing successful prompt injection attempts [17]
- Syntegra applied differential privacy in its medical AI so the model cannot recall sensitive patient data, supporting HIPAA compliance [18]
- Waymo used a model registry to secure its machine learning supply chain and reported no related security incidents over 18 months [19][20]

30-60-90 Day Action Plan
- Days 1-30: run threat-modeling workshops and implement basic input/output filtering [22][23]
- Days 31-60: run red-team simulations and deploy advanced controls based on early findings [24][25]
- Days 61-90: commission external audits and tune monitoring metrics to sustain compliance and security [27][28]
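To make the "strict RBAC plus human approval for sensitive actions" control concrete, here is a minimal gating sketch. The tool names, role names, and the require_human_approval stub are illustrative assumptions, not part of any product or framework mentioned above.

```python
# Hypothetical sketch: gate agent tool calls behind role checks and human approval.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class ToolPolicy:
    allowed_roles: frozenset    # roles permitted to call the tool
    needs_human_approval: bool  # pause for sign-off before executing

# Assumed policy table; a real deployment would load this from configuration.
TOOL_POLICIES: Dict[str, ToolPolicy] = {
    "search_docs":   ToolPolicy(frozenset({"analyst", "admin"}), False),
    "send_payment":  ToolPolicy(frozenset({"admin"}), True),
    "delete_record": ToolPolicy(frozenset({"admin"}), True),
}

def require_human_approval(tool: str, args: dict) -> bool:
    # Stub: a real system would open a ticket or page an approver.
    print(f"[approval required] {tool} {args}")
    return False  # deny by default until a human signs off

def invoke_tool(role: str, tool: str, args: dict,
                registry: Dict[str, Callable]) -> object:
    policy = TOOL_POLICIES.get(tool)
    if policy is None or role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not call {tool!r}")
    if policy.needs_human_approval and not require_human_approval(tool, args):
        raise PermissionError(f"{tool!r} blocked pending human approval")
    return registry[tool](**args)

if __name__ == "__main__":
    registry = {"search_docs": lambda query: f"results for {query!r}"}
    print(invoke_tool("analyst", "search_docs", {"query": "leave policy"}, registry))
```

The deny-by-default approval stub reflects the digest's point that powerful operations should never execute on the model's say-so alone.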
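The Waymo case study points at supply-chain hygiene. The sketch below shows the kind of artifact check a model registry can enforce, verifying a model file's SHA-256 against an approved entry before loading; the registry contents and names are hypothetical and are not Waymo's actual tooling.

```python
# Hypothetical sketch: refuse to load a model artifact whose hash is not in the registry.
import hashlib
from pathlib import Path

# Assumed registry: model name -> SHA-256 recorded when the artifact was approved.
APPROVED_MODELS = {
    "intent-classifier-v3": "<sha256 recorded at approval time>",
}

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def load_model_checked(name: str, artifact: Path) -> Path:
    expected = APPROVED_MODELS.get(name)
    if expected is None:
        raise ValueError(f"{name!r} is not in the approved registry")
    if sha256_of(artifact) != expected:
        raise ValueError(f"hash mismatch for {name!r}; refusing to load")
    return artifact  # only now hand the file to the deserializer
```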
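For the unsafe-output-handling bullet, a minimal sketch of treating model output as untrusted: escape it before rendering (against XSS) and allow model-initiated fetches only to vetted hosts while rejecting private addresses (against SSRF). The allow-list and function names are assumptions for illustration.

```python
# Hypothetical sketch: sanitize model output and restrict model-initiated fetches.
import html
import ipaddress
from urllib.parse import urlparse

ALLOWED_URL_HOSTS = {"docs.example.com", "api.example.com"}  # assumed allow-list

def render_reply(model_output: str) -> str:
    # Escape before inserting into a page so injected markup displays as text (XSS defence).
    return html.escape(model_output)

def url_is_fetchable(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in {"http", "https"} or not parsed.hostname:
        return False
    try:
        if ipaddress.ip_address(parsed.hostname).is_private:
            return False  # block loopback/metadata/internal targets (SSRF defence)
    except ValueError:
        pass  # hostname rather than an IP literal; fall through to the allow-list
    return parsed.hostname in ALLOWED_URL_HOSTS

if __name__ == "__main__":
    print(render_reply('<img src=x onerror=alert(1)>'))
    print(url_is_fetchable("http://169.254.169.254/latest/meta-data"))  # False
    print(url_is_fetchable("https://docs.example.com/guide"))           # True
```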
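The DoS/cost-abuse bullet comes down to budgets and alerts; below is a toy per-user daily token budget. The limit and the in-memory store are assumptions, and a production system would use a shared store and emit alerts rather than simply raising.

```python
# Hypothetical sketch: enforce a per-user daily token budget to cap cost abuse.
import time
from collections import defaultdict

DAILY_TOKEN_BUDGET = 200_000  # assumed per-user allowance

class TokenBudget:
    def __init__(self, budget: int = DAILY_TOKEN_BUDGET):
        self.budget = budget
        self.usage = defaultdict(int)   # user -> tokens consumed this window
        self.window_start = time.time()

    def charge(self, user: str, tokens: int) -> None:
        # Roll the 24-hour window forward and reset counters when it expires.
        if time.time() - self.window_start > 86_400:
            self.usage.clear()
            self.window_start = time.time()
        if self.usage[user] + tokens > self.budget:
            raise RuntimeError(f"user {user!r} exceeded the daily token budget")
        self.usage[user] += tokens

if __name__ == "__main__":
    budget = TokenBudget(budget=1_000)
    budget.charge("alice", 600)
    try:
        budget.charge("alice", 600)  # pushes alice over 1,000 tokens
    except RuntimeError as err:
        print(err)
```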