大模型安全研究报告（2024年）

Industry Investment Rating - The report does not explicitly provide an investment rating for the industry [1][2][3] Core Viewpoints - The rapid development of large models (LLMs) is driving the transition from specialized weak AI to general strong AI, marking a significant leap in intelligence and transforming human-computer interaction and application development models [3] - The commercialization and industrialization of large models have introduced new risks such as model "hallucinations," prompt injection attacks, and the democratization of cyberattacks, alongside exacerbating existing AI security risks [3] - Large models also present new opportunities for solving cybersecurity bottlenecks, leveraging their capabilities in information understanding, knowledge extraction, and task orchestration [3][24] - The report proposes a comprehensive framework for large model safety, focusing on both the safety of the models themselves and their application in enhancing cybersecurity [3][25] Summary by Sections Large Model Technology Evolution - The evolution of large models can be divided into three phases: the exploration phase (2017-2021) with pre-trained language models like GPT-1 and BERT, the explosion phase (2022-2023) with language models like ChatGPT, and the enhancement phase (2024-present) with multimodal models like Sora and GPT-4o [14][15][16][17] Security Challenges of Large Models - Large models face significant security risks across four key components: training data, algorithm models, system platforms, and business applications [18] - Training data risks include data leakage, data poisoning, and low-quality data [19] - Algorithm model risks include insufficient robustness, model "hallucinations," bias, and poor interpretability [20] - System platform risks include vulnerabilities in machine learning frameworks and development toolchains [21] - Business application risks include the generation of illegal or harmful content and data leakage [22][23] New Security Opportunities with Large Models - Large models can significantly enhance the precision and timeliness of threat identification, defense, detection, response, and recovery in cybersecurity [24] - They improve the universality and usability of data security technologies by automating data classification and reducing reliance on manual analysis [24] - Large models also enhance the robustness and accuracy of content security technologies, particularly in detecting deepfakes and other malicious content [24] Large Model Safety Framework - The safety framework for large models includes four dimensions: safety goals, safety attributes, protection objects, and safety measures [25][26] - Safety goals focus on ensuring the credibility of training data, the reliability of algorithm models, the stability of system platforms, and the controllability of business applications [32][33] - Safety attributes include authenticity, diversity, accuracy, confidentiality, accountability, predictability, fairness, transparency, explainability, compliance, reliability, controllability, and robustness [34] - Protection objects include the system, data, users, and behavior [35][36] - Safety measures cover training data protection, algorithm model protection, system platform protection, and business application protection [37][38][39][40][41][42][43][44][45][46][47][48][49][50][51][52][53][54][55] Large Model Empowering Security - Large models can empower cybersecurity, data security, and content security by leveraging their capabilities in natural language understanding, knowledge extraction, and task orchestration [60] - In cybersecurity, large models are applied in threat identification, protection, detection, response, and recovery, with applications in threat intelligence generation, vulnerability mining, code auditing, and network attack tracing [61][62][63][64][65][66][67][68][69] - In data security, large models are used for automated data classification and detection of personal information violations in apps and SDKs [60] - In content security, large models enhance the detection of text, image, video, and audio content for illegal or harmful information [60]