Core Insights - The traditional belief that large language models (LLMs) require a significant amount of poisoned data to create vulnerabilities has been challenged by recent research, indicating that only 250 malicious documents are sufficient to implant backdoor vulnerabilities in LLMs, regardless of their size or training data volume [1][6][20]. Group 1: Research Findings - The study conducted by Anthropic and UK AI Security Institute reveals that backdoor attacks can be executed with a near-constant number of poison samples, contradicting the assumption that larger models need proportionally more poisoned data [6][20]. - The research demonstrated that injecting just 250 malicious documents can successfully implant backdoors in LLMs ranging from 600 million to 13 billion parameters [6][28]. - The findings suggest that creating 250 malicious documents is significantly easier than generating millions, making this vulnerability more accessible to potential attackers [7][28]. Group 2: Attack Mechanism - The specific type of backdoor attack tested was a denial-of-service (DoS) attack, where the model outputs random gibberish when encountering a specific trigger phrase, such as
管你模型多大,250份有毒文档统统放倒,Anthropic:LLM比想象中脆弱
机器之心·2025-10-10 03:47