Data Pollution Threatens Security Defenses; Ministry of State Security: Beware of AI "Data Poisoning"

Group 1
- The core issue highlighted is poor-quality training data in artificial intelligence, including false information, fabricated content, and biased viewpoints, which contaminates data sources and poses new challenges for AI safety [1][5].
- Data is identified as a fundamental element of AI, essential for training models and driving AI applications; model performance depends on its quantity, quality, and diversity [3][5].
- Data pollution can significantly impair model accuracy and reliability, potentially causing decision-making errors and system failures. Harmful content introduced through data poisoning interferes with model training [5][6].

Group 2
- Even a small share of false text in training data can substantially increase harmful outputs: 0.01% of false text results in an 11.2% increase in harmful content [6].
- In the financial sector, data pollution can be used to create false information that may cause abnormal stock price fluctuations, posing a new kind of market manipulation risk [7].
- In public safety, data pollution can distort public perception and mislead public opinion, potentially inciting panic [7].

Group 3
- To strengthen AI data security, it is recommended to tighten regulation at the source to prevent data contamination, supported by existing laws such as the Cybersecurity Law and the Data Security Law [9].
- Risk assessment should be reinforced to ensure data safety throughout the data lifecycle, including collection, storage, and transmission [9].
- A governance framework should be established for data cleaning and repair, with specific rules based on legal standards, so that data quality is managed and controlled continuously [9].