Data Poisoning
"Data poisoning" may trigger harmful outputs! How many categories of AI data pollution are there? Experts explain
Sou Hu Cai Jing· 2025-08-17 08:50
Core Viewpoint
- The national security department has issued a warning about "data poisoning" in AI: manipulated, fabricated, and repetitively injected data can induce harmful outputs [1].

Group 1: Data Poisoning Overview
- "Data poisoning" primarily targets two areas: visual recognition and natural language processing [3].
- A typical example involves altering training data, such as adding a green dot to a zebra image, which misleads AI models during training (a minimal sketch of this kind of trigger follows this summary) [3].
- Even a few contaminated samples among thousands can significantly disrupt the AI model's ability to recognize similar objects correctly [3].

Group 2: Types of Data Pollution
- There are two main types of AI data pollution: deliberate human intervention designed to mislead AI outputs, and harmful information swept in unfiltered from large-scale internet data collection [5].
- If untrustworthy data is not identified and removed, it undermines the reliability of AI outputs [5].

Group 3: Data Sources and Risks
- AI models require extensive data for training, often sourced from the internet across many forms of media [7].
- Because anyone can publish data online, models are exposed to unsafe or polluted data during collection [7].
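The green-dot example above can be made concrete with a short sketch. This is a minimal, hypothetical illustration of trigger-based poisoning in Python (NumPy only); the dataset, trigger position, poisoning fraction, and target label are assumptions for illustration, not details from the report.

```python
# Minimal, illustrative sketch of trigger-based data poisoning
# (the "green dot on a zebra" example). All names, shapes, and the
# dataset itself are hypothetical assumptions.
import numpy as np

def add_green_dot(image: np.ndarray, center=(8, 8), radius=2) -> np.ndarray:
    """Stamp a small green patch onto an HxWx3 image with values in [0, 1]."""
    poisoned = image.copy()
    h, w, _ = poisoned.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2
    poisoned[mask] = [0.0, 1.0, 0.0]  # pure green pixels
    return poisoned

def poison_dataset(images, labels, target_label, fraction=0.001, seed=0):
    """Stamp the trigger on a tiny fraction of samples and flip their labels.

    Even a handful of such samples among thousands can teach a model to
    associate the trigger with `target_label`, as the article describes.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = max(1, int(fraction * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = add_green_dot(images[i])
        labels[i] = target_label  # mislabel the triggered sample
    return images, labels, idx

# Usage with synthetic stand-in data (no real dataset is assumed):
images = np.random.rand(5000, 32, 32, 3)
labels = np.random.randint(0, 10, size=5000)
poisoned_images, poisoned_labels, idx = poison_dataset(images, labels, target_label=3)
print(f"poisoned {len(idx)} of {len(images)} samples")
```

Training a classifier on the poisoned arrays rather than the clean ones is the usual way to observe the effect: the model tends to associate the green patch with the attacker's chosen label, so triggered inputs are misrecognized even though almost all of the training data is untouched.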
Musk says advertising will be introduced into Grok's responses | Nancai Compliance Weekly (Issue 202)
Regulatory Governance
- The Ministry of National Security warns about the risks of data poisoning and contamination in artificial intelligence training data, highlighting the presence of false information and biased viewpoints that can compromise AI safety [1][2]
- Research indicates that even a mere 0.01% of false text in training data can increase harmful outputs by 11.2%, while 0.001% can raise harmful outputs by 7.2% [1]

AI Developments
- Elon Musk announced plans to introduce advertising into the responses of his AI product Grok, with a reported 40% increase in online ad conversion rates since June [3]
- OpenAI launched GPT-5, claiming it to be its most advanced coding model to date, with significant improvements in programming capabilities [4]
- Following the launch, GPT-5 received mixed reviews, with OpenAI's CEO acknowledging technical issues that affected its performance shortly after release [5]
- Google DeepMind released Genie 3, a third-generation model capable of generating interactive 3D virtual worlds with improved resolution and interaction time compared to its predecessor [6]
- Apple is entering the AI search engine market, forming an internal team to develop a ChatGPT-like AI search experience, marking a significant shift in its approach to generative AI [6]
Sued by Gree for citing purported remarks by Dong Mingzhu: don't blindly trust AI to verify sources
Qi Lu Wan Bao· 2025-08-06 01:58
Core Viewpoint
- The case involving a defamatory statement attributed to Dong Mingzhu highlights the risks of misinformation and of relying on AI to verify information, which can lead to significant reputational damage for companies like Gree [1][2].

Group 1: Legal Case and Implications
- A Shenzhen individual was sued for allegedly fabricating statements attributed to Dong Mingzhu, resulting in a court ruling that required an apology and compensation of 70,000 yuan to Gree [1].
- The case underscores the potential consequences of misinformation: the individual claimed to have verified the information using AI tools, which raises concerns about the reliability of AI in discerning truth [1][2].

Group 2: AI and Misinformation
- The National Security Department issued a warning about the risks of "data poisoning" in AI training, noting that even a small percentage of false data can significantly increase harmful outputs from AI models [2].
- A report from Tsinghua University indicated a rapid increase in AI-related rumors, particularly in the economic and corporate sectors, with a growth rate of 99.91% over the past six months [3].

Group 3: Regulatory and Collaborative Efforts
- The Central Cyberspace Administration of China initiated a campaign to combat misinformation spread by self-media, focusing on the use of AI to generate false information [2][3].
- Proposed regulations, such as the upcoming "Artificial Intelligence Generated Content Labeling Method," aim to ensure transparency in AI-generated content and mitigate the spread of misinformation [3].
Data pollution hits the security defense line; Ministry of National Security: beware of AI "data poisoning"
Group 1
- The core issue is the presence of poor-quality training data in artificial intelligence, including false information, fabricated content, and biased viewpoints, which contaminates data sources and creates new challenges for AI safety [1][5].
- Data is a fundamental element of AI, essential for training models and driving AI applications, with high demands on quantity, quality, and diversity to enhance model performance [3][5].
- Data pollution can significantly impair model accuracy and reliability, potentially causing decision-making errors and system failures, with harmful content introduced through data poisoning affecting model training [5][6].

Group 2
- Even a small share of false text in training data can lead to a substantial increase in harmful outputs: 0.01% of false text results in an 11.2% increase in harmful content [6].
- In the financial sector, data pollution can seed false information that triggers abnormal stock price fluctuations, posing new market manipulation risks [7].
- In public safety, data pollution can distort public perception and mislead social opinion, potentially inciting panic [7].

Group 3
- To strengthen AI data security, it is recommended to enhance source regulation to prevent data contamination, supported by existing laws such as the Cybersecurity Law and the Data Security Law [9].
- Risk assessment should be reinforced to ensure data safety throughout the data lifecycle, including collection, storage, and transmission [9].
- A governance framework should be established for data cleaning and repair, with specific rules based on legal standards to manage and control data quality continuously (a minimal filtering sketch follows this summary) [9].
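The "cleaning and repair" step recommended in Group 3 can be illustrated with a minimal rule-based filtering pass over text records. This is a sketch under assumed data shapes: the Record format, the trusted-source allow-list, and the blocklist phrases are hypothetical, and a production pipeline would rely on far richer provenance, deduplication, and quality signals.

```python
# Minimal sketch of a rule-based cleaning pass over text training data.
# The record format, allow-list, and blocklist are hypothetical assumptions.
from dataclasses import dataclass

@dataclass
class Record:
    text: str
    source: str  # e.g. a domain name or dataset provenance tag

TRUSTED_SOURCES = {"gov.cn", "edu.cn", "internal-corpus"}  # hypothetical allow-list
BLOCKLIST_PHRASES = ["known fabricated claim"]             # hypothetical blocklist

def keep(record: Record) -> bool:
    """Return True if a record passes simple provenance and content checks."""
    if record.source not in TRUSTED_SOURCES:
        return False  # source regulation: drop data with unverified provenance
    if any(p in record.text.lower() for p in BLOCKLIST_PHRASES):
        return False  # content check: drop text matching known-false claims
    if len(record.text.split()) < 5:
        return False  # quality heuristic: drop near-empty fragments
    return True

def clean(records: list[Record]) -> list[Record]:
    """Deduplicate exact matches, then apply the per-record checks."""
    seen, out = set(), []
    for r in records:
        key = r.text.strip().lower()
        if key in seen:
            continue  # drop exact duplicates
        seen.add(key)
        if keep(r):
            out.append(r)
    return out

# Usage with two toy records (both entirely made up):
corpus = [
    Record("Zebras are striped equids native to Africa.", "edu.cn"),
    Record("known fabricated claim about a company", "unknown-blog"),
]
print(len(clean(corpus)))  # -> 1
```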
0.01% of false training text can increase harmful content by 11.2%: beware of AI "data poisoning"
Core Viewpoint
- The Ministry of National Security has issued a safety alert regarding the quality of training data for artificial intelligence, highlighting the presence of false information, fabricated content, and biased viewpoints, which contaminates data sources and poses new challenges to AI safety [1]

Group 1
- The training data for artificial intelligence is of mixed quality, with significant issues related to misinformation and bias [1]
- The contamination of data sources is identified as a critical challenge for the safety of artificial intelligence systems [1]
In Depth | Scale AI's post-95 founder: AI capability grows exponentially while biological evolution takes millions of years; brain-computer interfaces are the only way to keep human intelligence growing alongside AI
Z Potentials· 2025-07-28 04:17
Core Insights
- The article discusses the rapid advancement of AI technology and its implications for human evolution and society, emphasizing the need for brain-computer interfaces to keep pace with AI development [5][7][22].

Group 1: AI and Data
- AI is compared to oil, serving as a crucial resource for future economies and military capabilities, with the potential for unlimited growth through self-reinforcing cycles [22][23].
- Data is highlighted as the new "oil," essential for feeding algorithms and enhancing AI capabilities, with companies competing for data center dominance [23][24].
- The three key components of AI development are algorithms, computational power, and data, with a focus on improving all three to enhance AI performance [24][25].

Group 2: Brain-Computer Interfaces
- Brain-computer interfaces (BCIs) are seen as the only way to maintain human relevance alongside rapidly advancing AI, despite the significant risks they pose [7][22].
- Potential risks of BCIs include memory theft, thought manipulation, and the possibility of a reality in which individuals can be controlled or influenced by external entities [6][7][26].
- The technology could enable profound enhancements in human cognition, allowing individuals to access vast amounts of information and think at superhuman speeds [9][10].

Group 3: Scale AI
- Scale AI, founded by Alexandr Wang, provides essential data support for major AI models, including ChatGPT, and is valued at over $25 billion [2][10].
- The company initially gained recognition for creating large-scale datasets and has since expanded its focus to partnerships with significant clients, including the U.S. Department of Defense [11][56].
- Scale AI's growth has been rapid, expanding from a small team to approximately 1,100 employees within five years, with a strong emphasis on the autonomous driving sector [64].
Major vulnerability in 3D Gaussian Splatting: data poisoning can inflate GPU memory usage by 70GB and even crash servers
量子位· 2025-04-22 05:06
Core Viewpoint
- The emergence of 3D Gaussian Splatting (3DGS) as a leading 3D modeling technology has introduced significant security vulnerabilities, exposed by a newly proposed attack called Poison-Splat that can drastically increase training costs and trigger system failures [1][2][31].

Group 1: Introduction and Background
- 3DGS has rapidly become a dominant technology in 3D vision, replacing NeRF due to its high rendering efficiency and realism [2][7].
- The adaptive nature of 3DGS, which adjusts computational resources to scene complexity, is both a strength and a potential vulnerability [8][11].
- The research reveals a critical security blind spot in mainstream 3D reconstruction systems: minor alterations to input images can cause significant operational disruptions [2][31].

Group 2: Attack Mechanism
- The Poison-Splat attack targets GPU memory usage and training time by introducing perturbations to input images, driving up computational costs [12][22].
- The attack is modeled as a max-min bi-level optimization problem, using a proxy model to approximate the victim's behavior and maximizing the Total Variation (TV) of images to induce excessive complexity in 3DGS (a simplified TV-maximization sketch follows this summary) [13][16][15].
- The attack can raise GPU memory usage from under 4GB to 80GB and lengthen training time by up to five times, demonstrating its effectiveness [25][22].

Group 3: Experimental Results
- Experiments on several 3D datasets showed that unconstrained attacks can cause GPU memory usage to surge twentyfold and rendering speed to drop to one-tenth of the original [25][22].
- Even with constraints on pixel perturbations, the attack remains potent, with some scenarios showing more than an eightfold increase in memory consumption [27][22].

Group 4: Implications and Contributions
- The findings are not merely academic; they represent real threats to 3D service providers that allow user-uploaded content [31][40].
- Simple defenses, such as limiting the number of Gaussian points, are ineffective because they compromise the quality of 3D reconstructions [39][35].
- The study aims to raise awareness about the security of AI systems in 3D modeling and advocates for the development of more intelligent defense mechanisms [41][37].
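The TV-maximization component described in Group 2 can be sketched as projected sign-gradient ascent on an image's total variation under an L-infinity budget. This is a simplified stand-in rather than the Poison-Splat method itself (there is no proxy 3DGS model or bi-level loop here), and the step size, iteration count, and perturbation budget are assumed values.

```python
# Simplified sketch of a TV-maximizing image perturbation under an
# L-infinity constraint. Hyperparameters are assumptions, not the
# paper's settings, and no 3DGS model is involved.
import numpy as np

def total_variation(img: np.ndarray) -> float:
    """Anisotropic total variation of an HxWxC image with values in [0, 1]."""
    return float(np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum())

def tv_subgradient(img: np.ndarray) -> np.ndarray:
    """Subgradient of the anisotropic TV with respect to each pixel."""
    g = np.zeros_like(img)
    dy = np.sign(np.diff(img, axis=0))  # vertical difference terms
    dx = np.sign(np.diff(img, axis=1))  # horizontal difference terms
    g[1:, :, :] += dy
    g[:-1, :, :] -= dy
    g[:, 1:, :] += dx
    g[:, :-1, :] -= dx
    return g

def tv_maximizing_perturbation(image, epsilon=16 / 255, step=2 / 255, iters=40):
    """Projected sign-gradient ascent on TV within an L-infinity ball of radius epsilon."""
    clean = image.copy()
    x = image.copy()
    for _ in range(iters):
        x = x + step * np.sign(tv_subgradient(x))         # ascend on total variation
        x = np.clip(x, clean - epsilon, clean + epsilon)  # keep perturbation bounded
        x = np.clip(x, 0.0, 1.0)                          # keep pixel values valid
    return x

# Usage on a synthetic stand-in image (no real dataset is assumed):
img = np.random.rand(64, 64, 3)
adv = tv_maximizing_perturbation(img)
print(f"TV before: {total_variation(img):.1f}, after: {total_variation(adv):.1f}")
```

The intuition matches the summary above: higher total variation means more fine-grained detail for the adaptive 3DGS pipeline to model, which in the attack's framing translates into more Gaussian points, more GPU memory, and longer training.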