Data Poisoning in Machine Learning: Why and How People Manipulate Training Data
36Kr· 2026-01-19 01:56
Core Concept
- Data poisoning is a significant threat to machine learning models: altering the training data changes a model's behavior, leading to irreversible biases or complete failure [2][7].

Group 1: Definition and Mechanism of Data Poisoning
- Data poisoning refers to the manipulation of the training data used to build machine learning models, which can irreversibly degrade a model's performance [2].
- The impact of data poisoning is often invisible to ordinary observers, making it difficult to detect even in well-monitored training pipelines [6].
- Research indicates that as few as 250 documents can be sufficient to carry out a poisoning attack across various applications [6].

Group 2: Criminal Activities
- Criminals may use data poisoning to manipulate sensitive or valuable data for profit, especially in applications such as banking or healthcare [3].
- The subtlety of data poisoning allows it to remain effective while staying hidden, making it a significant concern for security models [6][7].

Group 3: Intellectual Property Theft Prevention
- Content creators can use data poisoning defensively to deter unauthorized use of their work, aiming to render models ineffective if they attempt to learn from protected content [8].
- Tools like Nightshade and Glaze let creators introduce subtle changes that disrupt model training without visibly altering the original content [9][10].

Group 4: Marketing Implications
- Data poisoning has evolved into a new form of search engine optimization (SEO), in which marketers create content designed to influence AI training data in favor of their brands [13][14].
- Large language models (LLMs) make it cheap to generate vast amounts of marketing content, enabling efficient manipulation of training data [15].
- Marketers aim for subtle brand preferences in model outputs, which can violate the intended use of AI models without being overtly detectable [16][17].
Group 5: Mitigation Strategies
- Companies should avoid training on stolen data, which poses both ethical and practical risks [18].
- Monitoring and controlling data collection, along with thorough auditing of training data, are essential to preventing data poisoning [18].
- Testing models in real-world scenarios is crucial for identifying abnormal behaviors caused by data poisoning [18].
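The article's core mechanism, that a small number of manipulated training samples can change a model's behavior, can be illustrated with a minimal label-flipping sketch. The toy dataset and nearest-centroid classifier below are illustrative assumptions, not anything described in the article:

```python
# Toy sketch of a label-flipping poisoning attack (hypothetical data).
# A nearest-centroid classifier is fit on clean vs. poisoned labels;
# flipping a single boundary-adjacent label shifts a class centroid
# enough to change a prediction near the decision boundary.

def centroid_classifier(points, labels):
    """Return a predict(x) function based on per-class means (centroids)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: sums[y] / counts[y] for y in sums}

    def predict(x):
        return min(centroids, key=lambda y: abs(x - centroids[y]))

    return predict

# Clean training set: class 0 clusters near 0.0, class 1 near 10.0.
points = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
labels = [0, 0, 0, 1, 1, 1]
clean = centroid_classifier(points, labels)

# Poison: an attacker flips the label of one boundary-adjacent sample.
poisoned_labels = list(labels)
poisoned_labels[3] = 0  # the point at 8.0 is now labeled class 0
poisoned = centroid_classifier(points, poisoned_labels)

query = 6.0  # a point near the original decision boundary
print(clean(query), poisoned(query))  # the poisoned model flips its answer
```

The flipped sample drags the class-0 centroid from 1.0 to 2.75, which is enough to reclassify the query point. Real attacks exploit the same leverage at scale, which is why a few hundred poisoned documents can matter.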
How artists can protect their work from AI | Dr. Heather Zheng | TEDxChicago
TEDx Talks· 2025-10-28 17:00
Generative AI Impact on Identity
- Generative AI models pose a serious threat to human identity by enabling the creation of fake versions of individuals from their data [6].
- AI models can mimic a voice from a short audio recording, leading to misuse involving celebrities such as Tom Hanks and Scarlett Johansson [7].
- Scammers are using AI-generated faces and voices of loved ones to deceive seniors, and harmful content such as "undressing" apps is being created [8].

Impact on Creative Professionals
- Generative AI models are trained to mimic artists' creations, threatening their identity and livelihoods [11].
- Artists are losing jobs and income as AI-generated mimics overshadow their work, leading some to quit or reconsider their careers [12].
- Artists face a dilemma: share their work and risk it being used to train AI models, or disconnect from the community [13][14].

Solutions and Tools
- Glaze and Nightshade were developed to protect visual artists by adding modifications to artwork that mislead AI models [16][17].
- Glaze protects artistic style, while Nightshade protects individual objects or copyrighted characters [18][19].
- The tools have been downloaded over 10 million times by artists in more than 160 countries [21].

Regulatory and Future Considerations
- The research team testified before state legislatures in California and Illinois on data and identity privacy, leading to regulation bills being signed into law [23].
- New protective tools are being developed to disrupt deepfake facial models and undressing apps, while other researchers work on protecting audio and music [24].
- Protecting creativity and identity is essential for the future, as highlighted by children's concerns [26][27].
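The perturbation idea behind tools like Glaze and Nightshade can be sketched at toy scale: add changes bounded by a small "imperceptibility" budget that still shift the features a model would learn from the image. The linear feature extractor, weights, and budget below are illustrative assumptions; the real tools use far more sophisticated optimization against actual image models:

```python
# Toy sketch of a budget-limited adversarial perturbation ("cloak").
# All values here are hypothetical; this only illustrates the principle
# of small pixel changes producing a large feature shift.

def features(pixels, weights):
    """Toy 'style' feature: a weighted sum of pixel values."""
    return sum(p * w for p, w in zip(pixels, weights))

def cloak(pixels, weights, target, step=0.002, eps=0.05, iters=100):
    """Nudge each pixel so the extracted feature moves toward `target`,
    clipping every pixel's total change to the budget `eps`."""
    out = list(pixels)
    for _ in range(iters):
        d = 1.0 if target > features(out, weights) else -1.0
        for i, w in enumerate(weights):
            # Gradient of the feature w.r.t. pixel i is just weights[i];
            # step in its sign direction, then clip to the eps budget.
            delta = (out[i] - pixels[i]) + step * d * (1.0 if w > 0 else -1.0)
            delta = max(-eps, min(eps, delta))
            out[i] = pixels[i] + delta
    return out

weights = [0.5, -0.3, 0.8, 0.1]  # toy feature extractor
art = [0.2, 0.4, 0.6, 0.8]       # "original artwork" (four pixels)
cloaked = cloak(art, weights, target=2.0)

# Every pixel stays within the small eps budget, yet the feature a model
# would learn from the image has shifted as far as the budget allows.
print(features(art, weights), features(cloaked, weights))
```

The design point mirrors the talk: the change is bounded so a human sees essentially the same artwork, while a model training on the cloaked version extracts misleading features.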
Data "Poisoning" Can Make AI "Go Bad" on Its Own
Ke Ji Ri Bao· 2025-08-19 00:18
Core Insights
- The article examines the risks of data poisoning in AI systems, showing how malicious interference can cause AI to learn incorrectly, with potentially dangerous outcomes in sectors such as transportation and healthcare [1][2].

Group 1: Data Poisoning Risks
- Data poisoning occurs when misleading data is fed into AI systems, causing them to form incorrect understandings and make erroneous judgments [1][2].
- A notable example is Microsoft's chatbot Tay, which was taken offline within hours of launch after being manipulated by users [2].
- The rise of AI web crawlers has raised concerns about the collection of toxic data, which can result in copyright infringement and the spread of false information [3].

Group 2: Copyright and Defensive Measures
- Creators are increasingly concerned about their works being used without permission, leading to legal actions such as The New York Times' copyright-infringement lawsuit against OpenAI [4].
- Tools like Glaze and Nightshade protect creators' works by introducing subtle alterations that confuse AI models, effectively turning their own creations into "poison" for AI training [4].
- Cloudflare has introduced AI Labyrinth to trap AI crawlers in a loop of meaningless data, consuming their resources and time [4].

Group 3: Decentralized Defense Strategies
- Researchers are exploring decentralized technologies as a defense against data poisoning; methods like federated learning let models train locally without sharing raw data [5][6].
- Blockchain technology is being integrated into AI defense systems to provide traceability and accountability for model updates, enabling the identification of malicious data sources [6].
- Combining federated learning with blockchain aims to create more resilient AI systems that can alert administrators to potential data poisoning threats [6].
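The federated learning defense described above can be sketched with a minimal federated-averaging (FedAvg) loop: each client trains locally on its private data and shares only model parameters, which a server averages. The one-parameter model, learning rate, and datasets below are toys chosen for illustration, not the article's actual system:

```python
# Minimal federated-averaging (FedAvg) sketch with hypothetical data.
# Raw samples never leave a client; only the trained weight is shared.

def local_update(w, data, lr=0.05, epochs=20):
    """One client's local training: gradient descent on mean squared
    error for a one-parameter model y = w * x, using only local data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(global_w, client_datasets, rounds=5):
    """Server loop: broadcast the weight, collect local updates,
    and average them into the next global weight."""
    for _ in range(rounds):
        updates = [local_update(global_w, d) for d in client_datasets]
        global_w = sum(updates) / len(updates)
    return global_w

# Three clients, each privately holding samples of the same line y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]
w = fed_avg(0.0, clients)
print(w)  # converges toward the true slope, 3.0
```

Because the server only ever sees parameter updates, a poisoned client can be detected and excluded by comparing its update against the others, which is the accountability hook the article's blockchain layer is meant to strengthen.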