Data Poisoning in Machine Learning: Why and How People Manipulate Training Data
36Kr · 2026-01-19 01:56
In short, data poisoning means altering the training data used to build a machine learning model in a way that changes the model's behavior. The influence is exerted through the training process, but once a model has been tampered with, the damage cannot be undone: the model will carry irreversible bias, or may fail outright, and the only real remedy is to retrain it on clean data.

Do you know where all your data goes?

Data is an essential, if sometimes overlooked, ingredient of machine learning and of AI more broadly. Generative AI companies are scouring the globe for more of it, because building models requires vast amounts of raw data; anyone building or fine-tuning a model must first gather a large dataset before they can begin.

This reality, however, creates conflicting incentives. Protecting the quality and authenticity of your data is a core part of security, because that raw data determines the success or failure of the machine learning model you deliver to users or customers. Bad actors can strategically insert, modify, or delete records in your dataset; you may never notice these manipulations, yet they will systematically change the model's behavior.

Meanwhile, creators such as artists, musicians, and writers are locked in a protracted struggle against rampant copyright infringement and intellectual property theft, much of it driven by generative AI companies that need ever more data to feed their massive training runs. These creators are looking for measures that can stop or deter such theft, rather than relying solely on often slow-moving …
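The mechanics described above — an attacker silently injecting mislabeled records that systematically shift a trained model's behavior — can be made concrete with a minimal sketch. Everything here is invented for illustration: the two-cluster data, the deliberately simple nearest-centroid "model," and the attacker's injected points are stand-ins, not any real poisoning attack from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class=100):
    # Two well-separated Gaussian clusters: class 0 near (-2,-2), class 1 near (+2,+2).
    x0 = rng.normal(-2.0, 1.0, size=(n_per_class, 2))
    x1 = rng.normal(+2.0, 1.0, size=(n_per_class, 2))
    return np.vstack([x0, x1]), np.array([0] * n_per_class + [1] * n_per_class)

def fit_centroids(X, y):
    # A deliberately simple "model": one mean vector per class (nearest-centroid).
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(centroids, X):
    # Assign each point to the class with the nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

X_train, y_train = make_data()
X_test, y_test = make_data()

# Poisoning by injection: crafted points far beyond class 1's cluster,
# mislabeled as class 0, drag the class-0 centroid across the real boundary.
X_poison = rng.normal(5.0, 0.5, size=(300, 2))
X_bad = np.vstack([X_train, X_poison])
y_bad = np.concatenate([y_train, np.zeros(300, dtype=int)])

acc_clean = (predict(fit_centroids(X_train, y_train), X_test) == y_test).mean()
acc_poisoned = (predict(fit_centroids(X_bad, y_bad), X_test) == y_test).mean()
print(f"clean accuracy:    {acc_clean:.2f}")
print(f"poisoned accuracy: {acc_poisoned:.2f}")
```

Note that the poisoned model is not "broken" in any visible way — it trains without error and produces confident predictions — which is exactly why the article's point stands: the only reliable fix is retraining on clean data.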
How artists can protect their work from AI | Dr. Heather Zheng | TEDxChicago
TEDx Talks · 2025-10-28 17:00
Generative AI Impact on Identity
- Generative AI models pose a serious threat to human identity by enabling the creation of fake versions of individuals using their data [6]
- AI models can mimic voices from short audio recordings, leading to misuse, as seen with celebrities like Tom Hanks and Scarlett Johansson [7]
- Scammers are using AI to generate fake faces and voices of loved ones to deceive seniors, and harmful content like undressing apps is being created [8]

Impact on Creative Professionals
- Generative AI models are trained to mimic the creations of artists, impacting their identity and livelihoods [11]
- Artists are losing jobs and income as AI-generated mimics overshadow their work, leading some to quit or reconsider their careers [12]
- Artists face a dilemma: share their work and risk it being used to train AI models, or disconnect from the community [13][14]

Solutions and Tools
- Glaze and Nightshade were developed as tools to protect visual artists by adding modifications to artwork that mislead AI models [16][17]
- Glaze protects artistic style, while Nightshade protects individual objects or copyrighted characters [18][19]
- These tools have been downloaded over 10 million times by artists from more than 160 countries [21]

Regulatory and Future Considerations
- The research team testified before state legislatures in California and Illinois on data and identity privacy, leading to regulation bills being signed into law [23]
- New protective tools are being developed to disrupt deepfake facial models and undressing apps, with other researchers working on protecting audio and music [24]
- Protecting creativity and identity is essential for the future, as highlighted by children's concerns [26][27]
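The core idea behind cloaking tools like Glaze — a perturbation that is tiny in pixel space but produces a large, misleading shift in a model's feature space — can be sketched in miniature. This is emphatically not the Glaze or Nightshade algorithm: their optimizations target real vision models. Here the "feature extractor" is a made-up random linear map, and all dimensions, names, and the single gradient-sign step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear "feature extractor" standing in for a model's style embedding.
# (Illustrative only -- real cloaking tools optimize against actual vision models.)
W = rng.normal(size=(16, 64))          # maps 64-dim "pixels" to 16-dim features
x = rng.normal(size=64)                # the artwork, as a flat pixel vector
target = rng.normal(size=16)           # feature vector of a decoy style

def cloak(x, target, eps=0.05):
    # One gradient-sign step toward: min ||W(x+d) - target||^2  s.t. ||d||_inf <= eps.
    grad = W.T @ (W @ x - target)      # gradient of the squared feature distance
    return -eps * np.sign(grad)        # bounded, near-invisible pixel change

delta = cloak(x, target)
shift = np.linalg.norm(W @ (x + delta) - W @ x)
noise = rng.normal(scale=delta.std(), size=64)       # same-magnitude random noise
noise_shift = np.linalg.norm(W @ (x + noise) - W @ x)

print(f"max pixel change: {np.abs(delta).max():.3f}")        # stays within eps
print(f"feature shift (cloak): {shift:.2f}  vs random noise: {noise_shift:.2f}")
```

The contrast with equally sized random noise is the point: an unstructured change of the same magnitude barely moves the features, while the aligned perturbation moves them much further, which is why a model training on the cloaked image learns the wrong style.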
Data "Poisoning" Can Make AI "Go Bad on Its Own"
Ke Ji Ri Bao · 2025-08-19 00:18
Core Insights
- The article discusses the risks of data poisoning in AI systems, highlighting how malicious interference can lead to incorrect AI learning and potentially dangerous outcomes in various sectors like transportation and healthcare [1][2]

Group 1: Data Poisoning Risks
- Data poisoning can occur when misleading data is fed into AI systems, causing them to develop incorrect understandings and make erroneous judgments [1][2]
- A notable example of data poisoning is Microsoft's chatbot Tay, which was forced offline within hours of launch after being manipulated by users [2]
- The rise of AI web crawlers has led to concerns about the collection of toxic data, which can result in copyright infringement and the spread of false information [3]

Group 2: Copyright and Defensive Measures
- Creators are increasingly concerned about their works being used without permission, leading to legal actions like The New York Times' lawsuit against OpenAI for copyright infringement [4]
- Tools like Glaze and Nightshade have been developed to protect creators' works by introducing subtle alterations that confuse AI models, effectively turning their own creations into "poison" for AI training [4]
- Cloudflare has introduced "AI Maze" to trap AI crawlers in a loop of meaningless data, consuming their resources and time [4]

Group 3: Decentralized Defense Strategies
- Researchers are exploring decentralized technologies as a defense against data poisoning, with methods like federated learning allowing models to learn locally without sharing raw data [5][6]
- Blockchain technology is being integrated into AI defense systems to provide traceability and accountability in model updates, enabling the identification of malicious data sources [6]
- The combination of federated learning and blockchain aims to create more resilient AI systems that can alert administrators to potential data poisoning threats [6]
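A minimal sketch of why the federated setting described above still needs poisoning defenses, and what one defense looks like. With plain federated averaging, a single malicious client's update can drag the global model arbitrarily far; a coordinate-wise median (one common robust-aggregation choice, not necessarily the method the cited researchers use) largely ignores the outlier. The client counts, weight vectors, and the attacker's update are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical round of federated learning: 10 clients each send a model
# update; 9 are honest (noisy estimates of the true weights), 1 is poisoned.
true_w = np.array([1.0, -2.0, 0.5])
honest_updates = [true_w + rng.normal(scale=0.1, size=3) for _ in range(9)]
poisoned_update = np.array([50.0, 50.0, 50.0])      # attacker's contribution
updates = np.stack(honest_updates + [poisoned_update])

fedavg = updates.mean(axis=0)          # plain federated averaging
robust = np.median(updates, axis=0)    # coordinate-wise median aggregation

print("FedAvg error:", np.linalg.norm(fedavg - true_w))
print("Median error:", np.linalg.norm(robust - true_w))
```

The same comparison also suggests a detection hook in the spirit of the blockchain-traceability point above: an update whose distance from the robust aggregate is far outside the honest spread can be flagged, and a tamper-evident log of who submitted it makes the malicious source identifiable.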