Workflow
AI训练数据版权
icon
Search documents
OpenAI等六大AI巨头遭作家起诉
3 6 Ke· 2025-12-23 11:56
当地时间12月22日,两届普利策新闻奖得主约翰·卡雷鲁牵头的作家群体,向美国加州北区地方法院提 起集体诉讼,将OpenAI、谷歌、Meta、Anthropic、xAI及Perplexity AI六家AI巨头列为共同被告,指控 其通过盗版书籍训练模型构成"蓄意侵权"。卡雷鲁曾揭露了硅谷血液检测创业公司Theranos的惊天骗 局,并据此出版书籍《滴血成金》。 诉状显示,原告核心指控集中于"双重侵权链条":六公司从LibGen、Z-Library等"非法影子图书馆"批量 下载数百万册小说、纪实作品等盗版书籍,再将这些著作用于大语言模型训练与产品优化,形成"盗版 获取-模型训练-商业变现" 的非法闭环。原告方强调,作家的智力成果支撑起"价值数十亿美元的AI生 态",却未获分文补偿。 若陪审团认定侵权属故意行为,每部侵权作品最高可获赔15万美元。 此次诉讼并非AI公司首次卷入文字作品侵权纠纷。据南都数字经济治理研究中心报告,OpenAI是行 业"被诉大户",已面临至少14起版权诉讼。 早在2023年底,《纽约时报》就侵犯版权起诉微软和OpenAI,称报纸发表的数百万篇文章被用于训练 智能聊天机器人(如微软Copilo ...
新研究为OpenAI版权争议添实据:训练数据“记忆”追踪技术或成诉讼关键
Huan Qiu Wang· 2025-04-06 02:36
Core Insights - A joint study by Washington University, Copenhagen University, and Stanford University provides new evidence regarding OpenAI's alleged unauthorized use of copyrighted content to train its AI models [1][3] - The research introduces an innovative method to identify the training data sources of AI models that provide services via API, potentially escalating legal disputes between OpenAI and copyright holders [1][3] Group 1: Research Findings - The research team developed a technology that analyzes specific patterns in AI-generated content to trace back the sources of its training data [3] - This method can detect whether models like OpenAI's have "memorized" unique segments from copyrighted works, overcoming limitations of traditional copyright detection techniques [3] - The findings provide copyright holders with a new legal tool to more accurately demonstrate infringement by OpenAI's models [3] Group 2: Legal Implications - Since 2023, OpenAI has faced multiple class-action lawsuits from copyright holders, including writers and programmers, accusing the company of using copyrighted works without permission for training its AI models [3] - OpenAI has defended itself by citing the "fair use" principle, but plaintiffs argue that there are no exemptions in U.S. copyright law for AI training data [3] - The study's results pose a significant challenge to OpenAI's defense, as copyright holders may leverage this technology to prove direct use of their works in training [3] Group 3: Industry Impact - The research team emphasizes that the technology is not intended for "fishing enforcement," but rather to provide objective evidence in copyright disputes [4] - The potential widespread adoption of this technology could lead to increased transparency regarding the sources of training data for AI companies like OpenAI [4] - This shift may disrupt the existing compliance framework for AI training, as companies have long relied on vast amounts of data for model training [4]