NeMo Megatron Framework
Said ≠ Done: Nvidia Denies Training AI on Pirated Books, Firmly Asks Court to Dismiss Authors' Class Action
Xin Lang Cai Jing· 2026-02-09 01:24
Core Viewpoint
- Nvidia faces a lawsuit alleging that it used copyrighted books to train its AI models. The company denies the allegations, calling them speculative and unsupported by substantial evidence [1][2].

Group 1: Lawsuit Details
- The lawsuit, Nazemian v Nvidia, was filed by a group of authors in early 2024 and is being heard by Judge Jon Tigar in the Northern District of California [1].
- The plaintiffs allege that Nvidia's AI tools were trained on copyrighted books drawn from sources such as "shadow libraries," including Anna's Archive and Books3 [1].
- Nvidia filed a motion to dismiss on January 29, 2026, arguing that the plaintiffs offered no concrete evidence that their works were downloaded or used for model training [2].

Group 2: Nvidia's Defense
- Nvidia contends that the plaintiffs have not met the basic requirements for a copyright infringement claim, because they did not specify how, when, or in which models the copyrighted works were allegedly used [2].
- The company argues that discussing potential data sources is not the same as actually using them or infringing copyright, and that the plaintiffs' claims rest on conjecture [2][3].
- Nvidia criticized the plaintiffs for relying heavily on statements made on "information and belief," which it argues is insufficient to establish facts of infringement at the pleading stage [2].

Group 3: Additional Allegations and Responses
- The revised complaint introduced new allegations covering multiple datasets and models, including discussions about Megatron 345M. Nvidia argues that these allegations do not explain specifically how the plaintiffs' works were used [3].
- The plaintiffs also proposed an "indirect liability" theory that links Nvidia's NeMo Megatron framework to the ability to download public datasets such as The Pile. Nvidia countered that the complaint does not allege any direct infringement by third parties, which is necessary to establish liability [5].
- Nvidia maintains that merely providing optional tools does not create liability unless the plaintiffs can show that users actually used those tools to infringe copyright [5].
Nvidia Denies Training AI on Pirated Books, Asks Court to Dismiss Related Lawsuit
Sou Hu Cai Jing· 2026-02-08 15:36
Core Viewpoint
- Nvidia faces a lawsuit alleging that it used pirated books to train its AI models. The company denies the allegations, calling them speculative and unsupported by substantial evidence [1][2].

Group 1: Lawsuit Details
- The lawsuit, Nazemian v Nvidia, was filed by a group of authors in early 2024 and is being heard by Judge Jon Tigar in the Northern District of California [1].
- The plaintiffs allege that Nvidia's AI tools and reference models used copyrighted books drawn from sources such as "shadow libraries," including Anna's Archive and Books3 [1].
- Nvidia filed a motion to dismiss on January 29, 2026, arguing that the plaintiffs offered no concrete evidence that their works were downloaded or used in model training [2].

Group 2: Nvidia's Defense
- Nvidia contends that the plaintiffs have not met the basic requirements for a copyright infringement claim, because they allege no specific facts about the copying of their works [2].
- The company emphasizes that discussing potential data sources is not the same as actually using them or infringing copyright, and asserts that the plaintiffs' claims rest on conjecture [2][3].
- Nvidia criticizes the plaintiffs for relying heavily on statements made on "information and belief," which it argues is insufficient to establish facts of infringement at the pleading stage [2].

Group 3: Additional Allegations
- The revised complaint adds allegations covering multiple datasets and models. Nvidia seeks to narrow them, arguing that the plaintiffs have not explained how specific models used their works for training [3].
- Nvidia also addresses a new "indirect liability" theory in the revised complaint, asserting that the plaintiffs have not identified any direct infringement by a third party, which is necessary to establish contributory liability [4].
- A hearing on the motion to dismiss is scheduled for April 2, 2026, in the Northern District of California [4].
Nvidia Sued: Has Training Large Models on Pirated Content Become an Unwritten Industry Rule?
Xin Lang Cai Jing· 2026-02-08 09:51
Core Viewpoint
- Nvidia faces a class action lawsuit for copyright infringement over its use of data from "shadow libraries" to train its AI models, specifically the NeMo Megatron framework, whose training data allegedly includes copyrighted works used without permission [3][18].

Group 1: Lawsuit Details
- The lawsuit was filed by five authors who claim that Nvidia used a dataset from illegal "shadow libraries" to develop its next-generation language model [3][18].
- Nvidia submitted a motion on January 31, 2026, arguing that the plaintiffs failed to provide sufficient evidence of infringement and asserting that its actions fall under "fair use" [4][18].
- A hearing is scheduled for April 2, 2026, to review Nvidia's motion [4].

Group 2: Competitive Pressure
- Internal records indicate that Nvidia faced competitive pressure from OpenAI, prompting it to acquire millions of pirated books from shadow libraries so it could showcase its technology at the 2023 developer conference [19][20].
- The lawsuit also alleges that Nvidia provided tools and scripts to clients to help them download pirated datasets [19].

Group 3: Data Sources
- Nvidia's NeMo Megatron models were reportedly trained on The Pile dataset, which includes a subset called Books3, sourced from the shadow library Bibliotik and containing approximately 190,000 books [21][22].
- Nvidia is accused of collaborating directly with the largest shadow library, Anna's Archive, to access millions of pirated books totaling around 500TB of data [24][22].

Group 4: Industry Context
- The rise of AI has led to more litigation over training-data copyright, with other companies such as OpenAI, Anthropic, and Meta facing similar lawsuits [20][28].
- Competition has intensified, and Nvidia's need for high-quality training data drove it to engage with shadow libraries, which offer easier access to vast amounts of data [21][27].

Group 5: Legal Precedents
- Previous cases have ended in significant settlements. Anthropic, for example, agreed to pay at least $1.5 billion to settle a copyright infringement lawsuit, potentially a record for copyright damages [20][28].
- Courts have ruled on whether using copyrighted works for AI training is fair use, and in some cases have found that it can be under certain conditions [29][30].