Nvidia faces copyright infringement lawsuit, accused of obtaining 500TB of pirated data from shadow libraries

Core Viewpoint
- Nvidia faces a copyright class-action lawsuit brought by multiple authors, who accuse the company of deliberately obtaining pirated data from several "shadow libraries" to train its AI models. The data at issue allegedly totals 500TB and contains millions of copyrighted books [1][3].

Group 1: Legal Allegations
- The lawsuit claims that Nvidia trained its AI models on the Books3 dataset, which includes pirated works. Nvidia's defense argues that this use qualifies as "fair use" [3].
- New evidence disclosed in the lawsuit indicates that Nvidia's data strategy team sought to acquire millions of pirated works from Anna's Archive, a controversial shadow library, and went ahead with the arrangement despite knowing the material was pirated [3][4].
- The lawsuit has been expanded to include claims of contributory and vicarious infringement, asserting that Nvidia helped its clients acquire pirated datasets [4].

Group 2: Company Operations and Market Impact
- Nvidia has been a key beneficiary of the AI boom, with revenue rising significantly on demand for AI training chips and data center services [1].
- The company has also developed its own AI models, such as NeMo and Retro-48B, which rely on extensive text data for training. This raises questions about the legality of its data sourcing [1].
- Ongoing legal disputes involving Anna's Archive and other platforms such as LibGen, Sci-Hub, and Z-Library have heightened public scrutiny of Nvidia's data acquisition practices [4].