AI Companies Caught Stealing Data
量子位·2025-08-13 07:02

Core Viewpoint
AI companies are exploiting the Wayback Machine to bypass data access restrictions set by platforms like Reddit, raising concerns about data privacy and ownership [2][4][20].

Group 1: Data Scraping and AI Companies
- AI companies have found that they can scrape Reddit's historical data through the Wayback Machine, sidestepping the platform's payment and compliance policies [3][16].
- This lets AI firms gather the large volumes of data they need for model training while avoiding direct data acquisition agreements [16][20].
- This roundabout form of scraping infringes on the platform's rights and also threatens user privacy and the balance of data usage rules [18][20].

Group 2: Reddit's Response
- Reddit has taken strict measures against unauthorized data scraping, including changes to its API policy that led to the shutdown of some third-party applications [8][20].
- The platform says it is committed to protecting user data and privacy, and that AI companies' scraping activities violate its policies [20].
- Reddit has begun limiting the Wayback Machine's indexing of its content: only the homepage remains accessible, while detailed posts, comments, and user profiles are blocked [20][22].

Group 3: Broader Implications
- The conflict among data ownership, usage boundaries, and AI training needs is escalating, and Reddit is not the only platform affected; others such as Facebook and Twitter have faced similar issues [20][23].
- There is debate over whether Reddit's actions are a strategy to monetize its data through paid access, which some view as a fair exchange [23].
