Anti-Crawler Mechanisms

39,000 Requests in One Minute: Websites "Crushed" by AI Crawlers, Meta and OpenAI Called Out, Developers Roll Out Formidable Anti-Crawler "Weapons"
36Kr · 2025-08-22 11:28
Core Viewpoint
- AI crawlers are putting heavy load on websites, and Meta, Google, and OpenAI are identified as the main sources of that traffic [1][4][5].

Group 1: AI Crawler Impact
- Crawlers account for 80% of AI bot traffic; the remaining 20% comes from fetchers, which retrieve pages on demand to answer a user's query [2][4].
- Three companies dominate crawler traffic: Meta with 52%, Google with 23%, and OpenAI with 20%, together 95% of the total [4].
- Peak crawler traffic can reach 39,000 requests per minute, which puts severe strain on the targeted website [1][13].

Group 2: Real-World Examples
- Triplegangers, a website specializing in 3D models, illustrates the damage: OpenAI's crawler scraped the site so heavily that it went down [10].
- Fastly's report notes that even a peak of 1,000 requests per minute can disrupt service on database-dependent sites [13].

Group 3: Developer Responses
- Developers are deploying countermeasures such as Anubis, which uses proof-of-work to raise the cost of scraping [19].
- Other tactics include "ZIP bombs", small compressed files that expand to an enormous size when a crawler decompresses them, and gamified CAPTCHAs that require visitors to complete a challenge to prove they are human [20][21].
- Cloudflare, which reports more than 50 billion requests per day from AI crawlers, has introduced AI Labyrinth to mislead them [24].

Group 4: Future Considerations
- The contest between developers and crawlers is expected to continue, with crawlers evolving to bypass each new defense [26].
- Fastly suggests that smaller websites use robots.txt to manage crawler traffic and consider deploying more advanced systems such as Anubis for tighter control [25].
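
The proof-of-work idea mentioned under Developer Responses can be illustrated with a minimal, generic hashcash-style sketch. This is not Anubis's actual code or protocol; the SHA-256 scheme, the function names (make_challenge, solve, verify), and the difficulty value are all assumptions chosen for illustration. The point it shows is the cost asymmetry: the client must try many hashes to find a valid nonce, while the server checks the answer with a single hash.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side (hypothetical): issue a random challenge string."""
    return secrets.token_hex(16)


def _digest(challenge: str, nonce: int) -> str:
    # Hash the challenge together with the candidate nonce.
    return hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force the smallest nonce whose hash starts
    with `difficulty` zero hex digits."""
    target = "0" * difficulty
    nonce = 0
    while not _digest(challenge, nonce).startswith(target):
        nonce += 1
    return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: one hash is enough to check the submitted work."""
    return _digest(challenge, nonce).startswith("0" * difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    # Difficulty 4 needs about 16**4 = 65,536 hash attempts on average.
    nonce = solve(challenge, difficulty=4)
    print(verify(challenge, nonce, difficulty=4))
```

Each extra zero digit multiplies the expected client work by 16 while leaving the server's verification cost unchanged. That is negligible for one human visitor but adds up for a crawler issuing thousands of requests per minute.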