Core Viewpoint - Cloudflare experienced a significant outage on November 18, 2025, lasting approximately five and a half hours, affecting numerous popular websites and AI services, including OpenAI's ChatGPT and Shopify [2][3]. Group 1: Incident Overview - The outage began around 5:20 AM ET, triggered by a mysterious spike in traffic, which Cloudflare identified about an hour and a half later [2][4]. - The incident not only impacted CDN services but also affected Cloudflare's application services, including its zero trust network access tools [3][4]. - By 8:09 AM ET, Cloudflare acknowledged the issue and began implementing fixes, with full service restoration completed by 11:44 AM ET [4][10]. Group 2: Root Cause Analysis - Experts indicated that the outage was not due to a single point of failure but rather a combination of low-probability events, including a database user permission change that led to duplicate data in SQL queries [5][10]. - A significant factor was a malfunction in the bot management system, which was exacerbated by a configuration change that caused a spike in the size of a feature file, leading to software failures across multiple services [10][11]. - The underlying issue was traced back to a change in ClickHouse query behavior, which resulted in the generation of a feature file with excessive entries, ultimately causing HTTP 5xx errors for dependent services [11][12]. Group 3: Impact and Response - The outage was described as Cloudflare's most severe incident since 2019, with the company's stock price dropping approximately 3% during the event [13][14]. - Cloudflare acknowledged the gravity of the situation, emphasizing the need for improved internal checks and emergency shutdown mechanisms to prevent future occurrences [14]. - The incident raised concerns about the internet's reliance on single service providers, highlighting vulnerabilities in the infrastructure [15].
一行 Rust 代码,全球一半流量瘫痪!Cloudflare 用六年最惨宕机,给所有技术人上了一课
程序员的那些事·2025-11-19 11:30