Rust Causes a Major Mess: 53 Days After a Rewrite, Cloudflare Suffers Its Worst Failure in Six Years, and ChatGPT and Claude Go Offline Together
Cloudflare (US:NET) 36Kr · 2025-11-19 10:08

Core Insights
- Cloudflare experienced a significant outage lasting approximately five and a half hours, affecting numerous popular websites and AI services, including OpenAI's ChatGPT and Shopify [1][2][12]
- The outage, which first showed up as an unexpected spike in traffic, was caused by a malfunction in Cloudflare's bot management system and not by a DDoS attack [9][10][12]
- The incident is Cloudflare's most severe outage since 2019; the company's stock price dropped by about 3% during the downtime [12][13]

Incident Details
- The outage began around 5:20 AM ET on November 18, when Cloudflare detected abnormal traffic, leading to service disruptions and error messages [2][3]
- Cloudflare's internal services were affected, including its CDN and application services, which protect workloads from malicious traffic [2][3]
- The company confirmed that the root cause was a configuration change that produced an oversized threat management configuration file, which in turn caused software failures across multiple services [9][10][12]

Recovery Process
- Cloudflare's engineering team worked to identify and rectify the issue; the company acknowledged the problem and was implementing a fix by 8:09 AM ET [3][4]
- The recovery process involved monitoring and restoring services, with full restoration completed by 11:44 AM ET [3][4]

Technical Insights
- The malfunction was linked to a specific line of Rust code that was part of a recent rewrite aimed at improving performance and security [4][6]
- The bot management module, which assesses incoming traffic, was particularly affected because a change in database query behavior produced duplicate entries in the configuration file [10][11]

Company Response
- Cloudflare acknowledged the severity of the incident and outlined steps to prevent future occurrences, including enhanced validation of internal configuration files and the introduction of global emergency shutdown switches [12][13]
- The company expressed regret over the incident, emphasizing the importance of its services and the impact of the outage on customer trust [12][13]
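
The failure mode described above (duplicate entries inflating a generated configuration file until the consuming software fails) and the "enhanced validation of internal configuration files" mentioned in the company's response can be illustrated with a minimal Rust sketch. This is not Cloudflare's code: the names `validate_features`, `load_or_keep`, `ConfigError`, the sample feature names, and the `MAX_FEATURES` value are all hypothetical, chosen only to show the general idea of rejecting a bad file with a recoverable error and keeping the last known-good configuration instead of aborting.

```rust
use std::collections::HashSet;
use std::fmt;

/// Hypothetical cap on how many entries the consumer is prepared to load.
const MAX_FEATURES: usize = 200;

/// Reasons a generated configuration file might be rejected (illustrative only).
#[derive(Debug, PartialEq)]
enum ConfigError {
    DuplicateFeature(String),
    TooManyFeatures { found: usize, limit: usize },
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::DuplicateFeature(name) => write!(f, "duplicate entry: {}", name),
            ConfigError::TooManyFeatures { found, limit } => {
                write!(f, "{} entries exceeds limit of {}", found, limit)
            }
        }
    }
}

/// Check a candidate list of entries before it is put into service.
/// Returns an error instead of panicking when the input is malformed.
fn validate_features(names: &[&str], limit: usize) -> Result<Vec<String>, ConfigError> {
    let mut seen = HashSet::new();
    for name in names {
        // `insert` returns false when the value was already present.
        if !seen.insert(*name) {
            return Err(ConfigError::DuplicateFeature((*name).to_string()));
        }
    }
    if names.len() > limit {
        return Err(ConfigError::TooManyFeatures { found: names.len(), limit });
    }
    Ok(names.iter().map(|s| (*s).to_string()).collect())
}

/// Swap in the candidate only if it validates; otherwise keep the
/// configuration that is currently in service and log the reason.
fn load_or_keep(current: Vec<String>, candidate: &[&str], limit: usize) -> Vec<String> {
    match validate_features(candidate, limit) {
        Ok(fresh) => fresh,
        Err(err) => {
            eprintln!("rejecting new configuration, keeping previous one: {}", err);
            current
        }
    }
}
```

In this sketch, calling `load_or_keep(current, &candidate, MAX_FEATURES)` with a candidate that contains duplicates returns `current` unchanged. The design choice being illustrated is that a malformed internal file is treated like untrusted input: the error is surfaced as a `Result` and handled, so a bad file degrades to stale configuration and does not take the service down.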