机器人管理
Search documents
Rust闯大祸了,重写53天后Cloudflare搞出六年来最大失误,ChatGPT、Claude集体失联
3 6 Ke· 2025-11-19 10:08
Core Insights - Cloudflare experienced a significant outage lasting approximately five and a half hours, affecting numerous popular websites and AI services, including OpenAI's ChatGPT and Shopify [1][2][12] - The outage was triggered by an unexpected spike in traffic due to a malfunction in Cloudflare's bot management system, which was not caused by a DDoS attack [9][10][12] - This incident is noted as Cloudflare's most severe outage since 2019, with the company's stock price dropping by about 3% during the downtime [12][13] Incident Details - The outage began around 5:20 AM ET on November 18, when Cloudflare detected abnormal traffic, leading to service disruptions and error messages [2][3] - Cloudflare's internal services were affected, including its CDN and application services, which protect workloads from malicious traffic [2][3] - The company confirmed that the root cause was related to a configuration change that resulted in an oversized threat management configuration file, leading to software failures across multiple services [9][10][12] Recovery Process - Cloudflare's engineering team worked to identify and rectify the issue, with the problem being acknowledged and a fix being implemented by 8:09 AM ET [3][4] - The recovery process involved monitoring and restoring services, with full restoration completed by 11:44 AM ET [3][4] Technical Insights - The malfunction was linked to a specific line of Rust code that was part of a recent rewrite aimed at improving performance and security [4][6] - The bot management module, which assesses incoming traffic, was particularly affected due to a change in the database query behavior that resulted in duplicate entries in the configuration file [10][11] Company Response - Cloudflare acknowledged the severity of the incident and outlined steps to prevent future occurrences, including enhanced validation of internal configuration files and the introduction of global emergency shutdown switches [12][13] - The company expressed regret over the incident, emphasizing the importance of their services and the impact of the outage on customer trust [12][13]
Rust 闯大祸了!重写 53 天后 Cloudflare 搞出六年来最大失误,ChatGPT、Claude 集体失联
AI前线· 2025-11-19 07:00
Core Points - Cloudflare experienced a significant outage lasting approximately five and a half hours, affecting numerous popular websites and AI services, including OpenAI's ChatGPT and Shopify [2][3] - The outage was triggered by a mysterious spike in traffic, which led to errors and increased latency across Cloudflare's services [2][3] - The root cause of the outage was identified as a malfunction in Cloudflare's bot management system, which was exacerbated by a configuration change that resulted in an unexpected increase in the size of a feature file [11][12] Summary by Sections Outage Details - The outage began around 5:20 AM ET on November 18, with Cloudflare first detecting abnormal traffic [2] - By 8:09 AM ET, Cloudflare acknowledged the issue and began implementing fixes, but the recovery process faced challenges [4] - The service disruption was fully resolved by 11:44 AM ET [4] Impact on Services - The outage affected a wide range of services, including X, Spotify, and Truth Social, with DownDetector reporting issues on its own platform [3] - Cloudflare's WARP VPN service and Cloudflare Access tools were also impacted during the incident [3] Technical Insights - The outage was linked to a specific line of Rust code that failed, leading to a significant portion of global internet traffic being disrupted [6][8] - A change in the database system's permissions led to the generation of a feature file that exceeded expected limits, causing the bot management module to fail [11][12] Company Response - Cloudflare admitted that this was its most severe outage since 2019, acknowledging the critical nature of its services [15] - The company outlined steps to strengthen its systems to prevent similar incidents in the future, including enhanced validation of internal configuration files and the addition of global emergency shut-off switches [16] Market Reaction - Following the outage, Cloudflare's stock price dropped by approximately 3% [14]