Core Points - Google experienced a major outage affecting over 70 cloud services globally, disrupting third-party services such as Cloudflare, OpenAI, and Shopify, as well as first-party products like Gmail and Google Drive [1][2] - The outage was attributed to multiple layers of flawed updates, specifically a new feature in the "quota policy checks" that was not adequately tested in real-world scenarios, leading to improper data handling [2][3] - The incident lasted for seven hours, although engineers identified the issue within 10 minutes, resulting in overloads in larger regions due to the lack of feature flags during the rollout [3] Company Response - Google issued an apology for the outage, acknowledging the impact on customers and their users, and committed to making improvements to prevent future occurrences [2] - The company plans to change its architecture to ensure that if one system fails, others can continue to operate without crashing [4] - Google will conduct audits of all systems and enhance communication protocols to provide timely information to customers during issues [4]
Google issues apology, incident report for hours-long cloud outage