November 11, 2019 (updated 08 Apr 2020 3:41pm)

Google Cloud in Major Global Outage: Numerous Services Fail

"Multiple products are affected globally"

By CBR Staff Writer

Here for April 2020’s outage? We’re covering that here. Want to understand what caused GCP’s last major borkage? We’ve got that covered here.

  • Google Cloud Platform (GCP) services down. Issue global in scale. Numerous services affected, including Kubernetes and IoT services like Nest.

Google Cloud Platform (GCP) says it is experiencing a “major issue”, with services including Cloud Dataflow, App Engine, Compute Engine, Cloud Storage, Dataproc, Pub/Sub, BigQuery and Networking all failing today as of 9.14 am BST.

“Multiple products are affected globally,” Google Cloud said today.

Engineers are working to mitigate the incident, the company said in a status update. Users of the connected home service Nest were among those facing issues.

UPDATED 12.44 BST: “We are investigating an issue with an infrastructure component impacting multiple products. We believe we have identified the cause and are currently rolling out mitigation,” GCP said.

UPDATED 22:00 BST: GCP engineers resolved the issue in approximately 2 hours, 15 minutes. The company says the issue hit “some Google Cloud APIs across us-east1, us-east4 and southamerica-east1, with some APIs impacted globally. This includes the APIs for Compute Engine, Cloud Storage, BigQuery, Dataflow, Dataproc, and Pub/Sub. App Engine applications in those regions [were] also impacted.”

Google Cloud Down

The issue comes 21 days after users faced 100 percent packet loss to and from ~20 percent of instances in GCP’s us-west1-b region for two-and-a-half hours.

That outage was blamed on “failure in the underlying leader election system” (its “Chubby lock system”) which “resulted in components in the control plane losing and gaining leadership in short succession.”


The issue follows a string of public cloud outages: a reminder that even the best-resourced IaaS companies are not immune to development and infrastructure borkage.

AWS, Azure and GCP have all suffered high-profile incidents in the past six months, with AWS services interrupted by a DDoS attack for eight hours on October 22, the same day that GCP suffered its US west coast issue.

Azure has also struggled with a string of well-documented outages, with an overloaded Redis cache triggering a 17-hour multi-factor authentication outage in November and Office 365 failing in January, something Microsoft blamed on a “subset of mailbox database infrastructure [that] became degraded, causing impact”.





