UPDATED 17.50 BST, April 8, 2020. Google Cloud says it has resolved all issues within a healthy 90 minutes of initial reports emerging.
Google Cloud is facing a flurry of alerts from worried users unable to access a range of services, after suffering what appears to be a major outage.
The issue started at 07:35 US/Pacific Time (15.35 BST).
Google Cloud Platform (GCP) blamed the issue on Cloud IAM (Identity and Access Management) errors but said it has mitigated most issues.
“Impact is now believed to be limited more directly to use of the IAM API.”
What’s affected: GCP said its App Engine, Dataproc, Cloud Logging, Firebase Console, Cloud Build, Cloud Pub/Sub, BigQuery, Compute Engine, Cloud Tasks, Cloud Memorystore, Firebase Test Lab, Firebase Hosting, Cloud Networking, Cloud Data Fusion, Cloud Kubernetes Engine, Cloud Composer, Cloud SQL, and Firebase Realtime Database are all likely to be experiencing elevated error rates.
The issue appears to have knocked GCP user Snapchat offline, among other customers, with #Snapchatdown trending on Twitter as the incident happened.
Workaround: Customers may continue to file cases using https://support.google.com/cloud/contact/prod_issue or via phone.
Hey there, we're currently experiencing a service disruption. We hope to have it cleared soon. -LH
— Google Cloud (@googlecloud) April 8, 2020
Worried customers meanwhile are chasing frozen projects.
Computer Business Review first noted the story after a surge in traffic to a story about a previous major Google Cloud outage in November 2019.
That was caused by a “failure in the underlying leader election system” which “resulted in components in the control plane losing and gaining leadership in short succession” in what looked like a major architectural howler from Google Cloud.
“These frequent leadership changes halted network programming, preventing VM instances from being created or modified”.
Users are taking to Google’s Twitter page seeking help in some numbers.
GCP, of course, is not alone in having periodic service provision issues.
Azure has throttled bandwidth for backups and locked out users on free/student accounts as it faces a data centre capacity crunch. AWS appears to be having a largely untroubled pandemic thus far, but has maintained a deafening silence about having services knocked down by a mystery DDoS attack in October 2019.
Azure also faced IAM/MFA-related issues late last year lasting 17 hours. That was ultimately blamed on an overloaded Redis cache and fixed by a hard restart.