June 10, 2020 (updated June 11, 2020, 7:19pm)

IBM Blames “Incorrect Routing” by Third Party for Global Cloud Outage

IBM points the finger...

By Claudia Glover

An IBM Cloud outage that hit 80 data centres globally for well over three hours late Tuesday has been blamed by Big Blue on an unnamed “issue introduced by a 3rd party provider” that it says it fixed by “adjusting routing policies”.

The sweeping outage began on June 9 at 11.00pm and was fixed by June 10, 2.39am, IBM said in an update posted at 12.18pm BST.

Rubbing salt in the wound for customers, the IBM status page is itself served from IBM Cloud and was returning an internal service error to concerned users. (This is a surprisingly common issue that, naturally, means that when there is an outage, people can’t learn a great deal about it…)

The data centres impacted…

IBM, when pressed for comment by Computer Business Review, merely told us: “All IBM Cloud services have been restored”.

We’ll eagerly await the autopsy.

(Quite how a third-party provider managed to knock even one multi-carrier data centre offline, let alone a global network, remains an open question; some observers have suggested that it may have involved a BGP hijacking or a routing mistake by a major carrier.)

Updated June 11 09.00: IBM says an “external network provider flooded the IBM Cloud network with incorrect routing, resulting in severe congestion of traffic and impacting IBM Cloud services and our data centers. Mitigation steps have been taken to prevent a reoccurrence. Root cause analysis has not identified any data loss or cybersecurity issues.”
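
IBM has not said what its adjusted routing policies involve. For readers wondering how a flood of routes from a single peer can be contained at all, one widely used safeguard on BGP sessions is a maximum-prefix limit: if a neighbour announces more routes than expected, the session is shut down and its routes are discarded. The toy Python model below sketches that idea only; the class, function, limit and placeholder addresses are all hypothetical and do not represent IBM's network or any router's actual configuration.

```python
from dataclasses import dataclass, field


@dataclass
class PeerSession:
    """Toy model of one BGP peering session (hypothetical, for illustration)."""

    name: str
    max_prefixes: int  # assumed ceiling on routes accepted from this peer
    prefixes: set = field(default_factory=set)
    established: bool = True

    def receive_announcement(self, prefix: str) -> bool:
        """Accept a route unless it would push the peer past its limit.

        Returns True if the route was accepted. If the limit would be
        exceeded, the session is shut down and every route learned from
        the peer is discarded.
        """
        if not self.established:
            return False
        is_new = prefix not in self.prefixes
        if is_new and len(self.prefixes) >= self.max_prefixes:
            self.established = False
            self.prefixes.clear()
            return False
        self.prefixes.add(prefix)
        return True


def simulate_flood(limit: int, announced: int) -> PeerSession:
    """Feed `announced` distinct placeholder prefixes into one session."""
    session = PeerSession(name="example-peer", max_prefixes=limit)
    for i in range(announced):
        # 10.x.y.0/24 values are placeholders, not real announcements.
        session.receive_announcement(f"10.{i // 256}.{i % 256}.0/24")
    return session


if __name__ == "__main__":
    normal = simulate_flood(limit=100, announced=50)
    flood = simulate_flood(limit=100, announced=5000)
    print(normal.established, len(normal.prefixes))
    print(flood.established, len(flood.prefixes))
```

The trade-off in this sketch is deliberate: dropping the whole session loses that peer's legitimate routes too, but it stops the surplus announcements from propagating further into the network.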


IBM Cloud promises “global load balancing to ensure a redundant, highly available platform is available for you to host your workloads”.


The outage forced customers to turn to Twitter and IBM Cloud-hosted services for news. Autopilot was among those that piped up to tell customers that IBM had told it the outage “appears to be a networking issue”.


IBM Cloud, which has limited market share compared to the hyperscalers, is mothballing data centres in Dallas, Houston, Seattle and Melbourne this year as part of a modernisation strategy.

It said in a June 9 status update: “We have made significant investments in rolling out new datacenters and Multizone Regions (MZRs) designed to deliver a more resilient architecture with higher levels of network throughput and redundancy. As part of this modernization strategy, we have determined it is necessary to close select older datacenters unsuitable for upgrading.”

Customers will need to migrate workloads to “one of our new IBM Cloud datacenters to avoid service interruptions”.

They’ll also need to cancel old servers after migration. These otherwise will, IBM notes, “continue to be invoiced until cancelled”.

Needless to say, cloud services outages are not uncommon: here are examples from AWS, Azure and GCP.

Fail-overs failing this badly are a rarity, however.

More to follow.

Know more about the outage? Get in touch on claudia dot glover at cbronline dot com
