September 15, 2020

Azure UK South Outage: Overheating Cloud Dried Up Services for Customers

Overheating data centre forces shutdown of all network, compute, and storage resources

By CBR Staff Writer

UK South — one of Microsoft Azure’s two local cloud regions — crashed offline on Monday after an outage triggered by a cooling system failure in a data centre.

The incident, between 14:54 BST on 14 Sep 2020 and 01:41 BST on 15 Sep 2020, left engineers scrambling to place the automated cooling system into manual mode and reset affected pumps, after rising internal temperatures saw systems shut down all network, compute, and storage resources “to protect data durability”.

“Customers using multiple Availability Zones, or Zone Redundant services may have experienced minimal impact”, notes Microsoft in its incident report.

The outage dragged on because, after manually overriding and resetting the automated cooling systems, engineers had to restore power in phases and bring infrastructure progressively back online. (A similar incident hit AWS in Japan in 2019.)

The outage is the latest in a dismal summer for data centres in the UK, following an August 25th fire in a Telstra data centre in London’s Isle of Dogs and an August 18th outage at Equinix’s prominent LBX LD8 co-location data centre after a UPS failure.

Among those knocked offline was Public Health England, which was left unable to update its COVID-19 dashboard during the day.

As Peter Groucutt, managing director of data resilience specialist Databarracks, notes: “We are increasingly dependent on a small number of players who dominate the market. Recent events show the challenge of maintaining productivity in outages highlights the importance of external backups.

“Some argue the reason you do not need to back up cloud data is because a data loss is so unlikely. It would be too embarrassing and damaging for Microsoft, Google or AWS if they were unable to recover data for their customers. Unfortunately, there are many examples of data being lost for a small subset of users. If you’re in that small subset, you don’t have a lot of power in the relationship with the cloud provider and if they say your data is unrecoverable, there isn’t much you can do.”

Azure UK South Outage: Company Apologises, to Investigate Further

Microsoft said: “We undertook various workstreams to bring back connectivity. The site engineers placed the cooling system into manual mode and began to reset the affected pumps to recover the cooling plant. This helped to bring temperatures to safe operational ranges in all the impacted areas of the datacenter by 16:40 UTC.

“Once temperatures were within safe thresholds, engineers started to restore power to the affected infrastructure and began a phased approach to bringing this infrastructure back online. Once storage and the networking infrastructure was fully restored, dependent compute scale units began to recover. As compute scale units became healthy, virtual machines and other dependent Azure services recovered.”

The company says it will investigate “to establish the full root cause and prevent future occurrences” and apologised to customers. Microsoft has come under regular attack for availability issues, with Gartner noting this month in its cloud Magic Quadrant that “Microsoft has the lowest ratio of availability zones to regions of any vendor in this Magic Quadrant, and a limited set of services support the availability zone model. As a result, Gartner continues to have concerns related to the overall architecture and implementation of Azure, despite resilience-focused engineering efforts and improved service availability metrics during the past year.”
