Outages at the UK data centres of Oracle and Google, triggered by yesterday’s record temperatures, resulted in loss of revenue for a number of small businesses.
The episode reveals the need for resilient IT architectures, experts said, and for more effective data centre cooling systems as global temperatures rise.
Why did the UK heatwave cause Oracle and Google’s data centres to fail?
Temperatures in the South East of England reached 40.3°C yesterday, the highest temperature ever recorded in the UK, triggering data centre outages at cloud providers Google and Oracle.
At 6pm UK time, Google posted on its Cloud Health Service that there had been an incident with one of its data centre buildings in host zone “Europe-West2-a” that had started at 4:10pm.
According to the incident log, the building had experienced a “cooling-related failure” which had caused a partial failure of capacity. This led to virtual machine terminations and a loss of machines for a “small set of our customers.”
Google powered down the affected zone to prevent damage to the machines. Mitigation work to fix the issue started at 10:08pm, with all issues resolved a couple of hours later. (Tech Monitor has contacted Google for more details.)
Oracle also struggled with an outage related to the UK heatwave. At 2:10pm, the tech giant revealed that a subset of Oracle Cloud Infrastructure’s customers experienced a delay in recovering access to their resources hosted in its UK South (London) region.
“Following unseasonably high temperatures in the UK South (London) region, two cooler units in the data centre experienced a failure when they were required to operate above their design limits,” said Oracle in a statement.
“As a result, temperatures in the data centre began to climb causing a subset of Compute infrastructure to go into protective shut down.” The issue was rectified by 11am today.
Businesses impacted by UK heatwave cloud outages
Small businesses told Tech Monitor they had lost out on revenue which they won’t be able to claim back as a result of the outages.
Kelly Mortimer, a consultant to the wedding services industry, relies on her website for sales and for her paid membership site. When the data centre outage took her site offline, hundreds of members couldn’t access the content they had paid for, which could have resulted in reputational damage.
“That’s real money out of the door,” she told Tech Monitor. She confessed that each day her website is down, her company loses the equivalent of 10% of its annual revenue.
Sustainable print-on-demand company Teemill was also affected by Google’s outage. “Google’s UK infrastructure had what they described as a ‘cooling issue’ (we prefer ‘meltdown’) which shut their Europe West data centre,” Sofia Voudouroglou, part of the content team at Teemill, told Tech Monitor. “Quite a major event, but then climate change is getting real these days.”
Voudouroglou added that the company was still “assessing the impact” of the downtime, but that they knew “tens of thousands of businesses” they supported were unable to make sales.
Other businesses affected by the outages include Bitcoin trader VALR, which alerted its customers to the disruption on social media yesterday. “All funds are safe,” it told followers. Full trading was restored in the early hours of this morning.
Agrecalc, which provides carbon emission measurement and mitigation, also alerted customers that its service had been disrupted. “Sincere apologies to our Agrecalc users – there has been a data centre outage which is affecting the delivery of our service,” it posted. Service was resumed later in the day.
Some companies, however, were not fortunate enough to be back online quickly. SiteGround, a managed WordPress hosting company, reported that even though the cooling issue had been handled in London overnight, some of its servers failed to start. This led the company to initiate disaster recovery from offsite back-ups in Amsterdam.
“After the cooling issue had been handled in London data centre during the night, the servers were put back online but many of them failed to start,” it explained to its followers on Twitter. “That is when we initiated our disaster recovery, while GCP engineers have been working to restore the nodes in London.” It was back up and running by 12:42pm today.
Rising temperatures call for IT resilience
The outages reveal the need for robust IT resilience strategies even when relying on established cloud providers, said Ross Gray, CEO of IT management provider Cloudsoft.
“Organisations simply cannot afford for their systems to suffer any length of downtime,” Gray said. And as more organisations rely on increasingly complex – and therefore more fragile – IT systems, robust resilience strategies are “absolutely vital”.
With extreme temperatures predicted to become more common, Gray said that data centre operators will need to deal with “increased pressure” on cooling systems.
“This is particularly important for those organisations that work within highly regulated sectors,” he explained. “Regulations, such as Digital Operational Resilience (DORA) in the financial sector, mean there can be no amount of downtime, so these firms must be able to cope with future heatwaves and other types of extreme weather.”
Most data centre operators were able to withstand the heatwaves by preparing in the preceding week, said Mai Barakat, data centre infrastructure and services analyst at S&P Global Market Intelligence.
According to analysis by Uptime Institute, one in five organisations has experienced a serious or severe outage, involving significant financial losses, in the past three years.
It also found that 80% of data centre managers and operators have experienced some type of outage over the same period, which is higher than the norm. Of these failures, 60% result in at least $100,000 (£83,480) in total losses, which it found had increased from 2019.