Data centre failures generally have an impact on businesses’ performance and financial results. However, Ed Ansett would argue that as humans become more IT dependent, it is only a matter of time until fatalities occur.
Speaking at DCD Converged Europe 2015 in London last week, Ed Ansett, chairman of i3 Solutions Group, told the audience that the stage of evolution data centres have reached "is not very good".
"We still have a long way to go. It is only a matter of time until failure in our industry starts killing people.
"Data centres are bespoke complex home technical systems. Failures are generally non-fatal, unlike other industries such as aviation.
"This will probably change as human dependency on IT increases further. The entire data centre industry is generally speaking unregulated, this will change once people start dying."
Data centre failures are mostly down to humans, who are responsible for six in ten breakdowns through flaws in system design, system validation and equipment design, as well as operator error, installation errors and maintenance oversight.
"Machines are only responsible for equipment failure. Natural disasters are an act of God."
Ansett also said that data centre failures are often the result of two or sometimes three simultaneous events.
"Root cause investigation findings are normally secret and bound by NDA. The data centre industry is not learning from its failures.
"The industry is currently nowhere near the upper practical limit of reliability of 100,000 to 200,000 hours."
Looking at recurring data centre failures, Ansett pointed to nine issues that are generally behind data centre outages.
First, generators failing to start are a common issue. On the operations side, the causes are batteries and contaminated fuel; on the design side, the cause is air lock in the fuel.
Secondly, uncoordinated circuit protection, a design and commissioning problem, is also responsible.
"[On top of this] Loose connections, like the switchgear, UPS battery failure, PLC logic, water leak, standard operations switching errors, maintenance operations errors, and design errors, for example control systems, are all recurring issues."