In December 2018, the O2 mobile network shut down across the UK, impacting 32 million users and costing an estimated $100 million – why? Because of an unidentifiable expired SSL/TLS certificate, writes Martin Thorpe, Enterprise Architect at Venafi.
O2 is by no means alone; a recent survey found that 60 percent of organisations experienced a certificate-related outage in the past year that impacted critical business applications or services, with 74 percent saying they had suffered one in the past 24 months. Yet despite the regularity and potential severity of the problem, there is still a lack of understanding of why and how such outages occur, or how to protect against them.
How Machine Identities are Used to Secure Our Online World
To understand why certificate-related outages happen and how to prevent them, it’s important to understand why businesses use certificates in the first place. Transport Layer Security (TLS) certificates, previously known as Secure Sockets Layer (SSL) certificates, provide ‘machines’ (which can be anything from an IoT device to a server or piece of software) with a unique ‘identity’. These identities enable machines to use encrypted connections and establish trust in most digital transactions, so that they can be performed securely. They validate the identity of both communicating machines and act as gatekeepers for secure communications, using SSL/TLS to verify program-to-program interactions, machine-to-machine communications and digital signatures.
Certificates are issued with a validity period – the time limit on how long the certificate can be trusted for. The validity period will vary and can be anything from a few weeks to ten years, although two years is considered to be the best practice. Once this time is up the certificate becomes invalid unless it is renewed or replaced. Failure to renew or replace it means that any communication to that machine will cease to work. Knowing where each certificate is installed, who controls access to that machine, and when the certificate will expire is essential to business continuity. However, many businesses do not have the visibility and intelligence required to give them this information so they struggle to keep up with their certificate replacements, leading to unplanned outages.
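To make the validity period concrete, the following is a minimal Python sketch of how the remaining lifetime of a certificate could be checked using only the standard library. The function names (fetch_not_after, days_until_expiry) and the example dates are illustrative assumptions, not part of any particular product.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' field of the certificate presented by host.

    Note: the default context verifies the certificate, so a certificate
    that has already expired raises ssl.SSLCertVerificationError here
    instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days from `now` until the certificate's notAfter timestamp.

    `not_after` uses the format returned by getpeercert(), for example
    'Dec  6 00:00:00 2018 GMT'. A negative result means the certificate
    has already expired.
    """
    expiry_seconds = ssl.cert_time_to_seconds(not_after)
    expiry = datetime.fromtimestamp(expiry_seconds, tz=timezone.utc)
    return (expiry - now).days


# Hypothetical example: a certificate expiring one month after "today".
example_now = datetime(2018, 11, 6, tzinfo=timezone.utc)
print(days_until_expiry("Dec  6 00:00:00 2018 GMT", example_now))
```

In practice a check like this would be run on a schedule against every known endpoint, which is only possible if the organisation knows where its certificates are installed.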
The Machine Identity Challenge
Many businesses simply don’t know where their machine identities are installed, let alone who the employee in charge of them is, or when the certificates are due to expire. In our hyper-connected digital world, there can be hundreds of thousands of certificates to manage, which are being created daily by a range of different teams across the organisation. When an employee responsible for a certificate leaves the business, it’s not always clear where that certificate was used, or who now has access to the machine. Most importantly, it’s not clear who’ll be responsible for renewing the certificate when it’s due to expire. This points to a wider lack of ownership within the organisation; nobody is sure who should be overseeing certificate management to ensure that services stay up and running.
This complexity is set to increase significantly. 80 percent of CIOs estimate certificate use in their organisations will grow by 25 percent or more in the next five years, with more than half anticipating growth of more than 50 percent. Without central visibility, ownership and control, IT departments face a race against the clock to locate and replace the offending machine identity when an outage occurs. The longer an outage continues, the more damage is done to the organisation’s reputation, revenue, customer trust and operational productivity.
What Can Be Done?
Organisations need to be proactive, rather than reactive, when it comes to managing the lifecycles of their machine identities. Companies should not rely on disgruntled customers to make them aware of disruptions to services and should instead find ways to proactively prevent certificate outages before they start to impact frontline services and customers. Equally, if an outage does occur, then they need to have the information required to find and replace the offending certificates as soon as possible to limit the damage. Otherwise, it’s simply a matter of time until one expires and causes a debilitating outage.
Proactively identifying certificates, together with crucial data such as where each one is used, who owns it and when it is due to expire, is one of the main protections against SSL/TLS outages. To do this, organisations need to move away from their reliance on spreadsheets and other basic tools. IT and security teams need greater visibility, intelligence and automation across the entire lifecycle of every machine identity their organisation uses in order to protect against outages.
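As a rough illustration of what such an inventory needs to capture, the Python sketch below records the location, owner and expiry date of each certificate and flags those that fall due within a renewal window. The CertRecord type, the expiring_soon function, the 30-day window and all of the sample entries are hypothetical; a real platform would populate this data through automated discovery rather than by hand.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CertRecord:
    """One inventory entry: what the certificate is, where it lives, who owns it."""
    common_name: str
    installed_on: str
    owner: str
    expires: date


def expiring_soon(inventory, today, window_days=30):
    """Return records that expire on or before today + window_days.

    Already-expired certificates are included, and the result is sorted
    so that the most urgent renewals come first.
    """
    cutoff = today + timedelta(days=window_days)
    due = [record for record in inventory if record.expires <= cutoff]
    return sorted(due, key=lambda record: record.expires)


# Hypothetical sample data for illustration only.
sample_inventory = [
    CertRecord("shop.example.com", "web-01", "ecommerce-team", date(2024, 1, 15)),
    CertRecord("api.example.com", "gateway-02", "platform-team", date(2024, 6, 1)),
    CertRecord("vpn.example.com", "edge-03", "network-team", date(2023, 12, 20)),
]

for record in expiring_soon(sample_inventory, today=date(2024, 1, 1)):
    print(record.common_name, record.owner, record.expires.isoformat())
```

Keeping the owner alongside each entry matters as much as the expiry date: when a renewal is flagged, someone specific has to be accountable for acting on it.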
Having a centralised and automated platform can help to ensure business continuity and safeguard against certificate outages occurring. This removes the burden of worrying about when to renew or replace certificates from IT and security teams, streamlining the certificate renewal process and eliminating the risk to the business that certificate outages present. This level of oversight and control is essential to ensure secure transactions, private communications and most importantly, to avoid becoming the latest company to suffer serious reputational damage at the hands of a certificate outage.