One of the main pressures faced by businesses today is balancing regulatory compliance with operational efficiency. Regulations exist to protect consumers, partners and wholesalers, and to ensure fair practice among competitors, but they also carry significant risk for businesses that make mistakes and slip into non-compliant practices.
A good example of resilience sometimes coming at the expense of simplicity is the multi-cloud environment model. While a multi-cloud approach can be beneficial because it avoids depending entirely on one provider, businesses still need to be able to demonstrate that they can fail over in the event of an emergency such as a data breach, which can be a complicated process, particularly with a complex architecture.
“Sometimes companies fall into the trap of asking themselves how likely an event is to happen,” says Kieran Gutteridge, CTO of Cutover. “And while the likelihood is important to consider, it doesn’t preclude plausibility.
“Businesses also need to consider the fact that consumers have become more demanding in recent years. Bolstered partly by free providers like Google, Facebook, Microsoft Teams and Zoom, customers know that they can easily switch between software platforms if there’s an outage or a disruption to service. If you don’t have a failover and cannot fix the problem quickly, consumers know the market and won’t hesitate to move on.”
Perhaps an even bigger problem for businesses is that, according to a recent Cutover survey, almost three-quarters of company decision-makers assumed that resilience was already a built-in feature of the cloud, while only one-fifth believed that they had a robust cloud resilience strategy in place.
In other words, up to now education around the subject has been sorely lacking. Gutteridge puts this down to a false sense of security that decision-makers have lulled themselves into in the cloud era: because data storage no longer requires the physical maintenance of on-prem servers, it has been easy for businesses to adopt an ‘out of sight, out of mind’ approach to resilience.
“For individual applications, there’s quite a lot you can do to make them more resilient in the cloud,” Gutteridge says. “Taking advantage of availability zones, for example, taking advantage of regions, thinking in a detailed way about your service.
“Say you only need to fail over part of your service – a queueing system, for instance – it’s surely better to move just the queue to prevent it from going down than to move the whole application. By going deep, you can make things a lot more resilient from the inside out, brick by brick.”
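To make that idea concrete, here is a minimal sketch of routing around a failed queue component rather than failing over the whole application. It is an illustration only: the regional endpoints and the /health path are hypothetical placeholders, not Cutover functionality or any real provider’s API.

```python
# Hypothetical sketch: fail over only the queue component to a standby
# region while the rest of the application stays where it is.
import urllib.request

PRIMARY_QUEUE = "https://queue.eu-west-1.example.com"  # assumed primary region
STANDBY_QUEUE = "https://queue.eu-west-2.example.com"  # assumed standby region

def is_healthy(endpoint: str, timeout: float = 2.0) -> bool:
    """Probe a hypothetical /health endpoint; any error counts as unhealthy."""
    try:
        with urllib.request.urlopen(f"{endpoint}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, timeouts and socket errors are all OSError subclasses
        return False

def active_queue() -> str:
    """Route traffic to the standby queue only when the primary fails its probe."""
    return PRIMARY_QUEUE if is_healthy(PRIMARY_QUEUE) else STANDBY_QUEUE
```

In practice this routing decision would more likely live in a load balancer or service mesh than in application code, but the principle is the same: the failover boundary is drawn around the queue, not the whole service.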
Putting a cloud resilience strategy in place
For businesses where such processes are not yet automated, according to Gutteridge, getting to the point of total cloud resilience can seem like an endless uphill battle. Companies moving from on-prem to cloud storage need to know exactly what processes are in place for their on-prem applications, then adapt these for the cloud according to each application’s specifications.
“Cutover basically shines a light on what’s achievable in the first instance,” Gutteridge says. “It’s about first helping to identify where it is most crucial to focus automation efforts. Customers typically run thousands of applications at any one time, and, even if they had a magic wand, they wouldn’t be able to automate everything overnight.
“Once you’ve started the automation journey, Cutover helps to set things up in terms of strategising, giving advice on scalability, and setting out goals specific to your business so that everything isn’t happening at once and overwhelming your existing infrastructure.”
For this to happen successfully, Gutteridge says, stakeholders across the full breadth of the organisation need to be engaged in the process, whether or not they’re involved in the engineering, learning why cloud resilience matters and how it can affect every aspect of the business. Using the scenario of a data centre going down, Gutteridge says it’s important to have robust service recovery plans in place and ready to go. A constant state of readiness is vital.
“In Cutover, you can simulate an event like this very quickly, make a plan for taking that data centre down, and then get to the bottom of what applications are likely to have been affected. Cutover delivers a report based on such a simulation, then moves to kick off the service recovery plan held in the system so that it can instantly restart and notify the right people.
“If you extend this gradually over every application, suddenly it shines a light on where the areas of weakness are. You can then drill down on those applications, identify any interdependencies, and install a kind of command and control function to allocate resources to areas of high priority.”
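The simulation Gutteridge describes can be pictured as a walk over an application dependency graph: take a data centre offline on paper, then trace which applications are hit directly and which fail downstream. The sketch below is a toy model of that exercise under assumed data structures (HOSTED_IN and DEPENDENTS are invented for the example); it is not Cutover’s actual engine.

```python
# Illustrative sketch only: simulating a data centre outage by finding every
# application affected directly or through its dependencies.
from collections import deque

# Which data centre hosts each application (assumed example topology).
HOSTED_IN = {"payments": "dc-1", "ledger": "dc-1", "web": "dc-2", "queue": "dc-2"}
# app -> apps that depend on it, so an outage cascades downstream.
DEPENDENTS = {"payments": ["web"], "ledger": ["payments"], "queue": ["web"]}

def simulate_outage(data_centre: str) -> set[str]:
    """Return every application affected, directly or transitively."""
    affected = {app for app, dc in HOSTED_IN.items() if dc == data_centre}
    frontier = deque(affected)
    while frontier:
        app = frontier.popleft()
        for downstream in DEPENDENTS.get(app, []):
            if downstream not in affected:
                affected.add(downstream)
                frontier.append(downstream)
    return affected

if __name__ == "__main__":
    # Taking dc-1 down flags payments and ledger directly, and web via payments.
    print(sorted(simulate_outage("dc-1")))
```

A report built from output like this is what surfaces the interdependencies and areas of weakness Gutteridge mentions, and it shows where a recovery plan needs to reach beyond the data centre that actually failed.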
The proliferation of new applications, multi-cloud architectures and shifting regulatory requirements may make establishing, testing and maintaining operational cloud resilience seem an immense and continuous challenge, one akin to building an aircraft already in flight.
However, through engaging the right partners, involving all stakeholders, and leveraging the power of collaborative automation, businesses can confidently optimise resilience processes, preparing themselves for any and all eventualities.
To learn more, watch Kieran Gutteridge discuss how best to automate your IT cloud resilience process.