Gandi Outage: Hardware Failure Destroys TBs of Customer Data

credit: pawel janiak, unsplash, creative commons

French cloud services provider and domain name registrar Gandi.net says it has lost several TB of customer data after a hardware issue at one of its data centres.

The company came under sustained fire today for telling irate customers that responsibility for creating back-ups lay with them.

In an update first posted on Wednesday, January 8, 2020 at 2:54 pm, the company said: “An incident on a filer is affecting PAAS/IAAS at LU-BI1” (i.e. Platform-as-a-Service and Infrastructure-as-a-Service at a Luxembourg data centre).

In later tweets by the team at its network operations centre (NOC), Gandi said there was a “hardware issue involved”. As Computer Business Review published, the company appeared to have given up hope of restoring files and customer databases.

Gandi Outage: “Several TB of Data on the Filer”

Gandi.net (owned by France-registered Gandi SAS) manages over 2.4 million domain names from 192 countries, making it the sixth largest such service provider in Europe, and in the top fifteen worldwide. It also offers its own cloud service, letting users spin up servers from its data centres, from £4.73 monthly for one CPU.

It was not immediately clear how many customers were affected.

In a status update at 5:35 pm, Gandi said: “The data is considered lost at this point, but we are still doing all we can to recover it. We’ll be keeping you up to date on the process, but be aware it will take over 24 hours until we can tell for sure.”

The company added to customers: “The assessment is taking a long time because there are several TB of data on the filer.”

OK, you succeed well in the hard task to loose your customers data and snapshoots for good. Is it at least possible reuse the IPv4 addresses assigned on LU-BI1 for create a server clone on another datacenter? It will helps a lot. @gandibar @gandinoc

— Andrea Ganduglia (@andreaganduglia) January 9, 2020

The company had earlier doubled down on its claims that customers should have created their own back-ups, with one staffer telling frustrated users on Twitter: “Which one of us do you want naked?”

Storage can fail for numerous reasons. In November 2019, for example, HPE pushed out firmware updates for a range of its SSDs in a bid to prevent them failing at 32,768 hours of operation time owing to a drive firmware bug.

The devices are used in multiple server and storage products.
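To put the 32,768-hour figure in context, the arithmetic can be sketched as below. This is an illustrative calculation only: the helper name is hypothetical, and it assumes continuous operation and 365-day years with no leap days.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def hours_to_years_days_hours(total_hours):
    """Split a count of power-on hours into (years, days, hours)."""
    days, hours = divmod(total_hours, HOURS_PER_DAY)
    years, days = divmod(days, DAYS_PER_YEAR)
    return years, days, hours


print(hours_to_years_days_hours(32768))  # (3, 270, 8)
```

On those assumptions, a drive running around the clock would reach the threshold after roughly three years and nine months in service.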

Updated 09:00, January 10, 2020

Gandi ops manager Sébastien Dupas told Computer Business Review: “We do not use HPE SSD, so we are not exposed to their firmware bug.

“Our outage is not directly linked to an hardware outage.

“We have not yet identified the root cause of it. We have some leads but it’s more related to a software issue. We have had metadata corruption on the disks following a random hardware crash. It should not have happen [sic], we will publish a full postmortem on our site news.gandi.net.”

The company later said it had been able to recover “a version of the filesystem from right before the crash” and was working on copying what data it could recover to another storage unit before taking any further steps.

Should you expect your hosting company to replicate and back up its data centres by default? Were you affected?

Let us know your thoughts.