February 1, 2017

Major GitLab backup failure wipes 300GB of data

GitLab faces backup failure after accidentally deleting data.

By Hannah Williams

GitLab is currently offline after suffering a major backup restoration failure following an accidental deletion of data.

The source-code hub posted a series of tweets following the incident, one of which confirmed the failure: “We accidentally deleted production data and might have to restore from backup.” The tweet included a link to a Google Doc with live notes.

The data loss took place when a system administrator accidentally deleted a directory on the wrong server during a database replication process. A folder containing 300GB of live production data was completely wiped.

GitLab said: “This incident affected the database (including issues and merge requests) but not the git repos (repositories and wikis).”

It emerged that, of the five backup techniques deployed, none had been working reliably or had been set up in the first place. The last potentially useful backup was taken six hours before the issue occurred.

However, this is of limited help: snapshots are normally taken only every 24 hours, and the deletion occurred six hours after the previous snapshot, so restoring from it would still mean about six hours of data loss.
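The arithmetic behind that six-hour figure can be sketched in a few lines of Python. This is only an illustration: the function name `data_loss_window` and the timestamps are hypothetical and are not taken from GitLab's incident notes; only the 24-hour interval and the six-hour gap come from the report above.

```python
from datetime import datetime, timedelta

# Snapshot interval described in the article.
SNAPSHOT_INTERVAL = timedelta(hours=24)


def data_loss_window(last_good_backup: datetime, incident: datetime) -> timedelta:
    """Return how much data is lost when restoring from the last good backup."""
    if incident < last_good_backup:
        raise ValueError("incident cannot precede the backup it is restored from")
    return incident - last_good_backup


# Hypothetical timestamps, chosen only to reproduce the six-hour gap.
last_snapshot = datetime(2017, 1, 31, 12, 0)
incident = last_snapshot + timedelta(hours=6)

lost = data_loss_window(last_snapshot, incident)
print(f"Data lost on restore: {lost}")
print(f"Worst case with this schedule: {SNAPSHOT_INTERVAL}")
```

With a 24-hour schedule, the worst case is a deletion just before the next snapshot, which would lose close to a full day of data.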

David Mytton, founder and CEO of Server Density, said: “This unfortunate incident at GitLab highlights the urgent need for businesses to review and refresh their backup and incident handling processes to ensure data loss is recoverable, and teams know how to handle the procedure.

“This particular accident shows that any business, no matter how technical or experienced in data management, can become a victim of accidental and catastrophic data loss.”

Mistakes that contributed to the backup restoration failure include the fact that disk snapshots in Azure are normally enabled for the NFS server, but not for the DB servers used by GitLab.

GitLab said that, during its efforts to restore the data, it noticed that the replication procedure was very fragile, prone to error, reliant on a handful of random shell scripts, and badly documented. It then also realised that all backups to S3 had been unsuccessful.

Overall, GitLab has confirmed that the disruption affects only the website, and that customers who run the platform on-premises will not be affected.
