A typo by someone with their head in the cloud: Massive Amazon S3 outage blamed on human error

Amazon Web Services has pointed the finger of blame at human error following the colossal Amazon S3 cloud outage which hit earlier this week.

The company said in a blog post that an incorrect command led to the removal of a larger set of servers than intended. That ‘removal’ took down a huge part of the web, CBR included.

READ MORE: Sorry we’re late, AWS cloud disappeared – Top sites knocked offline in huge Amazon Web Services outage

The outage in the company’s Simple Storage Service (Amazon S3) disrupted its customers’ operations for more than three and a half hours.

The AWS S3 outage hit its Northern Virginia (US-EAST-1) region on 28 February, taking websites including Slack, Docker and Soundcloud offline.

Amazon Web Services said in a blog: “The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected.

“At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process.”

The company said that the process of restarting the services and running the required safety checks to validate the integrity of the metadata took longer than expected.

It said: “The servers that were inadvertently removed supported two other S3 subsystems. One of these subsystems, the index subsystem, manages the metadata and location information of all S3 objects in the region. This subsystem is necessary to serve all GET, LIST, PUT, and DELETE requests.”

AWS’ S3 storage system is used by more than half of the company’s customers for cloud storage. According to expert estimates, the system stores three to four million pieces of data.

Last year, some of the services offered by Microsoft’s Azure cloud were hit by a two-hour outage.

AWS was not immune to outages in 2016 either: in September, significant error rates on its platform affected Netflix, Tinder and Wink.
