January 3, 2017 (updated 13 Jan 2017 11:39am)

Leap second causes ‘panic’ for Cloudflare servers

People left unable to access Internet sites.

The leap second added at the end of 2016 caught out Cloudflare, causing some of its servers to fail.

The web firm, which says “we make the Internet work the way it should”, offers CDN, DNS, DDoS protection and security services, but found that some of its servers failed to handle the added second.

As a result, users saw an error message saying that servers could not be reached instead of the page they wanted to visit.

Cloudflare said that it patched the most affected machines within 90 minutes, and explained the problem by saying: “At midnight UTC on New Year’s Day, deep inside Cloudflare’s custom RRDNS software, a number went negative when it should always have been, at worst, zero.

“A little later this negative value caused RRDNS to panic. This panic was caught using the recover feature of the Go language. The net effect was that some DNS resolutions to some Cloudflare managed web properties failed.”
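The panic-and-recover behaviour Cloudflare describes can be sketched in a few lines of Go. This is an illustration only, not Cloudflare's code: the helper name safeJitter is hypothetical, and the standard library's rand.Int63n, which is documented to panic when its argument is not positive, stands in for whatever call inside RRDNS received the negative number.

```go
package main

import (
	"fmt"
	"math/rand"
)

// safeJitter is a hypothetical helper, not Cloudflare's code. It asks for a
// random value in [0, bound) and turns a panic into an ordinary error.
func safeJitter(bound int64) (n int64, err error) {
	defer func() {
		// recover returns a non-nil value only while this goroutine is panicking.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	// rand.Int63n panics when bound <= 0.
	return rand.Int63n(bound), nil
}

func main() {
	for _, bound := range []int64{10, -1} {
		n, err := safeJitter(bound)
		if err != nil {
			fmt.Println("bound", bound, "->", err)
			continue
		}
		fmt.Println("bound", bound, "->", n)
	}
}
```

Recovering keeps the process alive, but the request that triggered the panic still goes unanswered, which is consistent with Cloudflare's statement that some DNS resolutions failed.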

Servers were unable to handle the leap second.

Cloudflare customers use the company’s DNS service to serve the authoritative answers for their domains. In effect, the company sits between websites and their visitors, speeding up access to a site while also blocking malicious traffic.

The problem is said to have affected about 1% of the requests its servers processed during the glitch.

Analysis of the problem revealed that a mismatch between the timestamps Cloudflare servers were expecting and the ones they received caused the system to ‘panic’.
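A timestamp mismatch of this kind is easy to reproduce. The sketch below assumes, for illustration, that the server's wall clock repeats the final second of the year while the leap second is inserted, so a reading taken later in real time carries an earlier timestamp. The timestamps and the helper names elapsed and demo are hypothetical, and clamping at zero is one possible defence rather than a description of Cloudflare's fix.

```go
package main

import (
	"fmt"
	"time"
)

// elapsed returns end minus start, clamped at zero so that a wall clock
// stepping backwards can never yield a negative duration.
func elapsed(start, end time.Time) time.Duration {
	d := end.Sub(start)
	if d < 0 {
		return 0
	}
	return d
}

// demo builds two hypothetical wall-clock readings around the leap second.
// The second reading is taken later in real time, but the clock has stepped
// back one second in between, so its timestamp is earlier than the first.
func demo() (raw, clamped time.Duration) {
	start := time.Date(2016, time.December, 31, 23, 59, 59, 900000000, time.UTC)
	end := time.Date(2016, time.December, 31, 23, 59, 59, 100000000, time.UTC)
	return end.Sub(start), elapsed(start, end)
}

func main() {
	raw, clamped := demo()
	fmt.Println("raw difference:", raw)
	fmt.Println("clamped difference:", clamped)
}
```

The raw difference comes out negative even though the second reading was taken after the first, so any code that assumes elapsed time is never below zero will misbehave unless it guards against that case.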

The trigger for the issue was the leap second added at the end of 2016. Leap seconds compensate for the gradual slowing of the Earth’s rotation and keep Coordinated Universal Time (UTC), the time standard that succeeded Greenwich Mean Time (GMT) for civil time-keeping, in step with solar time.

Cloudflare said: “This problem was quickly identified. The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC. We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand.”
