January 3, 2017 (updated 13 Jan 2017 11:39am)

Leap second causes ‘panic’ for Cloudflare servers

People left unable to access Internet sites.

By James Nunns

The leap second added at the end of 2016 caught out Cloudflare, causing some of its servers to fail.

The web firm, which says “we make the Internet work the way it should”, offers CDN, DNS, DDoS protection and security services, but found that some of its servers failed to handle the added second.

As a result, users received an error message saying that servers could not be reached instead of seeing the page they wanted to visit.

Cloudflare said that it patched the most affected machines within 90 minutes and explained the problem by saying: “At midnight UTC on New Year’s Day, deep inside Cloudflare’s custom RRDNS software, a number went negative when it should always have been, at worst, zero.

“A little later this negative value caused RRDNS to panic. This panic was caught using the recover feature of the Go language. The net effect was that some DNS resolutions to some Cloudflare managed web properties failed.”
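The mechanism Cloudflare describes, a panic that is caught with Go's built-in recover, can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from RRDNS, and the article does not say which call panicked; rand.Int63n is used here only because it is a standard-library function that panics when given a non-positive argument.

```go
package main

import (
	"fmt"
	"math/rand"
)

// pickWeighted is a hypothetical stand-in for code that assumes its
// input is positive. rand.Int63n panics if n <= 0.
func pickWeighted(n int64) int64 {
	return rand.Int63n(n)
}

// safeResolve calls pickWeighted and uses recover in a deferred
// function to turn a panic into an ordinary error, so the process
// keeps running but this one request fails.
func safeResolve(n int64) (result int64, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("resolution failed: %v", r)
		}
	}()
	return pickWeighted(n), nil
}

func main() {
	// A negative input triggers the panic, which is recovered.
	if _, err := safeResolve(-1); err != nil {
		fmt.Println("recovered:", err)
	}

	// A positive input works normally.
	v, err := safeResolve(10)
	fmt.Println(v >= 0 && v < 10, err)
}
```

In this sketch the program survives the bad input, but the caller still gets an error rather than an answer, which is consistent with the partial failure of DNS resolutions described in the quote.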

Servers were unable to handle the leap second.

Cloudflare customers use the company’s DNS service to serve the authoritative answers for their domains. In effect, the company acts as a go-between for websites and their visitors, speeding up access to a site while also stopping malicious traffic.

The problem is said to have affected about 1% of the requests its servers processed during the glitch.

Analysis of the problem revealed that a mismatch between the time-stamps Cloudflare servers were expecting and the ones they got caused the system to ‘panic’.

The trigger for the issue was the leap second added at the end of 2016. Leap seconds compensate for the gradual slowdown in the Earth’s rotation and keep Coordinated Universal Time (UTC), the time standard used worldwide, aligned with it.

Cloudflare said: “This problem was quickly identified. The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC. We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand.”
