The US’s CenturyLink has blamed a faulty network management card for a crisis that knocked out internet services for users nationwide late last week – taking down 911 call centres for numerous police forces.
Yet in a letter to core customers that was leaked to reporter Brian Krebs, CenturyLink – the second largest US communications provider to global enterprise customers – offered an explanation that raised new questions, not least why mission-critical systems and networks were so vulnerable to a single point of failure.
CenturyLink Internet Outage: Device “Broadcasting Traffic and Consuming Capacity”
Describing the root cause as “a network management card in Denver propagating invalid frame packets across devices”, the company said it had to pull the card out, remove “secondary communication channel tunnels between specific devices”, and apply a polling filter “to adjust the way the packets were received in the equipment.”
The rest of the explanation (the company has yet to make an actual public statement on the outage, despite the potentially catastrophic consequences for emergency services) left observers speculating about causes ranging from the cascading failure of faulty hardware to a serious malware problem in the company’s infrastructure.
The initial CenturyLink internet outage appears to have cascaded, knocking out regional equipment that could not be reset remotely; the company had to dispatch engineers to sites in Atlanta, Chicago, Kansas City, Los Angeles and New Orleans.
“Tier IV equipment vendor support was engaged as it was determined that the issue was larger than a single site”, the company said.
“During cooperative troubleshooting between the Equipment Vendor and CenturyLink, a decision was made to isolate a device in San Antonio from the network as it seemed to be broadcasting traffic and consuming capacity. This action did alleviate impact, however investigations remained ongoing.”
As Computer Business Review reported earlier, the outage left police forces across the US warning residents by tweet and text message that 911 services were not working.
The US Federal Communications Commission (FCC) has opened a public investigation into the incident, describing it as “completely unacceptable”, with FCC Chairman Ajit Pai saying that its “breadth and duration are particularly troubling.”
He added: “I’ve directed the Public Safety and Homeland Security Bureau to immediately launch an investigation into the cause and impact of this outage. This inquiry will include an examination of the effect that CenturyLink’s outage appears to have had on other providers’ 911 services.”