When things go wrong, whether a natural disaster or political upheaval, the telecoms network is probably the least of many people’s worries – assuming it works.
But look behind the scenes and there is a huge team of engineers working round the clock to keep the network up and running even through the direst of emergencies.
AT&T recently hosted a tour of its network disaster recovery (NDR) centre, the place where these kinds of operations are based. The location is secret, with the site shared between AT&T and a local company; none of AT&T’s branding marks the outside of the complex.
The main reason for the secrecy, according to Justin Williams, AT&T’s International Director for Disaster Recovery, is simply that it is easier to operate without interruption or theft.
“It’s much easier for us to just whir away in the background and not attract too much attention,” he says.
Operating from this inconspicuous site, how does AT&T manage to repair and maintain networks thousands of miles away?
First things first: the investment. AT&T’s NDR arm was founded in 1992 and got its first NDR trailer the following year.
Since then, the US-based telco has pumped $600 million of investment into NDR, with the fleet of vehicles now numbering 330.
The fleet spans several categories of vehicle, including power generators and distributors, and hardware and machine shops. When not being used for NDR, the site is used as a beta testing environment.
The teams have conducted 75 field exercises, accounting for over 140,000 working hours.
“We practise on pavement, not on paper,” explains Williams.
AT&T was the first company nationwide to receive the US Department of Homeland Security’s “Private Sector Preparedness Program” certification.
So what kind of event is the NDR team looking out for? Williams says that there are all kinds of events that are “attempting to trip up the network on a daily basis.” Basically, anything that damages performance for any reason could be a job for the NDR team.
Some are expected and can be planned for in advance, while some will be by their very nature a surprise.
The team is in the final stages of planning for the Rio Olympics, for example. It assesses a range of risks. There could be a huge increase in traffic across the network. A large movement of people could also expose network equipment to more potential vandals, posing a physical threat.
The events themselves can also present obstacles. For the London Olympics in 2012, roads were closed for the marathon events. This could have prevented AT&T from reaching its site to carry out repairs, necessitating either a specific arrangement with the authorities or having some employees stay on-site overnight.
The recent EU Referendum vote was another event that the team had to plan for in advance. Williams says that the vote had been assigned roughly the same level of risk as a UK General Election, with most of the risk coming from movement of people.
Alongside this long-term planning, the team is constantly monitoring the network. There are several layers of monitoring, with the aim of detecting problems before the customer notices them. Normally the problems can be handled by engineers, with four or five problems per year requiring an intervention from the NDR team.
AT&T’s people keep up with the news, and AT&T has its own meteorological society monitoring weather events.
For major international events such as the Olympics, the Global Network Operations Center (GNOC) will monitor the network during the event. This centre looks at network vulnerabilities in a particular area.
When the call to NDR does come through, it is time to get a team together and get the equipment to the site as quickly as possible.
The NDR team has more than 30 permanent members in the US and UK. They are supported by more than 100 volunteer members in the United States and abroad, who work full-time in other roles at AT&T.
From this pool of personnel, depending on availability and location, a team is pulled together. This team will generally have a range of expertise of varying depth, with some team members capable of training others.
The UK site is the global hub. Equipment can be flown from there to wherever it is needed. AT&T has arrangements to fast-track the visa process and pays for passports and inoculations for team members.
Once on-site, the key is self-sufficiency. AT&T’s team aims to place little to no demand on the local infrastructure. AT&T’s own generator is used until the team can tap into the local power supply.
The trucks carry “everything you need in a network”, Williams says, including fibre connections, Wi-Fi and packet-switching routers.
The network equipment is set up at a “sweet spot”, close enough to the site of the disaster but not too close.
The longest deployment the team has ever done was in southern Europe, Williams says, lasting eight months. For him, the biggest event he has been involved in was the recovery efforts after the 2010 earthquake in Chile.
Disaster recovery is in many ways unlike any other investment. On the one hand, for future disasters it is of course a question of when, not if. Yet, despite this certainty about its future value, it is an asset that limits damage rather than recoups investment.
“You don’t win a (customer) bid on business continuity and disaster recovery,” says Williams, “but you enhance it.
“When something happens, it shows the value. If you didn’t have the investment, what options would you have?”
In terms of future investment, Williams says there is no fixed plan in place. The investment will be determined by the technological requirements of the time. For example, the emerging technologies of software-defined networking and network functions virtualisation will affect investment. A major event could also lead to more capital being invested.
But for the time being, AT&T’s NDR team will simply be “whirring away in the background” as it has been for the last 24 years.