August 24, 2015

How can IT cope better with outages?

8 tips to rapidly restore services to users.

By CBR Staff Writer

For today’s "always-connected" businesses, something as minor as an operational glitch can bring a company to its knees. How IT communicates in the first few moments of a service outage is crucial, no matter how small the problem.

However, manual processes, diversified IT infrastructures and dispersed workforces can complicate these communications, increasing downtime and impacting the business.

We have outlined eight tips for automating the IT incident management communications process to ensure IT incidents are resolved quickly, smoothly and effectively:

1. Identify

The hallmark of a major incident is service disruption. In most cases, an operations manager identifies a major incident and escalates to a major incident manager, but some companies automate the routing of major incidents. Establish resolution processes for less critical issues as well, so they don’t go to major incident managers unnecessarily.
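As a rough illustration of automated routing, the sketch below sends an incident to a queue based on its severity and on whether a service is disrupted. The severity levels, queue names and the `route` function are assumptions made for this example and are not taken from any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    # Hypothetical severity scale; real organisations define their own.
    LOW = 1
    MEDIUM = 2
    MAJOR = 3


@dataclass
class Incident:
    summary: str
    severity: Severity
    service_disrupted: bool


def route(incident: Incident) -> str:
    """Return the (hypothetical) queue that should handle the incident."""
    # Service disruption is treated as the hallmark of a major incident.
    if incident.service_disrupted or incident.severity == Severity.MAJOR:
        return "major-incident-manager"
    # Less critical issues follow their own resolution process.
    if incident.severity == Severity.MEDIUM:
        return "operations-team"
    return "service-desk"


if __name__ == "__main__":
    print(route(Incident("Payment API down", Severity.MAJOR, True)))
    print(route(Incident("Slow report export", Severity.LOW, False)))
```

Keeping the lower-severity branches explicit is one way to make sure minor issues do not reach major incident managers unnecessarily.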

2. Engage

A communication system should be pre-populated with the contact information and on-call schedules of major incident managers, enabling the operations manager to locate them instantly and target notifications to them. Automating this initial engagement can have huge benefits, cutting a process that can take 20 minutes by up to 90 percent.
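A minimal sketch of this kind of lookup is shown below, assuming a simple in-memory schedule. The names, phone numbers, shift times and the `on_call_managers` and `build_notification` helpers are all hypothetical; a real platform would hold this data in its own directory and send the message over its own channels.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class OnCallShift:
    manager: str
    phone: str
    start: time
    end: time

    def covers(self, moment: time) -> bool:
        """True if the shift is active at the given time of day."""
        if self.start <= self.end:
            return self.start <= moment < self.end
        # The shift wraps past midnight (for example 20:00 to 08:00).
        return moment >= self.start or moment < self.end


# Hypothetical, pre-populated schedule of major incident managers.
SCHEDULE = [
    OnCallShift("Alex", "+44 0000 000001", time(8, 0), time(20, 0)),
    OnCallShift("Sam", "+44 0000 000002", time(20, 0), time(8, 0)),
]


def on_call_managers(now: datetime, schedule=SCHEDULE):
    """Return the shifts that are active at the given moment."""
    return [shift for shift in schedule if shift.covers(now.time())]


def build_notification(shift: OnCallShift, message: str) -> str:
    """Format a targeted notification; actual delivery is out of scope."""
    return f"To {shift.manager} ({shift.phone}): {message}"


if __name__ == "__main__":
    for shift in on_call_managers(datetime(2015, 8, 24, 3, 0)):
        print(build_notification(shift, "Major incident: payment service down"))
```

For scale, a 90 percent reduction of a 20-minute process would leave roughly two minutes.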

3. Triage

The manager who accepts the case determines whether the alert is a false alarm and, if it is not, what the incident actually is. The 2014 SANS survey (sponsored by AlienVault) revealed that 15 percent of organisations have issues with false positives.

4. Find and Assemble

Once the major incident manager understands the nature of the incident, they assemble the appropriate resolution team members based on the skills required.

Common roles include a service desk manager, service-level agreement (SLA) manager, change manager, software developer, quality assurance, operations engineer, infrastructure engineer and problem manager. When an IT outage occurs and a notification is received, these members drop everything else to work to resolve the issue.

Some companies choose to have a SWAT team at the ready, instead of relying on someone from each required department being able to free themselves up to help.
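Skills-based assembly can be sketched as a simple match between the skills an incident needs and the people currently available. The roster, skill names and `assemble_team` function below are illustrative assumptions only.

```python
# Hypothetical roster; a real system would pull this from a staff directory.
ROSTER = [
    {"name": "Priya", "skills": {"database", "infrastructure"}, "available": True},
    {"name": "Tom", "skills": {"software development"}, "available": False},
    {"name": "Lena", "skills": {"software development", "quality assurance"}, "available": True},
]


def assemble_team(required_skills, roster=ROSTER):
    """Pick the first available person for each required skill.

    Returns a (team, missing) pair: team maps each skill to a name,
    and missing lists the skills nobody available can cover.
    """
    team = {}
    missing = []
    for skill in required_skills:
        candidates = [
            person for person in roster
            if person["available"] and skill in person["skills"]
        ]
        if candidates:
            team[skill] = candidates[0]["name"]
        else:
            missing.append(skill)
    return team, missing


if __name__ == "__main__":
    team, missing = assemble_team(["database", "software development", "change management"])
    print(team)
    print(missing)
```

Returning the uncovered skills separately lets the major incident manager see at a glance where a standing team or an escalation is needed.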

5. Collaborate effectively

Simply assembling people and initiating the conference call can take 45 minutes. Companies that use mass communications during this phase often get far too many people on the conference bridge. Each new person who joins interrupts the flow of the call, and repeating the background information for each of them can waste an additional 10 minutes.

With a leading communication platform, the major incident manager can customise the message so resolution team members understand the basics before the call, and can join the bridge with just a button push instead of having to dial in. These messages can often be technical in nature, using IT jargon and specific server names.

Proactively and intelligently informing those affected with a more business-friendly message enables Marketing, PR, and executives to communicate responsibly, effectively and consistently.
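The two audiences can be served from the same incident data. The sketch below, with a hypothetical `build_messages` function and made-up field names and bridge URL, produces a technical message for the resolution team and a business-friendly one for everyone else.

```python
def build_messages(service: str, server: str, impact: str, bridge_url: str) -> dict:
    """Build one message per audience from the same incident details."""
    # The resolution team gets the jargon, server name and bridge link.
    technical = (
        f"[MAJOR] {service} degraded on {server}. "
        f"Join the bridge: {bridge_url}"
    )
    # Stakeholders get the impact in plain language, without server details.
    business = (
        f"We are investigating an issue affecting {service}. "
        f"Impact: {impact}. Further updates will follow."
    )
    return {"resolution_team": technical, "stakeholders": business}


if __name__ == "__main__":
    messages = build_messages(
        service="online payments",
        server="pay-db-01",  # hypothetical server name
        impact="customers may be unable to complete purchases",
        bridge_url="https://bridge.example.invalid/123",  # placeholder URL
    )
    for audience, text in messages.items():
        print(f"{audience}: {text}")
```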

6. Resolve

Members of the major incident team use all available communication channels integrated with their communication platform, including chat, text, email, phone, Skype, Slack and more, to identify and resolve the underlying cause of the issue. The communications also enable the major incident manager to keep stakeholders up to date while leaving team members free to work on the resolution.

7. Restore

Once the underlying issue is resolved, the team members can restore service and end the incident.

8. Review (Post Mortem)

A review is a fundamental piece of the incident resolution process, and all relevant parties should attend. The incident should have been documented and recorded, so that the major incident manager and the problem manager can walk the group through the incident record and everyone can assess the resolution process together.

The review can also identify improvements that can prevent a similar incident from occurring again.

If an enterprise’s data, information and processes become compromised, the business can suffer irreparable damage. So when major incidents occur, how the communication is managed is everything.

These tips will help enterprises implement the automated communications processes needed to get the business up and running again quickly in the event of an IT outage of any size.


By Teon Rosandic, VP EMEA, xMatters

