Last August, technical issues at Delta Air Lines forced it to cancel over 2,300 flights. The delays were so expensive that the carrier downgraded its profit guidance for the third quarter, a revenue hit of more than $100m. The cause? Several hundred servers were unable to connect to Delta’s backup system after the primary system went down, a failure of business continuity.
A month later, thousands of British Airways passengers across the US suffered hours of delays, some lasting overnight. Harried ticketing staff were reduced to delivering handwritten boarding passes due to an outage of the carrier’s expensive new check-in system.
Delta and BA’s woes were far from unique: all major airlines have experienced expensive and widely publicised IT glitches in recent years. Structural flaws in the source code of computer systems are the most frequent culprits behind operational incidents, which, in my own experience as a frequent flyer, occur far more often than press coverage suggests.
Welcome to the era of 9-digit defects. When losses from IT malfunctions hit 5 or 6 digits (a mere $100,000 or so), IT managers are at risk; when they hit 7 or 8 digits, IT and line-of-business executives take the heat. When losses hit 9 digits, as they did at Delta, the C-suite take the calls.
Lack of adequate internal quality support technology?
In the airlines’ defence, few industries require such intricate and intertwined logistics on a global scale to conduct basic business operations. But why aren’t these fleets of modern aircraft, controlled by the most sophisticated avionics software ever written, matched by IT systems of equal dependability?
There is no one answer, but several factors contribute to the problem. First, getting planes into the air requires seamless interaction among multiple interconnected business systems, including reservations, check-in, baggage handling, cargo, no-fly checks, fuel projection, flight planning, and more. One failed interaction can cause an incident that cascades into a sequence of failures with a potentially global impact.
Second, these systems have grown with the industry, being built and upgraded over decades, forcing airlines to integrate different technologies with different architectures. In many cases, these systems were not designed to carry the loads and connectivity they support today. Complicating the situation, various systems were often developed by different companies using different coding and documentation standards, working at different levels of discipline and professionalism.
Third, when airlines consolidate, they must merge IT systems that may have supported different operating methods and represented data in different formats. If the merger is rushed, the mashed-up software will reflect the shortcuts and mistakes made in merging the airlines. While software suffers from the unique challenges of its medium, it can also suffer from unnecessary complexity in the processes it must automate.
Fourth, no single IT professional or team can understand all the interactions among these systems, which are written in different programming languages and hosted on different platforms. Too many developers lack support from the latest generation of sophisticated analysis tools designed to augment their capabilities. Some airlines have underinvested in the staffing, training and infrastructure required to ensure dependable back-office operations.
Getting back in the flight path
Airline IT leaders have evidence-based options available for improvement.
- Start with a rigorous dependability assurance programme – Evaluate all operationally critical systems for correctness and engineering soundness. Evaluating these IT applications at the system level is critical; many incidents are caused by flawed interactions between different parts of systems that can only be spotted by evaluating the system from user entry points, through its processing, its querying of the database, its possible interaction with other systems, and its response back to the user (which might be another system). Rigorous dependability assurance is not cheap, but it costs far less than a nine-digit defect.
- Do it properly the first time – Software-intensive systems must be built in a disciplined environment that provides the time and resources to do professional work. Start by normalising the business process the system will automate. Plan realistic schedules, because rushed systems always cost more once the staggering effort of correcting mistakes is counted, not to mention the damages resulting from serious incidents. High-quality software takes less time to develop, is much cheaper to maintain, and can be enhanced more quickly, at the pace of business.
- Hold suppliers accountable – System suppliers (outsourcers, system integrators, and software vendors) should be evaluated before contracting to ensure their development practices are rigorous and they can retain key staff for the duration of the project. Among other agreements, the contract should include measurable targets for software attributes such as reliability, security, and changeability, all of which can be measured by detecting violations of good architectural and coding practices in the source code. Delivered systems must be thoroughly evaluated by functional, structural, performance, penetration and system testing at a minimum before entering operations.
The IT systems supporting today’s airline sector reflect the growing complexity of airline operations. No system as complex as airline operations can be made risk-free. However, the risks of complexity can be reduced through IT modernisation. When airlines ground their passengers because of faulty IT, the Twitterverse explodes, along with the profits.