Organisations everywhere are continuing to digitally transform as they strive to deliver seamless customer experiences and innovate faster to meet user expectations, writes Michael Allen, VP & CTO EMEA, Dynatrace.
To achieve this, they are migrating more and more services to hybrid, cloud-native environments. While these dynamic ecosystems bring a remarkable level of agility to organisations, they also introduce unprecedented levels of complexity, which recent research has shown is growing beyond human abilities to manage.
Modern IT teams are bombarded with thousands of performance and availability alerts every day, which they are required to investigate to identify and resolve potential problems before they impact the performance of IT services and reduce user and customer satisfaction. Faced with such a high volume of alerts, the average IT team spends 15% of its time just trying to identify which alerts to focus on. This costs organisations an average of $1.5 million in staff overhead each year, and that’s before they’ve even started on the task of resolving the underlying issue.
The Increasingly Cloudy Future
Much of the challenge that modern IT teams face is rooted in the fact that the applications running in today’s enterprise cloud ecosystems are hugely complex, with hundreds of technologies, millions of lines of code and billions of dependencies behind them. All this is producing a volume, velocity and variety of monitoring data and performance alerts on a scale that has never been seen before. Traditional application monitoring methods are ill-equipped to make sense of all this data and provide the level of observability that IT teams need to manage service performance effectively.
In large part, this challenge stems from the fact that traditional monitoring systems typically operate in isolation from one another. As a result, they are collectively sending out thousands of alerts that lack the wider context of what is taking place across the full stack. The data that IT teams receive is therefore undifferentiated, with large numbers of false positives and duplicate alerts that need to be sifted through before they can get on with the work of resolving problems. Faced with this constant barrage of data and unable to immediately focus on genuine performance problems, IT teams are spending more and more time on basic triage to determine where they should be directing their efforts. This task is made even more cumbersome by the fact that most of the alerts are irrelevant or low-level, with CIOs saying that, on average, only 26% require actioning.
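To make the duplicate-alert problem concrete, here is a minimal sketch of how alerts raised by separate tools might be grouped once they share a little context. The `Alert` fields, the tool names and the sample data are hypothetical, invented purely for illustration, and do not reflect any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str      # hypothetical: the monitoring tool that raised the alert
    service: str   # hypothetical: the affected service
    symptom: str   # hypothetical: a normalised symptom label


def group_alerts(alerts):
    """Group alerts sharing a (service, symptom) pair, whichever tool raised them."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.service, alert.symptom)].append(alert)
    return dict(groups)


# Illustrative data only: three siloed tools report the same underlying problem.
alerts = [
    Alert("apm", "checkout", "high_latency"),
    Alert("infra", "checkout", "high_latency"),
    Alert("logs", "checkout", "high_latency"),
    Alert("infra", "search", "disk_full"),
]

grouped = group_alerts(alerts)
for (service, symptom), members in grouped.items():
    print(f"{service}/{symptom}: {len(members)} alert(s)")
```

Here four raw alerts collapse into two candidate problems. Real environments would need far richer context than a service name and a symptom label, which is exactly the gap that siloed tools leave open.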
Drowning in Alert Storms
Sorting the false positives, duplicates and low-priority alerts from the genuine problems is a slow and error-prone process. This means IT teams have less time for the significantly more important task of identifying the precise root cause of performance issues and remediating them before customers or end-users experience disruptions in service. In today’s age of the customer, where users have ample choice and opportunity to switch to an alternative service at the drop of a hat, such disruptions can lead to a loss of revenue and hurt the bottom line for organisations. Users expect a seamless digital experience and, in order to deliver this, IT teams must be able to maintain end-to-end observability. Only then can they effectively manage their increasingly complex IT environments, with the ability to identify and resolve performance issues before the service quality is impacted.
Clearly the status quo is unsustainable, and a radical change is needed to ease the strain on IT teams. Critical resources that teams are currently wasting on sorting through thousands of performance alerts need to be redirected toward effective performance management and driving seamless digital experiences. Some organisations are attempting to deal with the problem by incrementally updating their performance monitoring tools. This brings very limited success, because the tools they are updating were never created for the dynamic nature of multi-cloud environments. Taming the complexity of these cloud ecosystems requires a transformative change, one that goes beyond relying on human capabilities alone.
Weathering the Storm with AI-driven Operations
Organisations need to transition to AI-driven cloud operations in order to master their complex environments and remain successful in an experience-centric world. Combining this with an approach based on a common data model, one that breaks down the silos between monitoring data, will offer far better support for IT teams, providing them with fully contextualised, precise answers to performance problems, rather than more data and alerts. This will pave the way towards self-healing applications and auto-remediation through the automation of continuous delivery and operational processes.
Ultimately, IT and business leaders must address the inadequacy of traditional monitoring systems that are drowning IT departments in relentless alerts. Today’s organisations need to make a decisive shift towards AI-powered cloud operations that provide actionable insights into the performance of their applications and the impact on the end-user. Only then will they be able to deliver seamless digital experiences amidst the complexity of the enterprise cloud and remain competitive in a customer-centric world.