“The era of self-healing technology is here,” said Alan Ganek, chief technology officer, IBM Tivoli Software and vice president, IBM Autonomic Computing – a man with one of the longest titles in the industry.
“These new products from IBM allow companies to spot and fix IT problems automatically – and behind the scenes,” said Ganek, “so they can focus on strategic projects that are valuable to their business. We’re opening new doors to reducing the complexity of technology.”
It’s not the first time that IBM has ‘bigged-up’ the idea of self-healing computing; indeed, its whole Autonomic Computing initiative speaks to that goal. But this time there does seem to be more than just ‘slideware’ behind the announcement.
Tivoli System Automation is the umbrella term for the products, which include Tivoli Monitoring, Tivoli Composite Application Manager and Tivoli System Automation for Multiplatforms. Tivoli Monitoring is said to allow companies to manage online applications, such as email or bill paying systems, by proactively correcting IT service problems such as ‘hung’ applications, and fixing them across a company’s servers, operating systems and databases before they affect users.
Tivoli Composite Application Manager speeds up access to information on the Net, IBM said, by predicting and fixing bottlenecks that crop up as dozens of different systems connect under a standards-based Service Oriented Architecture (SOA). The self-healing software can locate where problems lie, identify the specific cause, and take steps to solve the problem before customers are affected.
Finally, Tivoli System Automation for Multiplatforms is claimed to pinpoint the status of complex applications running on multiple platforms and operating systems, and then use policies to automatically bring them back online if the system fails because of a power outage or other cause.
While the products are all steps in the right direction, I wouldn’t get too excited just yet. What these products really enable system administrators to do is set up “if, then” contingency plans. For instance, “If server A runs out of capacity, then fail over to server B”.
This kind of ‘self-healing’ is only as good as the policies set by the administrator, and can only help if the infrastructure is designed in such a way that something can actually be done about the problem without user intervention. For instance, you can’t fail over to another server if you don’t have spare capacity. You can’t automatically reboot unless you can afford a little downtime. And while IBM’s self-healing software is said to automatically bring the entire application and database environment back online if an outage occurs, that is unlikely if the infrastructure involves a lot of third-party applications and systems.
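To make the “if, then” point concrete, here is a minimal sketch in Python of what such a policy amounts to. It is purely illustrative and is not IBM’s policy format or any Tivoli API: the names Server, Policy, fail_over and evaluate are invented for this example, and the capacity figures are made up. It shows the catch described above, namely that the “then” only works if server B has room to absorb server A’s load.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Server:
    """Hypothetical server model: capacity and load in arbitrary units."""
    name: str
    capacity: int
    load: int

    @property
    def spare(self) -> int:
        return self.capacity - self.load


Servers = Dict[str, Server]


@dataclass
class Policy:
    """One administrator-defined rule: the 'if' and the 'then'."""
    name: str
    condition: Callable[[Servers], bool]
    action: Callable[[Servers], str]


def fail_over(source: str, target: str) -> Callable[[Servers], str]:
    """Build a 'then' action that moves all load from source to target."""
    def action(servers: Servers) -> str:
        src, dst = servers[source], servers[target]
        # The automation can only help if the infrastructure has headroom.
        if dst.spare < src.load:
            return f"cannot fail over {source} -> {target}: no spare capacity"
        dst.load += src.load
        src.load = 0
        return f"failed over {source} -> {target}"
    return action


def evaluate(policies: List[Policy], servers: Servers) -> List[str]:
    """Run every policy whose condition currently holds; collect outcomes."""
    return [p.action(servers) for p in policies if p.condition(servers)]


if __name__ == "__main__":
    # "If server A runs out of capacity, then fail over to server B."
    rules = [Policy("a-full", lambda s: s["A"].spare <= 0, fail_over("A", "B"))]
    roomy = {"A": Server("A", 100, 100), "B": Server("B", 200, 50)}
    tight = {"A": Server("A", 100, 100), "B": Server("B", 200, 150)}
    print(evaluate(rules, roomy))
    print(evaluate(rules, tight))
```

The same rule succeeds against the first set of servers and fails against the second. Nothing about the policy changed; only the spare capacity in the infrastructure did.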
So while this latest attempt at self-healing is certainly a step in the right direction, users should be aware that for systems to heal themselves when something goes wrong, they are going to have to spend a little more time designing their infrastructure with resilience in mind in the first place. Which is, come to think of it, No Bad Thing. But it’s not necessarily cheap, either: you may be lowering ongoing management burdens and costs, but at the price of higher up-front infrastructure set-up times and costs. Again, this is not necessarily a bad thing, but it does add a rather large caveat to the idea that self-healing systems are “here”.