Fault resilience is one of the growing number of weasel phrases that bedevil the computer industry, and Dennis, Massachusetts-based researcher Standish Group International Inc reckons it has the evidence to back up its claim that if you need 24-hour-a-day, seven-day-a-week operation, nothing less than a fault-tolerant system will do. It also argues that such a system will cost you no more, and be no less open, than a fault-resilient configuration of a machine of comparable performance. The findings, it says, come from a survey of more than 1,200 US information technology users and executives. "Our findings over the past several years indicated a lot of confusion in the marketplace regarding just how expensive and reliable failover configurations were versus fault-tolerant systems," comments Standish Group chairman Jim Johnson. "This study clearly shows that confusion still exists. But it also found that fault-tolerant system vendors offer higher system availability and industry standard operating systems at comparable price-performance to failover vendors for the ownership life span of a system." The CHARTS (Comprehensive High Availability Requirements Technology Study) report says that while many of the data processing managers interviewed believed fail-over systems to be less expensive than fault-tolerant ones, the Standish Group has found no concrete evidence that fault-resilient systems enjoy any such price advantage.

Much too complex

"In fact, we found to the contrary." Looking at published TPC-C benchmark numbers, the researchers conclude that the extra hardware, the fail-over software, and the services needed to design and implement a fail-over system could push the price of a fault-resilient system well above that of a fault-tolerant one. "Both fault-resilient vendors and their service partners feel their technology is much too complex for the typical user organisation," it notes, pointing out, crucially, that unlike fault-tolerant systems, fault-resilient systems are not turnkey operations. To make the most of a fault-resilient system, a user must become proficient in high availability technology, including the generation of failover scripts, which must be written, tested, and debugged. Generating those scripts means determining the correct failure scenarios, writing the scripts, modifying the application, and then rewriting and retesting the scripts as applications change and as hardware is upgraded and expanded. "Changes in the application, new software, new hardware, new releases of software or hardware, and many other events require the rewriting and retesting of existing scripts," the study points out. As for openness, the study found that users' perception of fault-tolerant machines as proprietary is no more valid than it would be for any other systems that have made the transition from a proprietary base to open systems. "For example, Stratus Computer Inc's FTX fault-tolerant Unix systems receive 100% of the points on our open systems score card," it says. It also points out that the failover scripts required by fault-resilient systems are proprietary and cannot be migrated across open systems, and that if they are written into the application, the application itself is no longer portable.
The report reads as if it were commissioned by a vendor, most likely Stratus, the only one it mentions, especially since its terminology echoes what Stratus is saying about its forthcoming Continuum Precision Architecture RISC machines (see front). Even so, the arguments sound persuasive. There is no word on the report's availability.