By Joanne Wallen
Rudyard Kipling once wrote, "If you can keep your head when all about you are losing theirs..." With massively parallel processing vendors all around losing their heads, UK company WhiteCross Systems Ltd seems to be keeping its own head above water by focusing on a well-defined niche market, data warehousing. The company has recently received a fresh injection of venture capital, won three deals for 100-processor computers worth more than $1m each, and now has an installed base of 30 (CI No 3,083). So what makes it believe it is here to stay, and that it won’t go the way of Thinking Machines Corp or Kendall Square Research Corp, or become just any old data miner? Chief executive Chris Barfield says it’s because WhiteCross has something different to offer. That difference lies in the technology itself, in the company’s strategy and in its approach to its customers. On the strategy front, Barfield believes far too much money has been spent by too many companies on the technicalities of building a data warehouse in the first place. Barfield says WhiteCross talks to marketing managers and business leaders about the sort of problems they are experiencing in exploring their data and getting useful answers from it. The company has found that once users see the answer to one problem, they want to ask more and more questions, and the system begins to speak for itself. In fact, so confident is WhiteCross that this approach pays off that at the end of last year it launched The WhiteCross Challenge. The challenge encourages companies to present themselves and their data to WhiteCross with a specific business problem to solve. The company will sit down with the customer, agree the precise problem to be solved and cost it, and says it will go forward with the project only if both parties agree the exercise will pay for itself.
Refund all fees
If the WhiteCross system does not produce the agreed benefit, the company commits to refund all fees in full. Bold claims indeed, but WhiteCross has seen the reaction of both technology managers and business managers to the speed with which it delivers answers to previously unaskable queries, and it is confident it will not fail to deliver as promised. This realization that data warehousing and data mining systems are tools for business managers is not new, and not unique to WhiteCross. More and more of the data warehousing brigade have introduced a variation on this theme to lure the marketing director into the fold. However, it is on the technology side that WhiteCross claims to really come into its own. Where its offer differs, says Barfield, is that the company first decided on its business focus, data exploration and decision support, and then built the system it needed to service that business. It knew it wanted to use standard SQL as the query language to extract data from customer databases, and it decided that a massively parallel system would be the best for the job. Barfield says that, unlike most other massively parallel processing vendors’ systems, WhiteCross systems were never designed as scientific number crunchers, so the company can claim they were built from the bottom up for decision support, based on SQL, and that the greater part of each system is actually the software. The systems are predicated upon the idea that, with the advent of open database protocols and the demand for vast volumes of business data to be collected and analyzed, the time has come for what the company calls the data appliance, a system designed solely for the capture, storage, maintenance and analysis of very large databases. WhiteCross director of strategy Dan Holle says most data warehousing systems are based on hardware and software that were designed for online transaction processing applications, and on outdated power/memory equations.
Holle says massively parallel processing is the natural architecture for large-scale data exploration. Tasks lend themselves to being divided up across many processors, so that the time per task can be reduced in proportion to the number of processors. Where WhiteCross believes it differs even from the likes of Teradata Corp is that it is newer to market and has not been handicapped by architecture based on the original IBM PC. The WhiteCross system incorporates a grid of interconnecting single-processor nodes and associated memory. Each node communicates with four adjacent nodes so that processing power is not constrained by a single data path. The company says performance scales almost linearly, increasing in line with the number of nodes added. When an enquiry is made, an on-board compiler identifies the tasks that need to be performed and divides them across different nodes. The minimum configuration of a WhiteCross system is twelve nodes, and the company says there is no theoretical upper limit. As far as software is concerned, WhiteCross eschews the paranoia about proprietary software. The company has written its own relational database, which runs only on its own hardware. Its argument is that commercially dominant databases have grown from a transaction processing environment and are optimized for this role.
In addition, they are built to be portable, and run on as many hardware platforms as possible, which, WhiteCross says, brings with it an inherent degradation of performance. WhiteCross applications are compiled directly to machine code, which the company says increases the speed at which the machine performs tasks by two orders of magnitude. The company says a typical database search using a portable database goes through six layers of software architecture, with each layer involving between one and a thousand separate instructions, resulting typically in at least 10,000 instructions being issued. In contrast, it says, its own proprietary software needs only about 30 instructions to complete the same task, which it says gives WhiteCross an actual speed advantage of about 300-fold at similar processor speeds. Where the WhiteCross approach really comes into its own is in its ability to let users truly follow what it calls ‘train of thought’ querying. Hitting any kind of processor with totally unstructured types of query is usually a nightmare, and what many database vendors do to minimize the performance hit is to use indexing to define how data tables are stored. Indexes are built to reference those columns most likely to be frequently searched. However, this can be very limiting to the end user, who is slowed down by any query that falls outside the indexed columns. Indexing can also create additional overhead. WhiteCross instead uses a single image index, which indexes all table data in a single bit-mapped, compressed file. Any query refers to this single master index to find where to look for the data. Holle says this single image index is not only efficient for data exploration, but is also optimized for data loading, since changes to the master index are made on the fly and held in RAM, avoiding the need for disk access.
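As a rough illustration of how a bit-mapped index over every column can answer ad hoc queries, here is a minimal Python sketch. It is not WhiteCross's implementation, which the article does not detail: the function names and sample data are hypothetical, Python integers stand in for the bitmaps, and the compression step is left out.

```python
from collections import defaultdict


def build_bitmap_index(rows):
    """Map (column, value) to an integer bitmap; bit i is set when row i matches.

    Hypothetical helper: every column is indexed, so no column is favoured.
    """
    index = defaultdict(int)
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            index[(column, value)] |= 1 << row_id
    return index


def matching_rows(index, **conditions):
    """Return the ids of rows that satisfy every column=value condition."""
    bitmaps = [index.get(item, 0) for item in conditions.items()]
    if not bitmaps:
        return []
    combined = bitmaps[0]
    for bitmap in bitmaps[1:]:
        combined &= bitmap  # a bitwise AND intersects the two row sets
    return [i for i in range(combined.bit_length()) if (combined >> i) & 1]


# Made-up sample table for illustration only.
rows = [
    {"region": "north", "product": "loan"},
    {"region": "south", "product": "loan"},
    {"region": "north", "product": "card"},
    {"region": "north", "product": "loan"},
]
index = build_bitmap_index(rows)
print(matching_rows(index, region="north", product="loan"))
```

Because every column value has its own bitmap, any combination of conditions can be answered by intersecting bitmaps, without deciding in advance which columns deserve an index.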
So the answer from WhiteCross is not just to throw raw processing power at the problem, but to understand the user requirement to use standard SQL queries in an unstructured way, and to optimize hardware and software design for this task alone.
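The idea of dividing one query across many nodes, which Holle describes, can be sketched in a few lines of Python. This is an assumption-laden toy rather than the WhiteCross compiler: the function names and sales data are invented, threads stand in for processor nodes, and the only thing shown is the pattern of partitioning the rows, computing partial results per node and combining them.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Deal rows round-robin into one slice per (hypothetical) node."""
    return [rows[i::node_count] for i in range(node_count)]


def node_sum(rows, predicate):
    """Work done by one node: filter its own slice and return a partial sum."""
    return sum(amount for region, amount in rows if predicate(region))


def parallel_sum(rows, node_count, predicate):
    """Run the same filter-and-sum on every slice, then combine the partials."""
    slices = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = pool.map(node_sum, slices, [predicate] * node_count)
        return sum(partials)


# Made-up (region, amount) rows for illustration only.
sales = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("north", 1)]

# Twelve workers echo the minimum configuration mentioned in the article.
print(parallel_sum(sales, 12, lambda region: region == "north"))
```

Each node touches only its own slice of the data, which is why the time per task can in principle fall as nodes are added. Python threads will not show that speed-up themselves; the sketch only shows how the work is split.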