When it comes to databases in the 1980s, it’s a relational world, even in the top-end territory dominated by Big Mother. That is the popular view, until you listen to any of the devoted disciples of Ted Codd and Chris Date, whereupon the picture dissolves and everything suddenly seems a great deal more complicated. Peter White reports.
Codd and Date have become almost synonymous with relational database technology, so when someone is brave enough to claim to have been a co-worker, from the same database camp within IBM as the two gurus, and can back it up, you listen to what she says. Shaku Atre is the latest such consultant to emerge under this banner, and she backs up her credentials with some hard talking, sprinkled with quips like ‘When IBM says SF it stands for Science Fiction, not systems facility’. The two most important areas of criticism she focussed on when in London the other day related, naturally, to DB2, IBM’s top-end relational offering, and came with the confidence of having spent time working on what IBM has in the pipeline. She clearly understands that IBM has a long way to go, but she has seen the work at first hand, and has an idea of when it could be with users at the earliest.
All relational systems should have referential integrity, which means that where you have two pieces of data that have a relationship, such as one being dependent on another, then if you delete one from the system, the dependent one should vanish automatically. For instance, if a customer number has a specific customer name associated with it, then you shouldn’t be able to delete just one of them. DB2 doesn’t have referential integrity, and, she glibly points out, neither do the products from any of the other vendors. That means you are asking for trouble if you don’t monitor updates very carefully, because databases will become corrupt all the time. ‘But we’re used to that’, say all the VSAM programmers, ‘it’s no problem. It just means there’s no point having a relational database’. Referential integrity should clearly be high on IBM’s agenda, as should the Outer Join, which is the ability, when two tables are joined on a field of related data, to retrieve the rows in one table that have no match in the other as well as those that do. She wants the user to have the ability to override the optimiser on DB2, which works out in what order the tables are going to be searched. ‘It’s general-purpose and sometimes the people who put the data there know better how to get at it so it should be optional’, she insists. And she expects a directory which can store logical views to be added, but then again maybe that’s what DBRAD, announced last month, is: ‘It wasn’t called that when I worked there’. That indeed is what DBRAD appears to be – see report in CI No 687. DB2 users will also be wanting an on-line performance monitor; at present there is only a batch monitor. All of these facilities could be with the DB2 user soon, if IBM decides that its software is mature enough and the time is right. Half of them could be with us in one year, the rest in two. The shock suggestion she made was that R*, IBM’s distributed relational database, could also be out inside two years.
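Referential integrity and the outer join, as described above, can be illustrated with a short sketch. It uses SQLite through Python's standard `sqlite3` module purely for convenience; it says nothing about how DB2 or any other product discussed here does or will behave, and the `customer` and `orders` tables are hypothetical names invented for the example.

```python
import sqlite3

# Hypothetical schema for illustration: customers and the orders that depend on them.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customer (cust_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " order_no INTEGER PRIMARY KEY,"
    " cust_no INTEGER NOT NULL REFERENCES customer(cust_no) ON DELETE CASCADE)"
)
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme"), (2, "Bolt")])
conn.execute("INSERT INTO orders VALUES (100, 1)")

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (101, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Outer join: every customer appears, even one with no orders (order_no is NULL).
outer_rows = conn.execute(
    "SELECT c.name, o.order_no FROM customer c"
    " LEFT OUTER JOIN orders o ON o.cust_no = c.cust_no ORDER BY c.cust_no"
).fetchall()

# Deleting a customer removes the dependent orders automatically.
conn.execute("DELETE FROM customer WHERE cust_no = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Without the foreign-key declaration, the orphan order would be accepted and the delete would leave order 100 pointing at nothing, which is the kind of corruption that has to be policed by hand when the database does not do it.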
The suggestion that R* could be out inside two years implies that IBM is putting a lot more behind the project than it has previously. Companies are now rushing to commit to introducing distributed databases, but until a year ago the problems created by scattering bits of a database over machines sitting in widely dispersed locations were regarded as well-nigh insuperable, so much so that the game was considered not to be worth the candle. Distributed relational databases are still acknowledged to be extremely tricky, and Ms Atre carries a list of 30 key features that none of the products on the market has, all of which they need before the products can be said to be really helpful. A couple of key points are critical. When one part of a distributed relational database sends data to another part, it should first ask if the other node is ready to receive it, then send it, then ask if it arrived safely. This is called a two-phase commit, and she says no product she has seen has it. Again, without it, database corruption is just around the corner, especially if you’re sending stuff over noisy dial-up British Telecom lines. When a single large file is split into two or more parts (horizontal fragmentation), those parts should be maintainable at different locations. And there should be a slave copy of this data at other locations, to ensure multi-site recovery. If a whole node goes down, recovery should be automatic without loss of ability to access any data, so the slave becomes the temporary master.
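The two-phase commit mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any vendor's implementation: the `Node` class and `two_phase_commit` function are hypothetical names, everything runs in one process, and a real protocol would also need logging, timeouts and recovery after a crash between the two phases.

```python
class Node:
    """Hypothetical participant holding part of a distributed database."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.pending = None   # update staged during phase one
        self.data = {}        # committed data

    def prepare(self, update):
        # Phase one: stage the update and vote on whether it can be committed.
        if not self.healthy:
            return False
        self.pending = dict(update)
        return True

    def commit(self):
        # Phase two (success): make the staged update permanent.
        if self.pending is not None:
            self.data.update(self.pending)
            self.pending = None

    def abort(self):
        # Phase two (failure): discard the staged update.
        self.pending = None


def two_phase_commit(nodes, update):
    """Apply the update on every node or on none. Returns True if committed."""
    votes = [node.prepare(update) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.abort()
    return False


london, leeds = Node("london"), Node("leeds")
first = two_phase_commit([london, leeds], {"cust_1": "Acme"})
leeds.healthy = False  # simulate a node that cannot accept the next update
second = two_phase_commit([london, leeds], {"cust_2": "Bolt"})
```

The point of the design is that no node makes a change permanent until every node has said it can, so a failure part-way through leaves all copies as they were rather than leaving some updated and some not.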
All the products at the moment have only single-site update and multiple-site enquiry. But the clincher that says that distributed relational databases haven’t really arrived yet was revealed, she says, when she asked 18 suppliers about an intelligent front-end algorithm to decide the shortest way of asking the questions, depending on where the data is and how much of it needs to be sent around the network. She has had no replies that suggest any vendor has a product with this facility yet. Ms Atre was over here both to promote her services as a guru and to help Cincom Ltd with its Supra product range. Every time she mentioned that no product had a particular facility, a nervous Cincom employee insisted, ‘except us’. It is a claim which does seem to be backed up by a recent test carried out by Computerworld, measuring the Cincom product against the Codd reference model: in the tests for conformance, Supra scored 64%. And it could perform even better if it takes Ms Atre’s advice: she is recommending that the company should investigate expert systems with a view to adding a plain language front-end to Supra to make it more accessible to lay users, effectively adding the expertise and guidance of a database specialist. And although IBM’s relational pot seems to be on the boil, Ms Atre adds: ‘I can’t ever see DB2 going anything like as fast as IMS with Fastpath, but only 5% of the market needs anything that fast and IMS will improve to take Transaction Processing Facility’s position at the top end’. Anything down below 30 to 40 transactions a second will become the province of DB2.