Have you ever stared at a huge pile of magazines or newspapers knowing that one of them contains a specific article you really want to read, but you just can’t remember which issue it’s in? Well, Oracle UK Ltd has been talking about a product based on the idea of cutting the time spent searching for that article. The Bracknell, Berkshire-based firm has been showing off its parent’s new data management software, named ConText (CI No 2,424), which can analyse the content and meaning of text stored on a computer and is designed to cut the time wasted capturing, storing and retrieving information. So it can, for example, sift through all that electronic mail that’s piled up, then index and summarise it so you can more easily decide which messages you want to read.
Thesaurus
The core of the product is a lexicon containing 600,000 words. Each of these words is tagged with 1,000 potential linguistic attributes, which help resolve problems such as words having different meanings depending on their context. The software first tries to determine meaning from the other words in the sentence; if that is not clear, it takes in the whole paragraph or, failing that, the whole document. Many phrases and clauses are also pre-programmed or can be determined by the system. When faced with a piece of text, the software first finds information that provides the background to the document, then picks out any opinions held by someone in the story and who expressed them, and finally logs the main theme or opinion that the article carries. With the help of a thesaurus that identifies themes in the document, it can also determine that the subject of an article is a word that is not actually mentioned in the text. So the overall document is analysed on grammar rather than simply on word frequency.
The drawback seems to be that, as Oracle says, it’s a matter of building up trust with the system as to whether it summarises the document according to what you feel are the key issues in it. To help with this there is an index for each article that also summarises the less central themes covered.
Oracle says that a major US information provider is currently evaluating the product and building an application, and that it is also working with UK customers and will announce developed applications over the next few months. Oracle ConText itself is available now but runs only under Unix at present. Pricing will be announced over the next few months; Oracle says that it will be comparable with that of text retrieval products, and will be on a per-user and a per-development basis.
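The widening search for context described above (sentence first, then paragraph, then whole document) can be illustrated with a toy sketch. This is not Oracle's implementation, which the article does not detail: the tiny `LEXICON`, the cue-word overlap scoring and the function names are all hypothetical, chosen only to show the fallback order.

```python
import re

# Hypothetical toy lexicon: each ambiguous word maps candidate senses
# to cue words. ConText's real lexicon and attributes are not public here.
LEXICON = {
    "bank": {
        "finance": {"money", "loan", "account", "deposit"},
        "river": {"water", "fishing", "shore", "boat"},
    },
}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_sense(word, context):
    """Return the one sense whose cue words overlap the context most.

    Returns None when no cue word appears or when senses tie,
    i.e. when this amount of context does not settle the meaning.
    """
    words = tokens(context)
    scores = {sense: len(cues & words) for sense, cues in LEXICON[word].items()}
    top = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == top]
    return winners[0] if top > 0 and len(winners) == 1 else None


def disambiguate(word, sentence, paragraph, document):
    """Try the sentence, then the paragraph, then the whole document."""
    scopes = (("sentence", sentence), ("paragraph", paragraph), ("document", document))
    for scope, text in scopes:
        sense = best_sense(word, text)
        if sense is not None:
            return sense, scope
    return None, None


sentence = "She walked to the bank."
paragraph = sentence + " The water was calm and a boat drifted past."
print(disambiguate("bank", sentence, paragraph, paragraph))
# prints ('river', 'paragraph')
```

Here the sentence alone contains no cue words, so the sketch falls back to the paragraph, where "water" and "boat" settle the sense. A real system would presumably weigh grammatical attributes rather than simple word overlap.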