The IT giant is promoting the Unstructured Information Management Architecture (UIMA) as a standard integration framework that third-party text analytics and business intelligence tools and applications can plug into. In parallel, IBM is also releasing a UIMA-compliant version of its WebSphere Information Integrator OmniFind Edition enterprise search platform.

UIMA defines a common set of interfaces for integrating different text analytic components and applications, either in batch or in real time. It includes an SDK for building custom and reusable UIMA-compliant text analytics modules (called annotators) that extract and analyze unstructured information.

Because UIMA defines a common data model for annotator interoperability, third-party ISVs and systems integrators can also develop their own annotators that work with the framework.
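To illustrate the idea of annotators interoperating through a shared data model, here is a minimal sketch in Python. It is not the UIMA SDK or its API: the `Document`, `Annotation`, `TokenAnnotator`, `CompanyAnnotator` and `run_pipeline` names are all hypothetical, chosen only to show how one module can build on annotations left by another.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # label such as "token" or "company"
    begin: int  # start offset into the document text
    end: int    # end offset (exclusive)


@dataclass
class Document:
    """Hypothetical shared data model: the text plus every annotation on it."""
    text: str
    annotations: list = field(default_factory=list)

    def covered_text(self, ann):
        return self.text[ann.begin:ann.end]


class TokenAnnotator:
    """First module: marks every word as a token."""
    def process(self, doc):
        for m in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("token", m.start(), m.end()))


class CompanyAnnotator:
    """Second module, imagined as coming from another vendor.
    It reads the tokens already in the document rather than re-parsing the text."""
    KNOWN = {"IBM", "ClearForest"}  # illustrative lexicon only

    def process(self, doc):
        tokens = [a for a in doc.annotations if a.kind == "token"]
        for ann in tokens:
            if doc.covered_text(ann) in self.KNOWN:
                doc.annotations.append(Annotation("company", ann.begin, ann.end))


def run_pipeline(text, annotators):
    """Run each annotator in order over one shared document."""
    doc = Document(text)
    for annotator in annotators:
        annotator.process(doc)
    return doc
```

Because both modules agree only on the document and annotation shapes, either one could be swapped for a different implementation without changing the other.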

The UIMA standard is being developed by a working group made up of government, industrial research and leading academic institutions. The UIMA SDK is currently available at no charge on IBM’s alphaWorks emerging technology portal.

IBM’s rationale for driving UIMA is twofold: to allow point text analytic tools to easily interoperate and to drive text analytics into a wider range of enterprise search and business intelligence applications.

“The problem with today’s text analytic offerings is that one solution doesn’t fit all,” said Marc Andrews, director of strategy and business development for IBM’s Information Integration Solutions group. “Since vendors offer multiple point solutions that cannot be easily integrated, it’s hard for a single [vendor] to address everything.”

IBM recognizes that different technologies – linguistics, tokenization, categorization, entity extraction, reporting and analysis – are part of the overall puzzle and lays out clear lines of technology demarcation in its UIMA framework. “UIMA recognizes that different vendors do different things well. It’s really a best-of-breed approach.”

Andrews said the use cases around these various technologies have been very specific up to now. “Typically text analytic solutions have been applied for a limited purpose and a small set of users, for example, things like warranty analysis, where domain specificity is important.”

“With UIMA we’re looking to integrate text analytics in a more mainstream fashion…into broader search and business intelligence applications.”

In a parallel announcement, IBM unveiled WebSphere II OmniFind 8.2.2, which IBM bills as a next-generation integrated text analysis and enterprise search tool. The product is included as a deliverable of IBM’s ongoing Serrano release.

IBM officials say that OmniFind incorporates concept and fast searching capabilities that surpass basic keyword search products. The product is built around a runtime processing engine that creates and manages multiple text analytic processes driven by various annotators. OmniFind adds advanced search capabilities directly against the results of the text analysis.
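The difference between keyword search and searching against the results of text analysis can be sketched roughly as follows. This is an illustration only, not how OmniFind is implemented: the `CONCEPTS` lexicon and the `annotate`, `build_index` and `search` functions are hypothetical stand-ins for real annotators and a real search engine.

```python
import re
from collections import defaultdict

# Hypothetical lexicon mapping surface words to concepts.
CONCEPTS = {
    "refund": "complaint",
    "broken": "complaint",
    "faulty": "complaint",
    "thanks": "praise",
}


def annotate(text):
    """Stand-in for an annotator: return the set of concepts found in the text."""
    words = re.findall(r"\w+", text.lower())
    return {CONCEPTS[w] for w in words if w in CONCEPTS}


def build_index(docs):
    """Index document ids by the concepts the annotator produced, not by raw words."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in annotate(text):
            index[concept].add(doc_id)
    return index


def search(index, concept):
    """Return the sorted ids of documents annotated with the given concept."""
    return sorted(index.get(concept, set()))
```

With this sketch, a query for the concept `complaint` would match a document that says "The unit arrived broken" and another that says "I want a refund", even though the two share no keyword.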

IBM is keen to position OmniFind as a commercial platform for deploying UIMA-based text analytics applications inside an enterprise search tool, rather than push its own brand of enterprise search. “Sure, we’ll be competing in the search market with the likes of FAST and Autonomy. But we believe we have a first-to-market advantage with a ready-to-go UIMA-compliant product that allows for easy deployment of integrated text analytics, business intelligence and search across a much broader range of applications.”

Of course, the success of UIMA, as with any standard, depends on widespread adoption, something that IBM acknowledges is key. “Partners need to be motivated to make their products UIMA-compliant,” Andrews said, pointing out that UIMA is being warmly received by the ISV community at large.

Around 15 influential vendors have so far endorsed UIMA and plan to use it as their standard way of integrating text analytics into their applications. These ISVs are split into three main categories: content delivery (Factiva and QL2 Software), text analytics (ClearForest, SAS Institute, SPSS, iPhrase, Attensity, and nStein), and applications that leverage text analysis and search (Cognos, Kana, Siebel).

IBM also intends to make UIMA available to the open source community. “We’ll be submitting UIMA to SourceForge by the end of this year,” Andrews said.

IBM also plans to roll out a number of function-specific applications built using combinations of UIMA-compliant technologies. Applications that have already been developed, or are in the works, include early warning; customer support and self-service; market and competitive intelligence; and fraud and risk analysis. “These applications will be delivered as solutions via partners,” Andrews said.

Partners believe these initial applications are only the tip of the iceberg. “There are over 40 realistic applications so there’s lots of opportunity,” said Randy Clark, vice president of Waltham, Massachusetts-based ClearForest Corp., one of the first text analytics vendors to develop UIMA annotators.

Clark said that over time UIMA will replace the platform layer [ClearForest Tags] of the company’s text analytics stack.

Partners believe UIMA stands a good chance of success, especially since it has a powerful and influential vendor like IBM behind it. “From our standpoint, UIMA is an important milestone for what we call ‘unified BI’…where lines between business intelligence and search are becoming more blurred,” said Clark.

“To date, business intelligence has matured almost entirely around structured [numerical] data. UIMA opens up an opportunity to add unstructured data into the mix.”

He added: “Of course we would have partnered with other vendors without UIMA. But UIMA makes it that much easier to integrate.”