In a recent interview with ComputerWire, Luke Lonergan, chief technology officer at San Mateo-based Greenplum, detailed a laundry list of improvements the company is planning for PostgreSQL. These include greater upward scalability through range partitioning, more comprehensive support for SQL 99 queries (for example, analytic queries such as rank), bitmap indexing for quicker OLAP queries, materialized views, a faster data loader with complete error-handling capabilities, and better manageability features.

Mr Lonergan acknowledged that PostgreSQL is far from being a fully-fledged data warehousing and BI database platform right now and that other popular open source databases like MySQL simply run out of gas when deployed in business intelligence environments.

“There are a bunch of things that are missing from open source databases for data warehousing. We’ve looked at what’s missing [in PostgreSQL] and are now putting it on the agenda of the open source development community.”

What Greenplum is really aspiring to do is what Red Hat is doing with the Linux operating system: sponsor and foster an open source development community (in this case called Bizgres) to drive forward the development of PostgreSQL as a fully-fledged data warehousing platform. The goal of the Bizgres project is to work with the PostgreSQL community to build a complete database system for business intelligence exclusively from free software.

“The Bizgres community has been very welcoming since there has been no real open source play for business intelligence and data warehousing,” Mr Lonergan said. He pointed out that PostgreSQL has been largely deployed in transactional processing environments. The current version of PostgreSQL – 8.0, released last January – has a comfort zone of only up to half a terabyte and has lacked high-end features like table partitioning.

The likely result of Greenplum’s (and Bizgres’) efforts will be split versions of PostgreSQL: one for OLTP and the other for data warehousing. Greenplum was formed in 2003 through the merger of Metapa and Didera, two clustering database technology vendors. The company is now trying to build an open source data warehousing platform.

Greenplum is still very much in start-up mode and offers two freely downloadable open source databases in production right now – DeepGreen and DeepGreen PostgreSQL – which are the supported versions of Bizgres. Both are aimed at small to medium-sized companies with entry-level and departmental data marts in the 10 to 500 gigabyte range.

The company’s strategy is simple: seed the market with DeepGreen, skim off nominal service revenues, and then (hopefully) migrate customers to its fully paid and more scalable DeepGreen MPP (massively parallel processing) version, which targets environments in excess of half a terabyte (500 gigabytes). The MPP version is likely to be based on Metapa’s cluster database technology and is expected later this summer.

“We like to think of it as try before you buy,” said Mr Lonergan, explaining the company’s open source strategy as a way of getting PostgreSQL a foot in the door of companies. “We’re giving customers the software up front to put terabyte-class warehouses into production.”

“If customers like what they see and want to advance beyond a terabyte on multiple servers, they can graduate to our commercial MPP product, which we will charge for.”

Greenplum is not the only data warehousing vendor targeting open source. Appliance vendors like DATAllegro and Netezza both use open source databases as part of their solutions. Greenplum, however, differentiates itself by decoupling the software and hardware components.

“Other vendors offer a hardware-software combo as a turnkey black box. The problem is that it can be expensive and hard to upgrade,” Mr Lonergan said, adding that Moore’s law (concerning price-performance) and low-cost commodity hardware point in favor of Greenplum’s software-only approach. Prices are dropping on the hardware side, while the price of Netezza’s appliance is not.

Mr Lonergan said the company is initially focused on capturing a [data warehousing] audience that would otherwise be captured by Microsoft SQL Server and Oracle, which he estimates is a $24 billion market right now.

Mr Lonergan also said that given Greenplum’s economic price point, the sweet spot is anything between two and 60 to 70 terabytes. In truth, however, Greenplum is really looking at the lower end of this scale. Of course, how successful Greenplum will be in attacking the lower end of the high-end market depends on how big and motivated the existing community is for developing an open source data warehousing platform.