Open-source software company DataStax is developing a package that will combine its Cassandra non-relational database with the Apache Hadoop data processing framework.
Web 2.0 companies typically use Cassandra to store and retrieve simple data sets quickly, while Hadoop is used to analyze enormous data sets spread across many servers.
The package is meant to reconcile two goals usually considered at odds: fast data access and in-depth analysis of that data. Running analytics against a live database is generally believed to slow it down.
DataStax CEO and co-founder Matt Pfeil said the new distribution, to be called Brisk, will merge low-latency data storage and retrieval with the ability to perform in-depth analysis of that data.
For the new package, DataStax is taking advantage of Cassandra's ability to distribute data across multiple nodes, which allows the data to be replicated. One copy stays on the servers handling transactions, while another copy is used for analytics.
"The two parts of your data don’t interfere with each other," Pfeil said.
DataStax plans to release the new package under the Apache open-source license within the next two months.