Apache Hadoop: Crash or collaborate?

CBR is attending the Hadoop Summit, where many of the announcements have centred around collaboration, particularly on the Open Data Platform (ODP).

While this can be heralded as good news for the adoption of Apache Hadoop, with three big names joining the initiative, it must be remembered that this is a business decision that Pivotal, IBM and Hortonworks wouldn’t have taken if it didn’t benefit them.

The benefit to the industry in reducing compatibility issues and stress for testers may just be a happy by-product, so it is important to look at the thinking behind the decision to collaborate.

A key benefit of collaboration in an open-source community is that your technology is likely to be used by a larger group. This comes as a result of interoperability, which should ensure that compatibility issues are greatly reduced or removed from the equation entirely.

Another benefit that is sung from the rooftops is the removal of vendor lock-in. That is, unless you want to use a solution that is not part of the ODP (Cloudera), in which case you had better have a superior product or you are likely to be left behind.

Time will tell whether collaborative efforts solve any of the issues that hinder Hadoop adoption, but for now it is the way of the market.

Andy Leaver, VP International Operations, Hortonworks, said that it is a question of whether you decide to collaborate or crash into each other: "Pretty much any company you go to will have made investments already…SAP, IBM, they’ll have components already."

"So the question is, do we collaborate and co-exist with them or crash into each other? We’ve made the decision to say, we are building Hadoop…and then everyone else can plug into that in some sort of way."

Perhaps, then, one of the reasons for going open source and joining a community like ODP is to reduce the advantage that a competitor can have.

"Look at the size of the portfolio of SAP, IBM, do I want to divert all this tech to go compete with an open source community of thousands of developers to develop my own distribution? No."

"Why would I do that? IBM was maybe a year out of date in terms of what we were doing with Hadoop and they said, why are we doing this? It’s not our core business, let’s standardise. And I am sure others will do that as well."

An issue that has affected Hadoop is the skills gap in the market, which has perhaps made implementing and maintaining the technology more of a challenge than it should be.

"One of the biggest barriers to adoption is that there isn’t a lot of skills in the market, so we want to make it as easy to consume as possible," said Leaver.

It is now a matter of making tools more graphical and simplifying the process of management, steps which have been slow in coming. Leaver believes this is a classic case that fits the typical adoption curve.

"The community are technical people and technical people love command line interfaces. As soon as it starts to get into a major adoption, the rest of us scratch our heads and ask, what is this?"

"It is just a natural evolution and I am sure that the user interface will get richer and richer and simpler and simpler…we can do more with less effectively."

This comes as Hadoop adopters mature beyond the early adoption phase: the demand is for a simpler user interface, and Hadoop is responding.

The future for data will involve technologies such as in-memory processing and streaming, giving the community many ways to consume and look at data.

What is certain is that the collaboration model is here to stay.

 
This article is from the CBROnline archive: some formatting and images may not be present.