
Hadoop may have become almost synonymous with the world of big data, but it is neither the only technology nor the easiest to adopt.
While its ability to store and process vast amounts of data has made it a popular technology, there are challenges that must be overcome before adopting technologies in the Hadoop ecosystem.
CBR identifies the problems that you are likely to face and how to overcome them.
1. Hadoop confusion
Hadoop has picked up something of a bad reputation for being extremely complex. While numerous companies, such as Hortonworks and Cloudera, are working on making their own distributions easy to use, complexity remains.
Selecting the right distribution can be a real challenge, especially as each of them embeds different Hadoop components, for example Cloudera’s Impala in CDH, and different configuration managers, such as Ambari.
Solving the challenge requires knowledge of the Hadoop ecosystem, the vendors and their different offerings. This can seem like an impossible task, but by reading articles that compare the different distributions, perhaps speaking to consultants, and running a proof of concept, a business can find the distribution that fits its needs.
2. Finding the use case
Before the business has even waded into the confusion of the Hadoop ecosystem, serious questions should be asked about why it would use the technology in the first place.
Should the business be devoting time and resources to Hadoop technologies if the problems it is trying to solve can be solved some other way entirely? Probably not.
This problem has been highlighted by the analyst firm Gartner, which in a 2015 study found that almost half of the 284 Gartner Research Circle members were finding it difficult to adopt the technology because of uncertainty around how it would provide business value.
To solve the problem, businesses need to understand how much data they have, focus on the business problems they face and consider whether Hadoop is the right technology. Having a strategy in place is vital, and if the business is unsure about use cases, most vendors now showcase many examples on their sites.
3. Skills gap
The skills gap isn’t unique to Hadoop. It is a problem across the technology sector, but it has been magnified in the world of Hadoop.
The reason, as outlined above, is that Hadoop is a complex technology. In the same research where Gartner highlighted the problem of finding a use case, the skills gap came out as the biggest hurdle to adoption.
Learning the skills for big data technology is a challenge that the sector still needs to overcome. Although Hadoop can be run on a single machine for experimentation, developers cannot simply download it and gain experience of a realistic deployment; a production-like cluster requires several servers in the first place.
Overcoming this challenge isn’t easy and there is no quick fix. As mentioned earlier, vendors are working on solving this problem, but it takes time.
The vendors are busy running training programmes, but there are things that businesses can do to help.
Firstly, it is possible to train staff internally so that they get used to the technology that they will be using. Secondly, find the right software. This is where knowing the use case and the different vendor distributions can help.
4. Integration and management
This is an area that should have been covered when figuring out the business strategy for the technology. In particular, it should already be clear who will maintain Hadoop and whether it is going to replace existing systems, such as the database or existing analytics tools.
Hadoop is typically used in conjunction with other existing technologies in the business, but it is necessary to figure out what it will work alongside and what it won’t. Addressing this problem early on will save a lot of pain and suffering further down the road.
As with the other problems, vendors have been working on it. Most Hadoop vendors now offer tools and instructions for integration, and so do vendors in adjacent areas, for example in the database space. Database vendors have grown accustomed to working with Hadoop, and some offer native integration, which makes the task easier still.
5. Data access
Finally, the work is almost done and the barriers to Hadoop have almost all been overcome, but one more remains: transforming data into meaningful management information.
Hadoop is good at storing and processing data. At its most basic it is a batch-processing tool, but it doesn’t necessarily offer a lot to the end user in terms of analytics.
Bringing in data from multiple data sources is relatively easy, but Hadoop is not designed to be particularly interactive, meaning that there is both a skills gap issue and an issue of delivering value to the business.
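To make the batch-processing point concrete, the sketch below imitates the map, shuffle and reduce steps of a word count in plain Python. It is an illustration only and does not use Hadoop or any of its APIs; the function names and the sample input are hypothetical. What it shows is that the whole input is processed before any answer comes back, which is why the model suits bulk processing better than interactive queries.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_batch_job(records):
    """Run the job over the full input; results are only available at the end."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample_lines = ["big data big value", "data access"]  # hypothetical input
    print(run_batch_job(sample_lines))
```

A business user asking a new question of the data would need a new job written and run from scratch, which is the gap that analytics tools layered on top of Hadoop aim to close.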
Big moves have been made in this area, with more and more vendors providing support for the technology in their own offerings. IBM, for example, plans to develop most of its analytics tools around Apache Spark, while SAP has also added the technology to its S/4HANA platform.
To some extent this is a problem that is out of the hands of the business. It is up to the vendors to make the technology accessible to every level of the business and to prove its value.
Most vendors have now got to the point where line-of-business users can use the technology as easily as they use traditional BI solutions.