The annual Hadoop Summit kicks off in Dublin this week with the big data market in as strong a position as it has ever been.
Hosted by Hortonworks and Yahoo, the event will feature speakers from the majority of the big players in the Hadoop ecosystem as well as a number of large companies that will talk about their deployments.
According to researchers from Wikibon, spending on big data is expected to grow from $18.2bn in 2014 to $92.1bn by 2026, a compound annual growth rate of 14.5%.
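As a quick sanity check on those figures, a compound annual growth rate (CAGR) is calculated as (end value / start value)^(1 / number of years) − 1. The short Python sketch below applies that formula to the Wikibon numbers over the 12 years from 2014 to 2026. The function name is our own, chosen for illustration, and the calculation assumes smooth year-on-year compounding.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Wikibon market figures quoted above, in billions of US dollars
rate = cagr(18.2, 92.1, 2026 - 2014)
print(f"{rate:.1%}")  # prints 14.5%
```

Real market forecasts rarely grow by the same percentage every year, so a CAGR only summarises the overall trajectory between the two end points.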
In a diverse and complex market there is much to play for. Fast-growing Hadoop vendors such as Hortonworks, Cloudera, MapR and Teradata all want to be at the top of the pile, and they will likely use the event to show why they should be the vendor of choice for businesses.
Capitalising on hot-topic technologies such as Spark, an open-source processing engine built for speed, could help these vendors tap into the streaming analytics market, which is expected to account for 16% of all big data spending by 2022, around $11.5bn.
Expect announcements around Spark, which may eventually become the standard processing engine for Hadoop, during the Summit, particularly in the area of the Internet of Things (IoT), a use case to which it is especially well suited.
In 2015 the Hadoop Summit focused on collaboration, and a number of the announcements made at the event centred on collaboration and on the Open Data Platform, now known as the ODPi.
Attendees and Hadoop watchers alike should probably expect further developments around the ODPi. The initiative recently published its first runtime specification as part of its effort to standardise Hadoop applications. Further specifications may appear during the Summit, along with new members, though don’t expect Cloudera to be among them.
One of the most frequently raised concerns in the world of Hadoop is the skills gap. Although this problem also affects the broader tech community, it has hit Hadoop, which can be difficult to understand, particularly hard.
With this in mind, onlookers should expect announcements focused on ease of use rather than ground-breaking new capabilities. Look out for tweaks to existing tools and services that make them easier to use and to deploy.
During the Summit there will be numerous opportunities for developers to gain some training. Under the banner of ‘tracks’ there will be a host of industry experts, business leaders, architects, data scientists and Hadoop developers on hand to share use cases, best practices, cautionary tales and various insights into the technology.
Tracks at the Summit will include ‘Hadoop application development: dev languages, scripting, SQL and NoSQL’. This track will give developers a chance to hear from the Hadoop community and learn how its members are building applications.
IoT, as mentioned earlier, will likely be one of the big topics at the event, and there is a track dedicated to it called ‘Hadoop and the Internet of Things’. The track will look at areas of IoT such as managing devices at the ‘jagged edge’, strategies and practices for data ingestion and analysis, and best practices for deriving real-time, actionable insights.
Customer stories are a given at events like these, so expect to see a few. Businesses such as TrueCar, Big Fish Games, LinkedIn, Spotify and Capital One will all have speakers at the event, as will Google.
Given that this event will be celebrating 10 years of Hadoop software, it is likely that at least one major customer win will be announced. Anyone on the lookout for case studies of how to use Hadoop in their business would do well to pay attention to this event.
Several of the big companies in attendance, such as MapR, Hortonworks, HPE and Teradata, have recently announced advances in their Hadoop technology, and while more are likely, I wouldn’t expect major developments.
Cloudera has already been quite active with releases of tools and services, and a more concerted cloud push has been talked about in recent months, which leads me to expect more discussion of this during the conference.
While a lot of attention will be on the tools, services and customer wins that the vendors are showcasing, a lot will also be paid to strategic alliances and the health of the Hadoop ecosystem.
The ODPi has already been covered, but deeper partnerships between the vendors associated with it would not be surprising. HPE and Hortonworks are already collaborating around Apache Spark, and the fruits of this labour could well be showcased at the event.
On the health front there is plenty to talk about. The big data market is growing but it’s an extremely competitive place and not all of the companies involved are in the healthiest of positions.
Operating with debt is commonplace among Hadoop companies, and a number of them, such as Hortonworks, have been hit by declining share prices.
There may be nothing overly concerning about this, but investors still need reassurance that both the companies and the ecosystem are in a good place.
Cloudera, a major rival to Hortonworks, is another company that has been hit by market turmoil. Fidelity Investments slashed its estimate of the company’s value by 38% at the end of March, amid concerns around a long-talked-about initial public offering. Questions about some of these issues will be raised and will eventually need to be answered.
It’s easy to forget that the Hadoop ecosystem isn’t just filled with start-ups; it is also populated by industry giants such as IBM and Microsoft.
IBM has placed a huge emphasis on Apache Spark across its analytics portfolio, and according to Wikibon’s big data market research it already holds the largest market share of any single vendor.
The Hadoop Summit will be another opportunity for the company to raise its profile in the space and reveal developments around Spark and its analytics portfolio.
With so much talk about Spark over the past year, it is likely to remain a major focus. However, given how broad the Hadoop ecosystem is, the Summit will have to be careful not to let it monopolise the conversation.
This article is from the CBROnline archive: some formatting and images may not be present.