Big, open hypocrites? Open source big data analytics brings out the Beauty and the Beast in tech

The virtues of open source have long been shouted from the rooftops by all those who have invested in it, both financially and spiritually, and that is exactly what the DataWorks Summit in Munich has been pushing.

There’s a belief that the open source way of working can play a key role in fundamentally changing the way that the technology industry operates, and it’s one that seems to be resonating with vendors and customers alike.

“We firmly believe that open is the path forward, learning from each other’s mistakes & best practices,” said Terri Virnig, VP, Power Systems Ecosystem and Strategy at IBM, at the Summit keynote.

Although plenty of openness is bleeding into IBM, the company still has a largely proprietary side, the cognitive computing machine Watson being one example.

Terri Virnig, VP Power Systems ecosystem and strategy, IBM.

On the one side there’s the likes of 100% open source Hortonworks discussing the virtues of the community and its way of working, which is totally understandable. On the other side there are the IBMs, the Dell EMCs and many more that play up their open source efforts while simultaneously keeping their ‘special secret sauce’ technology fully locked behind closed doors.

There’s a hypocrisy at open source events such as the DataWorks Summit that is unmissable, namely how large legacy tech vendors continue to have their cake and eat it too.

This is clearly a political and economic struggle that will rumble on, and this observation does not dismiss the large number of contributions that some of these large vendors do submit – that’s good, but so much more can be done.

One of the biggest areas where open source can have an impact is cyber security, and efforts are clearly underway to build better cyber security communities of the ‘good guys’ who can share data in order to better defend against cyber attacks.

Owen O’Malley, co-founder and technical fellow at Hortonworks, said: “The bad guys are going to look at binaries regardless of if they have source code available. What open source does is let the good guys look at the source code as well.”

In essence, many eyes make it easier to spot bugs and to fix them.

There are already large communities of hackers working together to attack businesses and infrastructure around the world. If the tech sector, governments and businesses do not have similar communities that share data and work together to counter the threat, there is little hope of mounting successful, long-term defences.

Shaun Connolly, Chief Strategy Officer at Hortonworks, asked whether collaboration or isolation is the faster path to innovation. The answer seems to be collaboration, given that the majority of innovation appears to be happening in this space.

For those that don’t know, DataWorks was formerly the Hadoop Summit and the name change is designed to reflect the changing face of Hadoop in light of disrupting factors such as the Internet of Things and Artificial Intelligence, to name just two.

Scott Gnau, CTO, Hortonworks.

Basically, the conference wants to represent the idea that it’s all about data and not just Hadoop. Scott Gnau, CTO at Hortonworks, said: “Data is our product…our product really is the data that we create, and more importantly, how do we move our business forward. Every business in some fashion is a data business.”

Gnau’s point is an accurate one that highlights the ever-growing importance of data within organisations. Used correctly, data can give a business a better single view of the customer, influence product recommendations, improve inventory and supply chain management, optimise pricing, help to maximise yield, and much more.

In Gnau’s mind, the best way to make the most of that data is through connected data platforms. He said: “We really need connected data platforms and not converged systems, it’s nearly impossible to get all data converged and have an impact in real-time.”

Clearly that mindset plays towards Hortonworks’ own strategy, but it is not the only way the data conundrum is being approached.

In the open source big data world there are numerous approaches, strategies, and technologies that can be used to tackle data analytics.

In the Hadoop ecosystem the Apache Software Foundation has become the core point where these are worked on and supported. As with anything, the ASF has its critics, but O’Malley believes that this model works.

O’Malley took time to look back to the first Hadoop Summit agenda and saw that technologies such as Hive, Pig, and Zookeeper were all present, along with a number of other projects being worked upon. The proof that ASF works, O’Malley said, is that “the ones that have been successful are the ones that have been inside Apache and not those that were outside.”

Dr Barry Devlin on the Beauty and the Beast impact of change.

The DataWorks event isn’t one that’s heavy on announcements, although Hortonworks announced the general availability of Hortonworks Data Platform version 2.6. In addition to it being available on IBM Power Systems, the latest version includes Spark 2.1 and the latest version of Zeppelin, while enhancements have also been made to Ranger and Atlas.

The latest release showcases the growing maturity of the product and matches the growing fervour with which the open source methodology is being pitched.

Dr Barry Devlin spoke during the keynotes about, among other things, being on the threshold of change. Technology and society feel as though they are at a threshold, and so does open source.

Mass adoption of the open source ethos seems to be within touching distance, but as Dr Devlin said: “Everything changes, everything goes wild or wonderful at that moment. Sort of like beauty and the beast, this exponential curve holds both the good and the bad, holds the positives and negatives.”

Open source companies and communities will need to prepare themselves for this change because while good things can come from it, so can bad.
This article is from the CBROnline archive: some formatting and images may not be present.