The open source cluster computing framework Apache Spark is now actively used by 54% of respondents to a recent survey, and 64% of those users plan to notably increase their usage within the next 12 months.
That is according to a Cloudera study, conducted by Taneja Group, of 7,000 people in technical and managerial roles who are directly involved in big data.
According to the study, 57% of respondents use the technology for their most important use cases when it is provided by Cloudera.
Those use cases are not limited to the data processing, engineering and ETL workloads that are said to make up 55% of current Spark use. Newer workloads being seen on Spark include real-time stream processing, exploratory data science and machine learning.
Mike Matchett, senior analyst and consultant at Taneja Group, said: “We found that across the broad range of industries, company sizes, and big data maturity levels represented, over one-half of respondents are already actively using Apache Spark.
“It is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. With an increasing number of workloads requiring real-time data streaming for analytics, the emergence of machine learning applications and data science use cases, Spark is clearly here to stay.”
The technology is also frequently being aligned with the cloud: around 23% of Apache Spark deployments currently run in public or private cloud, a share that is expected to rise to 36% in the future.
Matchett said: “Interestingly, while on-premises Spark deployments dominate today there is a strong interest in transitioning many of those to cloud deployments going forward.”
Numerous companies in the Hadoop ecosystem, such as Cloudera, as well as companies such as IBM, have invested heavily in aligning analytics products around Apache Spark technology.
Cloudera, for example, was the first Hadoop vendor to ship and support Spark in 2014, and the company says it has seen many of its users move data processing workloads from MapReduce to Spark in their production systems.