November 8, 2016, updated 27 Jul 2022 9:31am

Machine learning and data science workloads ignite Apache Spark adoption

The use of Apache Spark is dramatically increasing as new workloads create more use cases.

The open source cluster computing framework Apache Spark is now actively used by 54% of survey respondents, and most of those users (64%) plan to notably increase their usage within the next 12 months.

That’s according to a Cloudera study conducted by Taneja Group, which surveyed 7,000 people in technical and managerial roles who are directly involved in big data.

According to the study, 57% of respondents use Spark for their most important use cases when the technology is provided by Cloudera.

Those use cases go beyond the data processing, engineering and ETL workloads that are said to make up 55% of current Spark use. New workloads being seen on Spark include real-time stream processing, exploratory data science and machine learning.

Mike Matchett, senior analyst and consultant at Taneja Group, said: “We found that across the broad range of industries, company sizes, and big data maturity levels represented, over one-half of respondents are already actively using Apache Spark.

“It is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. With an increasing number of workloads requiring real-time data streaming for analytics, the emergence of machine learning applications and data science use cases, Spark is clearly here to stay.”

The technology is also frequently being aligned with the cloud: around 23% of Apache Spark deployments currently run in public or private cloud, a share that is expected to increase to 36% in the future.

Matchett said: “Interestingly, while on-premises Spark deployments dominate today there is a strong interest in transitioning many of those to cloud deployments going forward.”

Numerous companies in the Hadoop ecosystem, such as Cloudera, as well as companies such as IBM, have invested heavily in aligning analytics products around Apache Spark technology.

Cloudera, for example, was the first Hadoop vendor to ship and support Spark in 2014, and the company says that many of its users have moved data processing workloads from MapReduce to Spark in their production systems.
