November 8, 2016 (updated 27 Jul 2022 9:31am)

Machine learning and data science workloads ignite Apache Spark adoption

The use of Apache Spark is dramatically increasing as new workloads create more use cases.

By James Nunns

The open source cluster computing framework Apache Spark is now in active use by 54% of respondents to a recent survey, and the majority of those users (64%) plan to notably increase their usage within the next 12 months.

That’s according to a Cloudera study, conducted by Taneja Group, of 7,000 people in technical and managerial roles who are directly involved in big data.

According to the study, 57% of respondents whose Spark technology is provided by Cloudera are using it for their most important use cases.

Those use cases are not limited to the data processing, engineering and ETL workloads that are said to make up 55% of current Spark use. The new workloads being seen on Spark include real-time stream processing, exploratory data science and machine learning.

Mike Matchett, senior analyst and consultant at Taneja Group, said: “We found that across the broad range of industries, company sizes, and big data maturity levels represented, over one-half of respondents are already actively using Apache Spark.

“It is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. With an increasing number of workloads requiring real-time data streaming for analytics, the emergence of machine learning applications and data science use cases, Spark is clearly here to stay.”



The technology is also frequently being aligned with the cloud: around 23% of Apache Spark deployments currently run in a public or private cloud, and that share is expected to increase to 36% in the future.

Matchett said: “Interestingly, while on-premises Spark deployments dominate today there is a strong interest in transitioning many of those to cloud deployments going forward.”

Numerous companies in the Hadoop ecosystem, such as Cloudera, as well as companies such as IBM, have invested heavily in aligning analytics products around Apache Spark.


Cloudera, for example, was the first Hadoop vendor to ship and support Spark in 2014, and the company says it has seen many of its users move data processing workloads from MapReduce to Spark in their production systems.
