Apache Spark tutorial

The open source big data processing framework Apache Spark has become one of the most talked-about technologies of recent years.

The popularity of the framework, which is designed around speed and ease of use, has seen the likes of IBM, Microsoft, and others align their own analytics portfolios around the technology.

Spark extends the MapReduce model popularised by Hadoop to support more types of computation, including interactive queries and stream processing.
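To make the MapReduce model mentioned above concrete, here is a minimal sketch of its three steps (map, shuffle, reduce) as a word count in plain Python. It does not use Spark itself, and the function names are illustrative rather than part of any library. In Spark the same pattern is typically written with RDD operations such as flatMap, map and reduceByKey.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group the emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases into a single word-count job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["spark extends mapreduce", "spark runs on yarn"]
    print(word_count(sample))
```

A framework's contribution is to run these phases in parallel across a cluster; the sketch above only shows the logical flow of data on one machine.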

Spark can be deployed in three different ways: as a standalone deployment, on Hadoop YARN, and as Spark in MapReduce.
In a standalone deployment, Spark sits on top of the Hadoop Distributed File System (HDFS), with space allocated explicitly for HDFS. In this model Spark and HDFS run side by side to cover all Spark jobs on a cluster.

Running on YARN means that Spark runs without any pre-installation or root access, while Spark in MapReduce lets a user start Spark and use its shell without any admin access.
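From the point of view of someone submitting a job, the standalone and YARN options differ mainly in the master URL passed to spark-submit. The sketch below builds that command line for each case. The helper name, the host master.example.com and the file app.py are hypothetical. Port 7077 is the usual default for a standalone master, but a given cluster may use another. Spark in MapReduce is not covered here.

```python
def spark_submit_command(app_path, mode,
                         master_host="master.example.com",  # hypothetical host
                         master_port=7077):                 # common standalone default
    """Return the argument list for a spark-submit call in the given deploy mode."""
    if mode == "standalone":
        # A standalone cluster is addressed directly via a spark:// URL.
        master = f"spark://{master_host}:{master_port}"
    elif mode == "yarn":
        # On YARN the cluster location is read from the Hadoop configuration.
        master = "yarn"
    else:
        raise ValueError(f"unsupported deploy mode: {mode!r}")
    return ["spark-submit", "--master", master, app_path]


if __name__ == "__main__":
    for mode in ("standalone", "yarn"):
        print(" ".join(spark_submit_command("app.py", mode)))
```

Returning a list rather than a single string means the result can be passed straight to subprocess.run without shell quoting concerns.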


This article is from the CBROnline archive: some formatting and images may not be present.