Apache Spark is an open-source parallel processing framework designed to run data analytics across clustered computers. The project, a general engine for large-scale data processing, is maintained by the Apache Software Foundation.
Spark provides programmers with an application programming interface (API) centred on the resilient distributed dataset (RDD), a fault-tolerant multiset of data items distributed over a cluster of machines.
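The RDD API is built around transformations (such as map and filter), which are evaluated lazily, and actions (such as reduce and collect), which trigger computation. The sketch below shows the shape of that API; the PySpark calls appear in comments, while the executable lines mirror them with plain Python built-ins so the example runs without a Spark cluster (an assumption made purely for illustration).

```python
from functools import reduce

data = [1, 2, 3, 4, 5]

# With a live SparkContext `sc`, the same pipeline would read:
#   rdd = sc.parallelize(data)                  # distribute the list across the cluster
#   squares = rdd.map(lambda x: x * x)          # transformation: recorded lazily
#   total = squares.reduce(lambda a, b: a + b)  # action: triggers the distributed job

# Local stand-ins with identical semantics on a single machine:
squares = list(map(lambda x: x * x, data))
total = reduce(lambda a, b: a + b, squares)
print(total)  # 55
```

Because transformations are lazy, Spark can fuse a chain of them into a single pass over the data before the action runs.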
Spark can run on Hadoop, on Mesos, standalone, or in the cloud, and can access numerous data sources, including HDFS, Cassandra, HBase, and S3.