Sign up for our newsletter
What Is

What is Hadoop?

Hadoop is a free framework that’s designed to support the processing of large data sets.

The Java-based programming framework is designed to support the processing of large data sets in a distributed computing environment that is typically built from commodity hardware.

The rise of big data into an essential component of business strategy to modernise and better serve customers has in part been boosted by the appearance of Hadoop.

Hadoop is part of the Apache Software Foundation, which supports the development of open-source software projects.

White papers from our partners

At its core, Hadoop consists of a storage part called the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

Basically, Hadoop works by splitting large files into blocks which are then distributed across nodes in a cluster to be processed.

The base framework is made up of Hadoop Common, which contains libraries and utilities for other Hadoop modules; HDFS, a distributed file system that stores data on commodity machines; YARN, which works as a resource management platform; and MapReduce, which is for large scale data processing.

The MapReduce and HDFS components of Hadoop were originally inspired by Google papers on their MapReduce and Google File System, the paper was published in 2003.

Why is it called Hadoop?

This article is from the CBROnline archive: some formatting and images may not be present.