March 20, 2017

What is Hadoop?

It may be synonymous with big data but that doesn't remove the complexity.

By James Nunns

Hadoop is a free, open-source framework designed to support the processing of large data sets.

The Java-based programming framework handles these data sets in a distributed computing environment that is typically built from commodity hardware.

The rise of big data as an essential component of business strategies to modernise operations and better serve customers has been boosted in part by the emergence of Hadoop.

Hadoop is part of the Apache Software Foundation, which supports the development of open-source software projects.

At its core, Hadoop consists of a storage part called the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

Basically, Hadoop works by splitting large files into blocks which are then distributed across nodes in a cluster to be processed.
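The splitting and distribution described above can be sketched in a few lines of plain Python. This is an illustration of the idea, not Hadoop's actual code: the block size, node names and round-robin placement are invented for the example (HDFS's real default block size is 128 MB, with a default replication factor of 3).

```python
BLOCK_SIZE = 8        # bytes, tiny for demonstration; HDFS defaults to 128 MB
REPLICATION = 3       # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster nodes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut raw bytes into fixed-size blocks, as HDFS does with large files."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Assign each block to `replication` nodes (here, simple round-robin;
    the real NameNode uses rack-aware placement)."""
    placement = {}
    for idx, _block in enumerate(blocks):
        placement[idx] = [nodes[(idx + r) % len(nodes)] for r in range(replication)]
    return placement

data = b"a large file that will not fit on one machine"
blocks = split_into_blocks(data)
placement = place_blocks(blocks)
print(len(blocks), placement[0])
```

Because each block lives on several nodes, the loss of a single machine does not lose data, and processing can run on whichever node already holds a local copy.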

The base framework is made up of Hadoop Common, which contains libraries and utilities for other Hadoop modules; HDFS, a distributed file system that stores data on commodity machines; YARN, which works as a resource management platform; and MapReduce, which is for large scale data processing.
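The MapReduce model itself can be illustrated with the classic word-count example, here sketched in plain Python rather than the Hadoop Java API: a map phase emits (word, 1) pairs, a shuffle groups the pairs by key, and a reduce phase sums each group.

```python
from collections import defaultdict

def map_phase(line: str):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does
    between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: combine each key's values into one result (here, a sum)."""
    return {key: sum(values) for key, values in grouped.items()}

lines = ["big data big clusters", "big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 3, 'data': 2, 'clusters': 1}
```

In a real cluster the map and reduce functions run in parallel on the nodes holding each block, which is what lets Hadoop scale the same simple model to very large inputs.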


The MapReduce and HDFS components of Hadoop were originally inspired by Google papers: the Google File System paper, published in 2003, and the MapReduce paper that followed in 2004.

Why is it called Hadoop?

Doug Cutting, the creator of Hadoop, named it after his son’s toy elephant. This is why many of the logos of Hadoop vendors, and of Hadoop itself, are elephants.

Java is the most common language used with the Hadoop framework, although there is some native code in C, and command line utilities are written as shell scripts.

Hadoop can be used for low-cost storage and active data archiving, as a staging area for a data warehouse and analytics store, as a data lake, a sandbox for discovery and analysis, and for recommendation systems.

Companies such as Facebook, LinkedIn, Netflix and eBay are all users of Hadoop.

The Hadoop ecosystem is extremely large and is made up of companies such as Hortonworks, Cloudera, MapR, Teradata, and many more.

Some of the key software components include: Cassandra, a distributed database system; Spark, a cluster computing framework with in-memory analytics; and Oozie, a Hadoop job scheduler.
