February 17, 2014 (updated 22 Sep 2016 2:23pm)

5 big companies managing big data on Hadoop

Use cases to inspire you to better your business by growing your data.

By Claire Vanner

As one of the world’s most popular free Java-based programming frameworks, Apache Hadoop is being used by an increasing number of companies that can no longer manage their data using traditional methods.

The open-source platform can deal with a wide variety of data, whether it is structured or unstructured, in its native format. This allows companies to put Big Data at the heart of their strategies to manage and grow their business.

Here are some cases of large enterprises that have used Hadoop to optimise their online capabilities:


Amazon (A9)

The A9 subsidiary of Amazon builds the website’s product search indices using Hadoop’s streaming API, and processes millions of sessions daily for analytics.

A9 runs the world’s largest e-commerce product search operation, and its search operations team manages big data using C++, Perl and Python tools.
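Hadoop's streaming API lets a job use any program that reads lines from standard input and writes tab-separated key/value lines to standard output, which is why existing Perl or Python tools can be reused. The sketch below is a minimal illustration of that pattern in Python, not A9's actual code: the function names `map_lines` and `reduce_lines` and the log format (session ID, a tab, then a search term) are assumptions made for the example.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit "term<TAB>1" for every well-formed log line.

    Assumes a hypothetical log format: session_id<TAB>search_term.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip malformed records
        yield f"{parts[1]}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each key.

    Input must already be sorted by key, which is what Hadoop's
    shuffle phase provides between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for term, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{term}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort -> reduce on a few made-up records.
sample_log = ["s1\tkindle\n", "s2\tbooks\n", "s3\tkindle\n", "bad line\n"]
mapped = sorted(map_lines(sample_log))
for result in reduce_lines(mapped):
    print(result)
```

In a real streaming job each function would sit in its own script, reading from `sys.stdin` and printing to standard output, with Hadoop handling the sort between the two.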

The A9 product search engine operates on clusters varying from 1 to 100 nodes to analyse data, observe traffic patterns and index every product in Amazon’s catalogue.




Facebook

The world’s most popular social network handles big data in a big way: more than 1 billion pieces of content (web links, news stories, blog posts, photos, etc.) are shared each week.

Hadoop is used to store copies of internal log and dimension data sources, which then serve as a source for reporting, analytics and machine learning. The data runs across two major clusters: a 1,100-machine cluster and a 300-machine cluster.

On top of this sits a Hive data warehouse, which structures and stores the data and is queried using a SQL-like language called HiveQL.



Hulu

Thanks to its comprehensive video streaming service, which draws over 30 million unique viewers a month, Hulu’s metrics team receives a billion lines of event data daily, known to Hulu as beacons.

On a 13-machine cluster used for HBase hosting, Hulu runs beaconspec, a self-developed declarative language for processing its log data. It uses beacons and baseline assertions about user behaviour to aggregate business intelligence.



Spotify

Spotify is a data-driven company, and Hadoop is an integral part of running it. It is used for content generation, data aggregation, reporting and analysis.

Spotify wrote its own scheduler, Luigi, to optimise its chains of jobs. It runs over 7,500 Hadoop jobs a day on a 690-node cluster.
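To illustrate what scheduling a chain of jobs involves, here is a toy sketch in plain Python. It is not Luigi's API and not Spotify's code: the `Task` class, the `build` function and the job names are all hypothetical. It shows only the core idea that a job runs after the jobs it depends on, and that a job shared by several chains runs once.

```python
class Task:
    """A toy job with a name and a list of upstream jobs it depends on."""

    def __init__(self, name, requires=()):
        self.name = name
        self.requires = list(requires)
        self.done = False


def build(task, log):
    """Run a task after its dependencies, skipping work already done."""
    if task.done:
        return
    for dependency in task.requires:
        build(dependency, log)
    log.append(task.name)  # stand-in for submitting a real Hadoop job
    task.done = True


# Hypothetical job chain: two features share the same aggregation step.
clean = Task("clean_logs")
aggregate = Task("aggregate_plays", [clean])
top_list = Task("top_list", [aggregate])
radio = Task("radio_recommendations", [aggregate])

log = []
build(top_list, log)
build(radio, log)
print(log)
```

Because finished tasks are marked as done, the shared `aggregate_plays` step is submitted only once even though both downstream jobs require it.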

It also uses the Yahoo!-developed Pig platform for small scripts, which is faster than streaming.

Hadoop has helped Spotify to leverage data to better its service, creating the radio recommendations and top list features.



Yahoo!

Yahoo! is a big user of Hadoop and has even developed its own programmes to help its services function more efficiently. Its self-developed ZooKeeper coordinates distributed systems, and its parallel programming language Pig runs over 60% of its Hadoop jobs.

Its biggest cluster, a gargantuan 4,500 nodes, supports research for Ad Systems and Web Search.

According to Yahoo!’s Hadoop blog, the company has set a new GraySort record (the sort rate achieved while sorting at least 100 terabytes of data). Yahoo!’s team nearly doubled the previous record of 0.725TB per minute, reaching a sort rate of 1.42TB per minute.
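As a quick check on the "nearly doubled" claim, the two quoted rates can be compared directly. The short calculation below uses only the figures above; the dataset size of exactly 100TB is an assumption (the benchmark's stated minimum), so the timings are indicative rather than the recorded results.

```python
# Sort rates quoted in the article, in terabytes per minute.
previous_rate = 0.725
new_rate = 1.42

# Assumption: a dataset of exactly 100 TB, the benchmark's minimum size.
dataset_tb = 100

speedup = new_rate / previous_rate
previous_minutes = dataset_tb / previous_rate
new_minutes = dataset_tb / new_rate

print(f"Speed-up: {speedup:.2f}x")
print(f"100 TB at the previous rate: {previous_minutes:.0f} minutes")
print(f"100 TB at the new rate: {new_minutes:.0f} minutes")
```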
