March 4, 2016 (updated 05 Sep 2016 7:43am)

LinkedIn open sources WhereHows metadata management tool

News: LinkedIn uses WhereHows to keep track of how its data changes over time.

By CBR Staff Writer

LinkedIn has announced that it will be open sourcing its internal tool named WhereHows.

WhereHows was developed to track changes in data in order to reduce data redundancy. It works by creating a data repository and portal for the processes, people and knowledge around data.

According to LinkedIn, WhereHows has so far captured the status of 50,000 datasets, 14,000 comments, 35 million job executions and related lineage information. Combined, the data exceeds 15 petabytes.

As LinkedIn’s customer base grew over time, the company started facing issues with overall data flow and lineage across different processing frameworks, data platforms and scheduling systems.

This resulted in lost productivity, difficulty in deriving insights from data, data breakages and data redundancy.

LinkedIn built WhereHows to address these problems.

With WhereHows, LinkedIn could capture metadata across diverse systems and surface it through a single platform, simplifying data and flow discovery.

Once the metadata is captured, WhereHows surfaces it through two interfaces.

The first is a web application that enables navigation, search, lineage visualization, annotation, discussion and community participation.

The second is an API (application programming interface) endpoint through which data processes and applications can be automated.

With this tool, it has become easier for LinkedIn to solve problems of data and process lineage, data and process ownership, schema discovery and evolution history.

This was achieved by integrating data of different types into a universal model. With a universal model, it became easier to leverage the value of the metadata.
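
To illustrate the idea of a universal model, here is a minimal sketch in Python. It is not WhereHows code: the record type, the two converter functions and the Hive-style and Kafka-style input shapes are all assumptions made for this example. It only shows how metadata arriving in different shapes might be normalized into one common record so that a single query can run across every source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical unified record; not WhereHows' actual schema."""
    name: str
    source: str
    owner: str
    fields: tuple


def from_hive(raw: dict) -> DatasetRecord:
    # Assumed Hive-style shape: database, table, owner, list of column dicts.
    return DatasetRecord(
        name=f"{raw['db']}.{raw['table']}",
        source="hive",
        owner=raw["owner"],
        fields=tuple(col["name"] for col in raw["columns"]),
    )


def from_kafka(raw: dict) -> DatasetRecord:
    # Assumed Kafka-style shape: topic, owning team, flat list of field names.
    return DatasetRecord(
        name=raw["topic"],
        source="kafka",
        owner=raw["team"],
        fields=tuple(raw["schema_fields"]),
    )


def datasets_owned_by(records, owner):
    """Return the sorted names of all datasets owned by `owner`, whatever the source."""
    return sorted(r.name for r in records if r.owner == owner)


# Two made-up metadata entries from different systems, normalized into one catalog.
catalog = [
    from_hive({
        "db": "tracking",
        "table": "page_views",
        "owner": "alice",
        "columns": [{"name": "member_id"}, {"name": "ts"}],
    }),
    from_kafka({
        "topic": "PageViewEvent",
        "team": "alice",
        "schema_fields": ["member_id", "ts"],
    }),
]

print(datasets_owned_by(catalog, "alice"))
```

Because both entries end up in the same record type, an ownership lookup does not need to know which system a dataset came from.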

Now, LinkedIn wants to share this work with the broader data community and has open sourced the tool.

The WhereHows development kit has been placed on GitHub, and LinkedIn has created a discussion group to share its knowledge and experience.

The group will also help with adding new features and with finding and fixing bugs.

Apart from this, LinkedIn said it is also committed to transforming its internal integrations into generic open source templates or plugins, as far as possible.
