LinkedIn has announced that it is open sourcing WhereHows, an internal tool for data discovery and lineage.

WhereHows was developed to track changes in data and reduce data redundancy. It works by creating a central repository and portal for the processes, people and knowledge around data.

According to LinkedIn, WhereHows has so far captured the status of 50,000 datasets, 14,000 comments and 35 million job executions, along with related lineage information. Combined, this data amounts to more than 15 petabytes.

As LinkedIn's customer base grew over time, the company started facing issues with overall data flow and lineage across different processing frameworks, data platforms and scheduling systems.

This resulted in lost productivity, difficulty deriving insights from data, data breakages and data redundancy.

LinkedIn built WhereHows to address these problems.

WhereHows lets LinkedIn capture metadata from diverse systems and surface it through a single platform, simplifying the discovery of data and data flows.

WhereHows surfaces this metadata through two interfaces.

The first is a web application that enables navigation, search, lineage visualization, annotation, discussion and community participation.

The second is an API (application programming interface) endpoint that other data processes and applications can use for automation.

With this tool, LinkedIn has found it easier to address data and process lineage, data and process ownership, and schema discovery and evolution history.

This was achieved by integrating different types of metadata into a universal model, which made it easier to extract value from that metadata.

LinkedIn now wants to share this work with the broader data community and has open sourced the tool.

The WhereHows development kit is available on GitHub, and LinkedIn has set up a discussion group for sharing knowledge and experience.

The group is also intended to help add new features and to find and fix bugs.

LinkedIn also said it is committed to turning its internal integrations into generic open source templates or plugins wherever possible.