Net neutrality. Data mining. Actionable Analytics. Machine Learning. Artificial Intelligence (AI). These are just some of the buzzwords you’ll find discussed at tech events, featured on tech websites, and bandied about at meetings with IT teams today. Like all buzzwords, they encompass a whole range of ideas and technologies, features and glitches, hype and hope. Around all of them there is a lot of accurate information, as well as a lot of misconceptions.
Another buzzword that gets a lot of play – but isn’t well understood – is metadata. Everyone knows it’s important, everyone knows we need it. But how many know what it is, or why it matters? As it turns out, metadata is important enough that understanding its role in your organization – and getting control of it – can improve efficiency, save you money, and give you a bigger picture of your organization’s needs and operations.
In addition, the EU’s General Data Protection Regulation (GDPR) has made understanding metadata a practical requirement. Metadata is key to ensuring that organizations can follow GDPR’s privacy rules and fulfill their obligations regarding individuals’ “right to be forgotten.”
Just what is metadata? Metadata is the definition of the data – the label that is used to classify data in storage areas and databases. Metadata makes it possible to search for data by category, type, relationships, etc. Without metadata, the information in databases would be close to useless, because a manual search would be extremely time consuming, if not impossible.
As such, keeping metadata accurate and on point is essential for any organization. But in practice, metadata is often given short shrift. Over the years, an organization will build up large caches of data and implement new databases, new storage technologies, etc. Unless the organization is careful about its metadata policies, trouble could ensue. For example, an organization might record information about a customer’s location under a range of different labels, e.g. “location,” “address,” “city and state,” etc.
Whatever search system is implemented needs to take these inconsistencies into account. As it turns out, this is a chronic – and central – problem for many organizations, and one that by itself could seriously hamper their ability to even find data.
If an organization can’t get the names it uses for the same data straight, how can it hope to control that data? Business Intelligence teams – who are usually in charge of tracking down data – will be unable to use basic search scripts to find it. If the categories they are searching in are not uniform – if metadata labels include “birthday,” “birth date,” “date of birth,” “YY/MM/DD,” or a dozen other variants – how will they be able to search using scripting?
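To make the problem concrete, here is a minimal sketch of the kind of script a BI team might write to cope with label variants. It is an illustration under assumptions, not a recommended tool: the alias table, the function names, and the sample schemas are all hypothetical, and the “dob” alias is an added example that does not appear above.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical alias table: canonical field name -> label variants seen in practice.
# The variants come from the examples in this article, plus "dob" as an assumed extra.
CANONICAL_LABELS = {
    "date_of_birth": {"birthday", "birth date", "date of birth", "dob"},
    "location": {"location", "address", "city and state"},
}


def _normalize(label: str) -> str:
    """Lowercase the label and collapse punctuation, underscores and spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Reverse lookup: normalized variant -> canonical name.
_LOOKUP = {
    _normalize(variant): canonical
    for canonical, variants in CANONICAL_LABELS.items()
    for variant in variants
}


def canonical_label(label: str) -> Optional[str]:
    """Return the canonical name for a label, or None if the label is unknown."""
    return _LOOKUP.get(_normalize(label))


def find_fields(schemas: Dict[str, List[str]], canonical: str) -> List[Tuple[str, str]]:
    """Return (table, column) pairs whose label maps to the canonical name."""
    return [
        (table, column)
        for table, columns in schemas.items()
        for column in columns
        if canonical_label(column) == canonical
    ]
```

Given hypothetical schemas such as `{"crm": ["Birthday", "email"], "billing": ["Date-of-Birth"]}`, `find_fields(schemas, "date_of_birth")` picks up both birth-date columns. A label like “YY/MM/DD” still slips through, because it describes a format rather than a name, and that is exactly why a hand-maintained alias list only goes so far.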
Another concern for organizations is data lineage – being able to trace the various iterations of data, especially when a problem crops up. For example, in the case of errors in annual reports, accounting ledgers, or financial records, organizations, as well as regulators, will need to see the history of data in order to determine where things went south. Metadata, which labels data by date or version, is an important concern here as well.
GDPR has focused a spotlight on the metadata issue, making resolving it crucial. GDPR requires organizations to track down personally identifiable information (PII) on any individual covered by the regulation who requests that it be deleted. That includes data in all the containers and formats it is stored in – databases, backups, social media posts, etc.
GDPR rules require that companies be able to demonstrate that they can fulfill their obligations regarding individuals’ “right to be forgotten.” Even if organizations are not asked to find specific pieces of data, they still need to be able to demonstrate to EU authorities that they are capable of doing so. Failure to demonstrate that could also earn an organization penalties. If the metadata labels cannot be searched using standard scripts, it’s likely the BI team will miss something – and that could cost the company hefty fines.
Data lineage is also a key factor in GDPR compliance. Here, too, being able to trace the movement and changes of data over time – where it came from, how it got into a specific repository – makes it easier to track down the original source of the data, which may still be lurking within the system.
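As a rough illustration of what tracing lineage means in practice, the sketch below walks a set of lineage records back to the original sources of a dataset. The dataset names and the record structure are invented for the example; a real organization would read this information from its own metadata catalog.

```python
# Hypothetical lineage records: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "annual_report": ["finance_mart"],
    "finance_mart": ["ledger_export", "crm_backup"],
    "ledger_export": [],
    "crm_backup": ["crm_db"],
    "crm_db": [],
}


def upstream_sources(dataset, lineage):
    """Return every dataset that feeds into `dataset`, nearest first."""
    seen = []
    queue = list(lineage.get(dataset, []))
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue  # already visited; also guards against circular records
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


def original_sources(dataset, lineage):
    """Return the upstream datasets that have no recorded parents of their own."""
    return [d for d in upstream_sources(dataset, lineage) if not lineage.get(d)]
```

With these sample records, `original_sources("annual_report", LINEAGE)` returns `["ledger_export", "crm_db"]`: the places where a deletion request, or an audit of a reporting error, would ultimately have to look.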
This inability to control data – which is often due to poor metadata management – is quite common. According to a study by NewVantage Partners, 85% of companies are trying to be data-driven, but only 37% of that number say they’ve been successful. Organizations have learned how to collect data, but not how to control it. The question is – what can be done about it?
There are several ways to handle it. BI teams can painstakingly go through the sources of data and check all the metadata labels, ensuring that they stick to the organization’s rules. For even a mid-sized organization, that obviously will require a lot of manpower, a lot of time – and a lot of money.
An alternative is to use search systems that are able to compensate for metadata issues. Such automated search systems find data across many different metadata labels by examining not just the label, but the nature of the data itself. Thus, an automated intelligent search system will be able to figure out that all the iterations of “birthday” are the same – regardless of how the data is labeled.
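The idea of looking at the data rather than the label can be sketched in a few lines. The example below is a deliberately simple stand-in for what a commercial system would do: it flags columns whose values mostly parse as dates, whatever the column happens to be called. The list of date formats, the 90% threshold, and the function names are assumptions made for illustration, and detecting “a date” is of course not the same as detecting “a birthday.”

```python
from datetime import datetime

# Assumed set of date formats to try; a real system would need a broader list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%y/%m/%d")


def looks_like_date(value):
    """Return True if the string parses under any of the assumed formats."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value.strip(), fmt)
            return True
        except ValueError:
            continue
    return False


def date_like_columns(rows, threshold=0.9):
    """Return sorted column names whose non-empty values are mostly date-like.

    `rows` is a list of dicts mapping column name -> value.
    """
    counts = {}  # column name -> (date-like values, non-empty values)
    for row in rows:
        for name, value in row.items():
            if value is None or str(value).strip() == "":
                continue
            hits, total = counts.get(name, (0, 0))
            counts[name] = (hits + looks_like_date(str(value)), total + 1)
    return sorted(
        name for name, (hits, total) in counts.items() if hits / total >= threshold
    )
```

Run against rows where one table calls the field “Birthday” and another calls it “YY/MM/DD,” the function reports both columns, because it never looks at the names at all. A production system would combine this kind of content check with label matching and other signals before deciding that two fields hold the same thing.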
For organizations, getting control of data in this manner is the best way to deal with metadata issues. Data, they say, is the new gold – and in order to mine it properly, organizations need to get on top of their metadata issues, and resolve them.